Thesis Topics in AI/NLP

July 6, 2018
  1. Word-Embedding for Languages with Rich Morphologies Word-to-vector models have been well researched for English. However, when applied to languages with rich morphologies, e.g. German or Chinese, the quality of the embedded vectors drops. This master-level thesis will focus on languages with rich morphologies: morphological information will be encoded into the neural network to improve the quality of the training result.

    For information, please contact Dr. Tiansi Dong.
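One common way to inject morphological information is subword decomposition in the style of fastText. The sketch below is purely illustrative (the function name and parameters are assumptions, not part of the thesis topic): each word is split into character n-grams, so morphologically related forms share many subword units and rare inflected forms inherit information from their relatives.

```python
def char_ngrams(word, n_min=3, n_max=5):
    """Decompose a word into character n-grams, fastText-style.
    Boundary markers < and > distinguish prefixes and suffixes."""
    w = f"<{word}>"
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(w) - n + 1):
            grams.add(w[i:i + n])
    return grams

# Morphologically related German words share many n-grams,
# e.g. both of these contain "reund" from the stem "Freund":
shared = char_ngrams("Freundschaft") & char_ngrams("Freundlichkeit")
```

A word's embedding can then be modelled as the sum of its n-gram embeddings, which is one standard way to make the training result sensitive to morphology.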

  2. Visualisation of High-dimensional Balls Word senses are better represented by high-dimensional balls than by single vectors. This introduces a new visualisation task: given a large number of trained high-dimensional balls, this thesis aims at developing new tools for visualising them. The main task is to reduce the dimensionality to two or three dimensions while preserving the original topological relations among the balls.

    For information, please contact Dr. Tiansi Dong.
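To make the challenge concrete, here is a minimal baseline sketch (function names and the NumPy data layout are assumptions): project the ball centres with plain PCA and keep the radii. PCA does not guarantee that relations such as containment or disconnectedness survive the projection, which is exactly why a dedicated method is needed.

```python
import numpy as np

def project_balls(centers, radii, dim=2):
    """Project ball centres to `dim` dimensions via PCA. This naive
    baseline may destroy topological relations among the balls."""
    X = centers - centers.mean(axis=0)
    # Principal axes from the SVD of the centred centre matrix.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:dim].T, radii

def relation(c1, r1, c2, r2):
    """Qualitative spatial relation between two balls, used to check
    whether a projection preserved the original relations."""
    d = np.linalg.norm(c1 - c2)
    if d + r1 <= r2:
        return "inside"
    if d >= r1 + r2:
        return "disconnected"
    return "overlapping"
```

A visualisation tool could compute `relation` for all pairs before and after the projection and report how many relations were broken, giving a quantitative quality measure for the reduced view.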

  3. Wordsense Disambiguation with Co-occurrence Information and Semantic Relations Co-occurrence information and semantic relations can be seamlessly integrated into spatial relations among high-dimensional balls. This thesis aims at using balls for the difficult task of wordsense disambiguation and at comparing the results with benchmarks from the literature.

    For information, please contact Dr. Tiansi Dong.

  4. Predicting Causal Relations among Sentences Causal relations among sentences are important for several AI/NLP tasks, e.g. question answering and information retrieval. Three relations are distinguished: entailment, neutral, and contradiction. This thesis is targeted at optimising sentence embeddings using known relations, and at applying the optimised embeddings to predict the relations between new pairs of sentences.

    For information, please contact Dr. Joerg Zimmermann

  5. Chinese Fix-point Dependency Contexts and Word Embeddings The main goal of natural language processing is to let computers understand and process human languages. The first problem we meet is how to represent words. With the development of deep learning and neural networks, their applications in NLP have become more and more popular. The representative work is neural language models, which use high-dimensional vectors to represent words; these vectors are obtained by training the models on large datasets. The vectors, also called word embeddings, capture the semantic meaning of words and can be applied in many NLP tasks such as machine translation, information parsing, and question answering.

    Neural language models try to predict words from a given context. Many models are based on the hypothesis that words with similar contexts have similar semantic meanings, so they take the k words (the window size) before and after the target word as its context. If the window is small, useful words may fall outside it; if it is large, the context includes many irrelevant words. Levy and Goldberg (2014) used dependency contexts to overcome these shortcomings of linear-window contexts for long-distance words.

    There are two types of words in Chinese: content words and functional words. Content words provide meaningful representations, while functional words serve as auxiliary components of the sentence and carry only grammatical meaning. We propose the Fix-point Dependency Context (FDC) method: the basic idea is to remove the functional words, resulting in a fix-point dependency context, so that word embeddings trained with FDC perform better across different models.

    In the evaluation, we run experiments comparing different word-embedding results: the models are trained on the same Chinese dataset and compared on word-analogy reasoning, and their performance is evaluated by the correlation between the embedding results and human judgements. Finally, we draw conclusions and describe future work that applies the new method in different applications to improve efficiency and effectiveness.

    Master thesis under development by Mrs. Yi CHEN. For information, please contact Dr. Tiansi Dong.
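The contrast between linear-window contexts and the dependency contexts of Levy and Goldberg (2014) can be sketched as follows. This is a minimal illustration only: the token lists, edge triples, and function names are assumptions, and it does not implement the FDC method itself.

```python
def window_contexts(tokens, k=2):
    """Linear-window contexts: the k words before and after each
    target word (duplicate tokens overwrite; fine for a sketch)."""
    out = {}
    for i, w in enumerate(tokens):
        out[w] = tokens[max(0, i - k):i] + tokens[i + 1:i + 1 + k]
    return out

def dependency_contexts(edges):
    """Dependency contexts from (head, relation, dependent) triples:
    a word's contexts are its syntactic neighbours labelled with the
    relation; the head is marked with the inverse label `-1`."""
    out = {}
    for head, rel, dep in edges:
        out.setdefault(head, []).append(f"{dep}/{rel}")
        out.setdefault(dep, []).append(f"{head}/{rel}-1")
    return out
```

Dependency contexts pick up informative words regardless of their linear distance, which is the shortcoming of fixed windows described above; removing functional words from the parse, as FDC proposes, would further filter these contexts down to content words.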

Machine Learning for Text Analysis

July 30, 2018 -- August 10, 2018
In this two-week intensive lab, students will apply machine-learning methods to text classification. In an Anaconda environment, master-level students will practise with Python libraries to classify large business reports written in German. They will start with basic Python programming in Jupyter notebooks, learn to write well-structured Python programs, and visualise data-analysis results. The well-known LDA method is the main method used for text classification; advanced representation-learning methods are encouraged. Master-thesis topics will be generated during and after this lab.
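As a warm-up for the lab, the LDA pipeline can be sketched with scikit-learn, a common choice in an Anaconda environment (the toy German word lists below are purely illustrative, not the lab's dataset):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus standing in for the German business reports.
docs = [
    "umsatz gewinn bilanz quartal umsatz",
    "bilanz gewinn umsatz verlust",
    "fussball tor spiel mannschaft",
    "spiel tor trainer mannschaft",
]

# Bag-of-words counts, then LDA topic mixtures per document.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # shape: (n_docs, n_topics)
```

Each row of `doc_topics` is a topic distribution that can be inspected directly or fed as a feature vector into a downstream classifier, which is one standard way to use LDA for text classification.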

P3ML Project funded by German BMBF

Nov 1, 2017
under construction

Project funded by EU

September 29, 2015
The project aims at reducing possible corruption by means of increasing financial transparency. It is funded by the EU for 30 months and coordinated by Prof. S. Auer. Participants are Fraunhofer IAIS, Open Knowledge Foundation, Fundacion Ciudadana Civio, Transparency International EU Office, Open Knowledge Foundation Deutschland, University of Economics in Prague, Journalism++, University of Bonn, and Open Knowledge Foundation Greece.

Dagstuhl Seminar 15201

May 4, 2015
Dagstuhl Seminar 15201 on "Cross-Lingual Cross-Media Content Linking: Annotations and Joint Representations" will be held from May 10 to May 13, 2015.

ESEEPS Project funded by NSFC 2015-2018

September 1, 2014
The project "Engineering Seamless Evolution and Environment Perception for Self-adaptive Software Systems" (ESEEPS) is funded by the National Natural Science Foundation of China (NSFC) for 2015-2018, in cooperation with the Department of Computer Science and Technology, Nanjing University.

IPEC Winter School 2015 on Speech Technology and Python

July 16, 2014
International Program of Excellence (IPEC) Winter School 2015 will be held in February and March 2015 at B-IT, with four topics on "Speech Technology and Python". Preparation for the seminar "Language, Cognition, and Computation" and the lab "Speech Technology with Python" will start in October 2014.

For information, please contact Dr. Tiansi Dong.