In natural language processing (NLP), the main goal is to enable computers to understand and process human languages. The first problem to solve is how to represent words. With the development of deep learning and neural network technology, their applications in NLP have become increasingly popular. A representative line of work is neural language models, which represent words as high-dimensional vectors obtained by training the models on large datasets. These vectors, also called word embeddings, capture the semantic meaning of words and can be applied in many NLP tasks such as machine translation, information parsing, and question answering.
Neural language models try to predict words from a given context. Many models are based on the hypothesis that words occurring in similar contexts have similar meanings, so they use the k words before and after the target word as context, where k is the window size. If the window is small, some useful words may fall outside it; if the window is large, the context will include many irrelevant words. Levy and Goldberg (2014) used dependency-based contexts to overcome this shortcoming of window-based contexts, so that related words at a long distance from the target can still be captured.
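To make the contrast concrete, the following minimal Python sketch extracts both kinds of context for one sentence. The function names, the example sentence, and the hand-written dependency arcs are illustrative assumptions rather than part of the thesis; the word/label notation for dependency contexts follows the general style described by Levy and Goldberg (2014).

```python
def window_contexts(tokens, k):
    """Return (target, context) pairs using up to k words on each side."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - k)
        hi = min(len(tokens), i + k + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def dependency_contexts(tokens, arcs):
    """Return (target, context) pairs from (head, dependent, label) arcs.

    The head sees "dependent/label"; the dependent sees "head/label-1",
    where the "-1" suffix marks the inverse relation.
    """
    pairs = []
    for head, dep, label in arcs:
        pairs.append((tokens[head], f"{tokens[dep]}/{label}"))
        pairs.append((tokens[dep], f"{tokens[head]}/{label}-1"))
    return pairs


# Illustrative sentence; the arcs are written by hand, not produced by a parser.
tokens = ["australian", "scientist", "discovers", "star", "with", "telescope"]
arcs = [
    (1, 0, "amod"),       # scientist -> australian
    (2, 1, "nsubj"),      # discovers -> scientist
    (2, 3, "dobj"),       # discovers -> star
    (2, 5, "prep_with"),  # discovers -> telescope (preposition collapsed into the label)
]

window_ctx = [c for t, c in window_contexts(tokens, 2) if t == "discovers"]
dep_ctx = [c for t, c in dependency_contexts(tokens, arcs) if t == "discovers"]
```

With k = 2, the window around "discovers" does not reach "telescope", while the dependency context contains it through the prep_with relation.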
There are two types of words in Chinese: content words and function words. Content words carry the meaning of a sentence. Function words serve as auxiliary components of the sentence and have only grammatical meaning. We propose the Fix-point Dependency Context (FDC) method. The basic idea is to remove function words from the dependency context, which results in a fix-point dependency context. The expectation is that word embeddings from different models trained with FDC will perform better.
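The abstract does not spell out the FDC procedure, so the following Python sketch shows only one possible reading: every word attached to a function word is re-attached to that function word's head, repeatedly, until nothing changes (the fix point), and the function words are then dropped. The function name, the tag set, and the example parse are hypothetical and are not taken from the thesis.

```python
# Hypothetical set of part-of-speech tags treated as function words.
FUNCTION_TAGS = {"P", "DEG", "AS", "SP"}


def prune_function_words(tags, heads, function_tags=FUNCTION_TAGS):
    """Re-attach dependents of function words until a fix point is reached.

    tags[i]  : part-of-speech tag of token i
    heads[i] : index of the head of token i, or -1 for the root
    Returns a dict {token_index: head_index} for content words only.
    """
    heads = list(heads)
    changed = True
    while changed:  # stop when a full pass changes nothing
        changed = False
        for tok, head in enumerate(heads):
            if head >= 0 and tags[head] in function_tags:
                heads[tok] = heads[head]  # skip over the function word
                changed = True
    return {tok: head for tok, head in enumerate(heads)
            if tags[tok] not in function_tags}


# Hand-annotated example: 我 在 北京 学习 ("I study in Beijing").
tokens = ["我", "在", "北京", "学习"]
tags = ["PN", "P", "NR", "VV"]
heads = [3, 3, 1, -1]  # 北京 depends on the preposition 在

pruned = prune_function_words(tags, heads)
```

In this example, 北京 ends up attached directly to 学习, and the preposition 在 no longer appears in the context.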
In the evaluation, we run experiments to compare the word embeddings produced by different models. All models are trained on the same Chinese dataset, and their results are compared on word analogical reasoning. Model performance is evaluated by the correlation between the embedding model results and human judgments. Finally, we give the conclusion and describe future work, in which the new method can be used in different applications to help improve efficiency and effectiveness.
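As an illustration of the correlation-based evaluation, the sketch below scores word pairs by cosine similarity and compares the resulting ranking with human ratings using Spearman's rank correlation. The choice of Spearman's coefficient, the toy two-dimensional vectors, and the human scores are all assumptions made for this example; they are not data or results from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """Rank values from 1 to n (assumes no ties, to keep the sketch short)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(xs, ys):
    """Spearman's rank correlation for tie-free data."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Made-up embeddings: 猫 (cat), 狗 (dog), 汽车 (car), 卡车 (truck).
emb = {
    "猫": [0.9, 0.1],
    "狗": [0.8, 0.3],
    "汽车": [0.1, 0.9],
    "卡车": [0.2, 0.8],
}
# Made-up human similarity ratings on a 0 to 10 scale.
human = [("猫", "狗", 8.5), ("汽车", "卡车", 9.0), ("猫", "汽车", 1.5)]

model_scores = [cosine(emb[a], emb[b]) for a, b, _ in human]
human_scores = [score for _, _, score in human]
rho = spearman(model_scores, human_scores)
```

A value of rho close to 1 means the model ranks the word pairs in nearly the same order as the human raters.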
Master's thesis under development by Mrs. Yi CHEN. For information, please contact Dr. Tiansi Dong.