Tehran Institute for Advanced Studies (TeIAS)

Making Sense of Limited Resources in Cross-Lingual NLP


We have witnessed a surge of accurate natural language processing systems for many languages. Such systems rely heavily on annotated datasets. In the absence of such datasets, we should be able to make sense of the datasets we do have at hand. These include gold-standard annotations from other languages and incidental supervision available in online resources such as Wikipedia. In the first part of the talk, I present different methods for the transfer of syntactic and semantic dependency parsers. We propose a method that combines annotation projection and direct model transfer and that can leverage a minimal amount of information from a small out-of-domain parallel dataset to develop highly accurate transfer models. We present an unsupervised syntactic reordering model that improves the accuracy of dependency parser transfer for non-European languages. Moreover, we improve semantic dependency parsing by leveraging multi-task learning with supervised syntactic information in the target language of interest. In the second part of the talk, we propose a simple but very effective unsupervised technique that leverages Wikipedia data to create a highly accurate machine translation model, which in some cases surpasses the performance of a supervised model. Finally, we will talk about our current work on leveraging weakly supervised translation models for cross-lingual transfer as well as cross-lingual image captioning.

June 7, 2021
(17 Khordad, 1400)



This talk is online

Registration Deadline

June 6, 2021

You may need a VPN to join the talk.