CCMT 2019 Tutorials & the 18th Advanced Technology Tutorial (ATT) of the Chinese Information Processing Society of China



09:00-11:00 Tutorial 1 Speech Translation


We will start with an overview of the different use cases and challenges of speech translation. Due to the wide range of possible applications, these systems differ in the available data, the difficulty of the language, and the degree of spontaneous speech effects. Furthermore, interaction with humans has an important influence. In the main part of the tutorial, we will review state-of-the-art methods for building speech translation systems. We will begin with the traditional approach to spoken language translation: a cascade of an automatic speech recognition (ASR) system and a machine translation (MT) system. We will highlight the challenges of combining the two systems; in particular, techniques for adapting the system to the target scenario will be reviewed. With the success of neural models in both areas, there is rising research interest in end-to-end speech translation. While this approach shows promising results, international evaluation campaigns such as the shared task of the International Workshop on Spoken Language Translation (IWSLT) have shown that cascaded systems currently still often achieve better translation performance. We will highlight the main challenges of end-to-end speech translation. In the final part of the tutorial, we will review techniques that address key challenges of speech translation, e.g., latency, spontaneous speech effects, sentence segmentation, and stream decoding.
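The cascaded architecture described above can be sketched as a simple pipeline: ASR produces an (unpunctuated) transcript, which is re-segmented into sentence-like units and passed to MT. The classes and functions below are illustrative stubs, not the API of any real toolkit; they only show how the components fit together.

```python
class ASRStub:
    """Stands in for an automatic speech recognition system."""
    def transcribe(self, audio: bytes) -> str:
        # A real system would decode the audio; the stub returns
        # lowercase text without punctuation, as typical ASR output does.
        return "hello everyone welcome to the tutorial"

class MTStub:
    """Stands in for a machine translation system."""
    def translate(self, sentence: str) -> str:
        # A real system would run an NMT model; the stub just marks the input.
        return f"<translated: {sentence}>"

def segment(transcript: str) -> list:
    # A cascade must re-segment unpunctuated ASR output into
    # sentence-like units before translation; for simplicity the
    # stub treats the whole transcript as one segment.
    return [transcript]

def cascade_translate(audio: bytes, asr: ASRStub, mt: MTStub) -> list:
    # The cascade: speech -> transcript -> segments -> translations.
    transcript = asr.transcribe(audio)
    return [mt.translate(s) for s in segment(transcript)]

print(cascade_translate(b"...", ASRStub(), MTStub()))
```

Note that errors made by the ASR stage propagate unchanged into the MT stage; this error propagation, together with the mismatch between ASR output and the punctuated text MT systems are trained on, is one of the combination challenges the tutorial discusses.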

About Dr. Jan Niehues:

Jan Niehues is an assistant professor at Maastricht University. He received his doctoral degree from the Karlsruhe Institute of Technology in 2014 on the topic of "Domain Adaptation in Machine Translation". He has conducted research at Carnegie Mellon University and at LIMSI/CNRS, Paris. His research covers different aspects of machine translation and spoken language translation. He has been involved in several international projects on spoken language translation, e.g., the German-French project Quaero and the EU projects QT21, EU-Bridge, and ELITR. Currently, he is one of the main organizers of the spoken language translation track of the IWSLT shared task.

14:00-16:00 Tutorial 2 Domain Adaptation for Neural Machine Translation


Neural machine translation (NMT) is a deep learning based approach to machine translation that yields state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although high-quality, domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and vanilla NMT therefore performs poorly in such scenarios. Domain adaptation, which leverages both out-of-domain parallel corpora and monolingual corpora for in-domain translation, is thus very important for domain-specific translation. In this tutorial, we give a comprehensive review of state-of-the-art domain adaptation techniques for NMT. We hope that this tutorial will be both a starting point and a source of new ideas for researchers and engineers who are interested in domain adaptation for NMT.
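One widely used family of adaptation techniques is fine-tuning: train on large out-of-domain data, then continue training on the small in-domain data, optionally mixing oversampled in-domain data back into the out-of-domain corpus so the model does not overfit the small domain. The data-side sketch below is a minimal illustration under that assumption; the corpus contents and the `oversample` ratio are hypothetical, not from the tutorial itself.

```python
import random

def mixed_finetuning_corpus(out_domain, in_domain, oversample=5, seed=0):
    """Build the training corpus for a fine-tuning stage by mixing the
    out-of-domain data with the (much smaller) in-domain data,
    oversampling the latter so the two domains are better balanced."""
    mixed = list(out_domain) + list(in_domain) * oversample
    random.Random(seed).shuffle(mixed)  # fixed seed for reproducibility
    return mixed

# Toy corpora of (source, target) sentence pairs.
out_domain = [("news src %d" % i, "news tgt %d" % i) for i in range(100)]
in_domain = [("medical src %d" % i, "medical tgt %d" % i) for i in range(4)]

corpus = mixed_finetuning_corpus(out_domain, in_domain)
print(len(corpus))  # 100 out-of-domain + 4 * 5 oversampled in-domain = 120
```

In practice the mixed corpus would be fed to the NMT trainer as a second training stage, after convergence on the out-of-domain data alone.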


Chenhui Chu received his B.S. in Software Engineering from Chongqing University in 2008, and his M.S. and Ph.D. in Informatics from Kyoto University in 2012 and 2015, respectively. He is currently a research assistant professor at Osaka University. His research has won a 2019 MSRA Collaborative Research Grant, the 2018 AAMT Nagao Award, and the CICLing 2014 Best Student Paper Award. He serves on the editorial boards of the Journal of Natural Language Processing and the Journal of Information Processing, and is a steering committee member of the Young Researcher Association for NLP Studies. His research interests center on natural language processing, particularly machine translation and language-and-vision understanding.


Rui Wang is a tenure-track researcher at NICT. His research focuses on machine translation, a classic task in NLP (and in AI more broadly). As first or corresponding author, he has published more than 20 MT papers in top-tier NLP conferences and journals, such as ACL, EMNLP, COLING, AAAI, IJCAI, TASLP, and TALLIP. He has also won first place on several language pairs in WMT shared tasks, such as the unsupervised Czech<->German task in 2019 and the supervised Finnish/Estonian<->English tasks in 2018. He served as an area co-chair of CCL-2018/2019 and an organization co-chair of PACLIC-29 and YCCL-2012.