Nowadays, we spend a great part of our time online, using digital technologies for either personal or professional reasons. We might not be aware of it, but this means that we greatly benefit from the progress achieved in machine translation and language technologies.
Multilingualism and language diversity are at the heart of the European project. For this reason, the European Union is active in promoting multilingualism and supporting the deployment of technologies to break down language barriers.
The Automated Translation building block, funded under the Connecting Europe Facility (CEF) Telecom, helps European and national public administrations exchange information across Europe. It provides machine translation capabilities (eTranslation) that will enable all EU Digital Service Infrastructures (DSIs) to become multilingual.
Discover the service
Unlike general-purpose web translators, eTranslation guarantees confidentiality and security of all translated data. It is adapted to specific terminology and text types frequently used throughout Europe in different contexts (e.g. tender documents, legal texts, medical terminology). eTranslation builds on the machine translation service of the European Commission, in order to create a platform that is more flexible and scalable, and that offers custom solutions to different online services.
Language technologies go far beyond machine translation. Among other things, they offer applications for:
- text analysis (such as named-entity recognition)
- data anonymisation
- automatic text summarisation
Furthermore, language technologies can be developed and customised for any specific scenario where human language is processed.
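As a toy illustration of the data anonymisation application listed above, the following Python sketch masks personal names in a text. It is only a minimal sketch under stated assumptions: the name list `KNOWN_NAMES`, the function `anonymise` and the `[NAME]` placeholder are hypothetical, and a fixed list of names stands in for a real named-entity recognition model, which would detect names from context instead.

```python
import re

# Hypothetical gazetteer of names (an assumption for illustration only).
# A real named-entity recognition model would detect names from context.
KNOWN_NAMES = {"Anna", "Jānis", "Kowalski"}


def anonymise(text, names=KNOWN_NAMES, placeholder="[NAME]"):
    """Replace every whole-word occurrence of a known name with a placeholder."""
    if not names:
        return text
    # Try longer names first so that a longer entry wins over a shorter prefix.
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    # \b restricts matches to whole words, so "Annabel" is left untouched.
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    return pattern.sub(placeholder, text)


if __name__ == "__main__":
    print(anonymise("Patient Anna Kowalski met Dr Jānis."))
```

A list-based approach like this misses any name that is not in the list, which is one reason a trained recognition model is generally preferred for real documents.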
Meet the projects
Two successful eTranslation projects are:
- The Multilingual Anonymisation toolkit for Public Administrations is a project aimed at introducing Natural Language Processing (NLP) tools and at developing a toolkit for anonymisation of texts in the medical and legal fields. The project covers all EU languages, including those that are normally under-resourced, such as Latvian, Lithuanian, Estonian, Slovenian and Croatian. The multilingual anonymisation toolkit is based on the Named-Entity Recognition (NER) technique, so it is able to recognise and anonymise common names and surnames in all EU countries. The toolkit provides support to public administrations and enables them to comply with the GDPR requirements by performing data anonymisation and de-identification, in particular in the health and legal domains. This demo provides more information on the project and its outcomes.
- The OCCAM project (OCR, ClassificAtion & Machine Translation) focuses on the integration of image classification, Translation Memories (TMs), Optical Character Recognition (OCR), and Machine Translation (MT) to support the automated translation of scanned documents, a document type that could not previously be processed by the CEF eTranslation service. Scanned forms typically contain template text, for which translations are available in Translation Memories. The project aims at recognising such text from scanned images and translating it automatically. By filling this gap, OCCAM directly contributes to lowering the language barrier. More information on the project can be found in this demo.
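The idea of reusing Translation Memory entries for template text, as described for OCCAM above, can be sketched in Python. This is not the project's implementation: the function names, the dictionary-based translation memory and the stand-in machine translation callable are all assumptions made for illustration, and the text segments are assumed to have already been extracted by OCR.

```python
def normalise(segment):
    """Lower-case a segment and collapse whitespace so OCR spacing noise does not block a match."""
    return " ".join(segment.lower().split())


def translate_segments(segments, translation_memory, machine_translate):
    """Translate each segment, preferring a translation memory hit over machine translation.

    translation_memory: dict mapping normalised source text to a stored translation (hypothetical).
    machine_translate: any callable taking a source string and returning a translation (hypothetical).
    Returns a list of (translation, origin) pairs, where origin is "TM" or "MT".
    """
    results = []
    for segment in segments:
        key = normalise(segment)
        if key in translation_memory:
            # Template text: reuse the stored, human-validated translation.
            results.append((translation_memory[key], "TM"))
        else:
            # Free text: fall back to machine translation.
            results.append((machine_translate(segment), "MT"))
    return results


if __name__ == "__main__":
    memory = {"date of birth": "Date de naissance"}
    stub_mt = lambda text: "<MT:" + text + ">"  # placeholder, not a real MT call
    print(translate_segments(["Date of  Birth", "Born in Riga"], memory, stub_mt))
```

Exact matching after normalisation is the simplest possible lookup; text produced by OCR often contains character-level errors, so a practical system would likely need fuzzy matching instead.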
Connecting Europe Facility (CEF) Telecom
The Connecting Europe Facility (CEF) in Telecom is a key EU instrument to facilitate cross-border interaction between public administrations, businesses and citizens, by deploying digital service infrastructures (DSIs) and broadband networks.
Supported projects contribute to the creation of a European ecosystem of interoperable and interconnected digital services that sustain the Digital Single Market.
- Publication date: 1 February 2022
- European Health and Digital Executive Agency | Directorate-General for Health and Food Safety
- Programme Sector: Connecting Europe Facility 1
- Digital technology, Digital transformation, EUFunded