Machine Translation: State-of-the-art

Artificial intelligence is more than ever capable of addressing one of the great challenges of NLP: machine translation.

The goal is to translate fluently and accurately from one spoken or written language to another without human intervention.

While the concepts underlying machine translation are easy to understand, the processes involved in automating the adaptation from one language to another are extraordinarily complex.

Human languages have a wide variety of nuances and jargon. They even contain words with multiple meanings.

An algorithm must be able to process and understand all of these subtleties in order to provide an accurate translation.

To achieve this, various technologies such as Deep Learning, Big Data and speech analytics need to be brought together.

Supporting technologies such as cloud computing, cloud storage, and web APIs have been around for a while, and all are proving useful for running a machine translation engine.

- Recent developments in machine translation -

There are currently three main types of machine translation: rule-based machine translation (RBMT), statistical model-based machine translation, and hybrid systems that combine both approaches.

RBMT consists of a set of rules that determine how a text or voice recording should be translated. Typically, this model is based on the combination of two dictionaries to produce a translation.
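
To make the idea concrete, here is a minimal sketch of a rule-based translator in Python. The lexicon, the part-of-speech tags, and the single reordering rule are all hypothetical and chosen for illustration; a real RBMT system would use far larger dictionaries and many more rules.

```python
# Toy English -> French lexicon; entries are illustrative assumptions only.
LEXICON = {
    "the": "le",
    "black": "noir",
    "cat": "chat",
    "sleeps": "dort",
}

# Hypothetical part-of-speech tags used by the reordering rule below.
POS = {"the": "DET", "black": "ADJ", "cat": "NOUN", "sleeps": "VERB"}


def translate_rule_based(sentence: str) -> str:
    """Translate word by word, applying one hand-written reordering rule."""
    words = sentence.lower().split()
    reordered = []
    i = 0
    while i < len(words):
        # Rule: English "ADJ NOUN" becomes French "NOUN ADJ".
        if (
            i + 1 < len(words)
            and POS.get(words[i]) == "ADJ"
            and POS.get(words[i + 1]) == "NOUN"
        ):
            reordered.extend([words[i + 1], words[i]])
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    # Dictionary lookup; words missing from the lexicon pass through unchanged.
    return " ".join(LEXICON.get(word, word) for word in reordered)


print(translate_rule_based("The black cat sleeps"))  # le chat noir dort
```

A word that is missing from the lexicon, such as "dog", is simply copied into the output, which hints at why dictionary coverage matters so much for this approach.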

In the last two decades, most implementations of AI-assisted translation have been based on RBMT, as this technique was the first to produce relevant results. It is well suited to technical documents because it provides literal translations that conform to common standards.

However, dictionaries have their limitations, as there are many words that are difficult to translate from one language to another.

To improve the efficiency of AI, a new method has been developed: statistical machine translation (SMT). Instead of using dictionaries, these algorithms learn to translate by examining bilingual texts.
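
As an illustration of learning from bilingual texts, the sketch below estimates word translation probabilities from a tiny parallel corpus using expectation-maximization, in the style of the classic IBM Model 1 (simplified, without a NULL word). The three sentence pairs and the function names are assumptions made for this example; production SMT systems train on very large corpora and use much richer models.

```python
from collections import defaultdict


def train_word_translations(corpus, iterations=20):
    """Estimate P(target_word | source_word) with EM over sentence pairs."""
    pairs = [(src.lower().split(), tgt.lower().split()) for src, tgt in corpus]
    target_vocab = {word for _, tgt in pairs for word in tgt}
    uniform = 1.0 / len(target_vocab)
    # prob[(target_word, source_word)] starts out uniform.
    prob = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each target word over the source words of its pair.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(prob[(tw, sw)] for sw in src)
                for sw in src:
                    fraction = prob[(tw, sw)] / norm
                    count[(tw, sw)] += fraction
                    total[sw] += fraction
        # M-step: renormalize the expected counts into probabilities.
        for (tw, sw), c in count.items():
            prob[(tw, sw)] = c / total[sw]
    return prob


def best_translations(prob):
    """Pick the most probable target word for each source word."""
    best = {}
    for (tw, sw), p in prob.items():
        if sw not in best or p > best[sw][1]:
            best[sw] = (tw, p)
    return {sw: tw for sw, (tw, _) in best.items()}


# Hypothetical English -> German parallel corpus.
corpus = [
    ("the house", "das Haus"),
    ("the book", "das Buch"),
    ("a book", "ein Buch"),
]
table = best_translations(train_word_translations(corpus))
print(table["the"], table["house"], table["book"], table["a"])
```

No dictionary is supplied here: the pairing of "book" with "buch" emerges only because the two words keep appearing in the same sentence pairs.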

With the development of artificial intelligence, RBMT is no longer the dominant model. More and more applications are using statistical or hybrid machine translation.

Both techniques learn from real documents rather than relying only on preloaded dictionaries. The hybrid system combines the rule-based and statistical approaches.

Vocabulary databases are used for the initial translation, while training on bilingual texts provides the nuances necessary for good human language understanding.
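
The following sketch shows one way such a hybrid could work, under stated assumptions: a dictionary proposes candidate translations, and context counts (imagined here as having been learned from bilingual texts) pick between them when a word has several meanings. The dictionary entries, the counts, and the scoring rule are all hypothetical, and the output is word by word rather than fluent French.

```python
from collections import Counter

# Hypothetical vocabulary database: each English word maps to candidate translations.
DICTIONARY = {
    "bank": ["banque", "rive"],
    "river": ["fleuve"],
    "money": ["argent"],
}

# Hypothetical statistics "learned" from bilingual texts: how often a candidate
# was the right translation when a given context word appeared in the source.
CONTEXT_COUNTS = {
    ("bank", "banque"): Counter({"money": 8, "account": 5}),
    ("bank", "rive"): Counter({"river": 7, "water": 3}),
}


def score(word, candidate, context):
    """Sum the learned counts for every context word present in the sentence."""
    counts = CONTEXT_COUNTS.get((word, candidate), Counter())
    return sum(counts[c] for c in context)


def translate_hybrid(sentence: str) -> str:
    words = sentence.lower().split()
    output = []
    for word in words:
        # Step 1: dictionary lookup (unknown words pass through unchanged).
        candidates = DICTIONARY.get(word, [word])
        # Step 2: statistics choose among candidates; on a tie, max() keeps
        # the first dictionary entry.
        context = [w for w in words if w != word]
        output.append(max(candidates, key=lambda c: score(word, c, context)))
    return " ".join(output)


print(translate_hybrid("river bank"))  # fleuve rive
print(translate_hybrid("money bank"))  # argent banque
```

With no helpful context, as in `translate_hybrid("bank")`, the sketch falls back to the first dictionary entry, which mirrors the idea of the dictionary providing the initial translation.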

Given that machine learning-based translation relies on Big Data, it is not surprising that major cloud service providers are pioneers in this field.

Amazon, Google, Microsoft, Facebook and others have developed innovative technologies. They rely on the countless conversations between users of their platforms in a wide range of languages.

Google Translate is widely used, but its accuracy is often insufficient for professional content. Facebook has adopted an unsupervised learning model that has proven to be particularly effective. AWS has also introduced a machine translation service.

The cloud giants are not alone in this market. There are at least 45 machine translation companies worldwide.

Some providers focus on creating translation services for professional and technical documents. Other companies bring in human translators to improve accuracy when the machine output falls short.

Machine translation services for businesses and professionals are becoming increasingly popular. These fully automated services can save companies a lot of money and time when adapting content.

- The limitations of machine translation -

Most algorithms are not capable of understanding the nuances of everyday language, let alone the technical language used in medical or legal documents, for example. An incorrect translation in these fields can lead to serious problems.

Similarly, machine translation tools do not handle literary adaptations very well and have difficulty understanding works of fiction.

An algorithm may, for example, not be trained to recognize humor or sarcasm and will therefore have difficulty translating them. Jargon and culture-specific dialects can also cause problems for a machine learning model.

Sometimes there is no way at all to translate a slang expression, because its meaning depends on a cultural context.

In the end, translation remains a difficult problem, but there is no denying that machines are getting better at it.