How Data Translation Builds Better AI


Good AI doesn't just happen (though it can seem that way when your smart home assistant jumps into a conversation). To train the natural language processing algorithms behind the AI we use daily, tech companies need lots of quality training data to support their multilingual AI.

At Argos, we work on data collection translation projects for tech companies around the world. These projects typically involve text translation, voice translation and recording, or image translation and annotation. The datasets we work with in this space are very different from typical translation source material: they're larger and require specific translation rules, as they'll be fed back into the client's machine learning system.

To help you choose a translation vendor who can handle data translation projects of this type, we've collected some best practices below for using translation to build better AI.

Translate the data resources you have

If you're a tech company working in multilingual AI, translation can be useful for a range of data types.

Text datasets can be used to train natural language processing (NLP) AIs for machine translators that support social media and translation apps. Your voice data assets can be used to improve the functionality of an AI system so it can recognize a variety of accents, genders, speech types and more. (This is just as important for an English-only system as it is for a multilingual one.) And images may be used to teach self-driving cars, tracking systems, hover-over translation apps, and more. High-quality translation is key to producing a system that can recognize words, image text, or commands in multiple languages.

Dataset translation is especially helpful if you want to create something in a less common language. Let's consider a simple case and imagine that you're building a basic chatbot that can recognize several different languages, including Dutch, and then respond accordingly in the user's language. You probably already have a bunch of samples in English of things that you expect someone to write in the query box, but will have a much smaller pool of data in Dutch. This is where you can take your English dataset and translate it into Dutch, roughly doubling your total data volume.
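As a rough illustration of the chatbot case, here is a minimal Python sketch of how translated samples might be merged back into a training set. Everything in it is an assumption made for illustration: the `Sample` record, the `augment_with_translations` helper, the intent labels, and the small lookup table standing in for translations delivered by a human translation team.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Sample:
    """One chatbot training example (hypothetical record layout)."""
    text: str
    intent: str
    lang: str


def lookup_translator(table: Dict[str, str]) -> Callable[[str], str]:
    """Wrap a table of delivered translations as a callable.

    Raises KeyError for any sentence that has not been translated yet,
    so gaps are noticed instead of silently skipped.
    """
    def translate(text: str) -> str:
        return table[text]
    return translate


def augment_with_translations(
    samples: List[Sample],
    translate: Callable[[str], str],
    target_lang: str,
) -> List[Sample]:
    """Return the original samples plus one translated copy of each.

    The intent label is carried over unchanged; only text and language differ.
    """
    translated = [
        Sample(text=translate(s.text), intent=s.intent, lang=target_lang)
        for s in samples
    ]
    return list(samples) + translated


# Illustrative data only: two English queries and their Dutch translations.
english = [
    Sample("Where is my order?", "order_status", "en"),
    Sample("I want to cancel my subscription", "cancel_subscription", "en"),
]
dutch_table = {
    "Where is my order?": "Waar is mijn bestelling?",
    "I want to cancel my subscription": "Ik wil mijn abonnement opzeggen",
}

combined = augment_with_translations(english, lookup_translator(dutch_table), "nl")
print(len(english), "->", len(combined))
```

Keeping the intent label attached to each translated sentence is the design choice that matters here: the translation adds language coverage without requiring the data to be labeled a second time.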


If you're working on more complex or specific projects, however, you may also need to specify how your datasets are translated. For example, if you're a bank building a chatbot AI, you'll have a range of commands to translate. "What is my account balance?" or "Open a new account" can be directly translated. But specific industry terms, like U.S. 401(k) plans, don't always have universal equivalents. In those cases, you'll need to ask the translator to find or suggest equivalent terms or flag a term as untranslatable, to alert you where any native data might need to be collected.
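One way to picture the flagging step is the short Python sketch below. It is only an illustration under stated assumptions: the `UNTRANSLATABLE_TERMS` set and the `triage` helper are hypothetical, and in practice the list of flagged terms would come from the translator rather than from a hard-coded set.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical list of terms the translator has flagged as having no
# reliable equivalent in the target market.
UNTRANSLATABLE_TERMS: Set[str] = {"401(k)"}


def triage(
    sentences: Iterable[str],
    flagged_terms: Set[str],
) -> Tuple[List[str], List[Tuple[str, List[str]]]]:
    """Split sentences into 'translate directly' and 'needs native data'.

    A sentence lands in the second group if it contains any flagged term
    (case-insensitive substring match). The matching terms are returned
    alongside it so the reason for the flag is visible.
    """
    direct: List[str] = []
    needs_native: List[Tuple[str, List[str]]] = []
    for sentence in sentences:
        lowered = sentence.lower()
        hits = sorted(t for t in flagged_terms if t.lower() in lowered)
        if hits:
            needs_native.append((sentence, hits))
        else:
            direct.append(sentence)
    return direct, needs_native


commands = [
    "What is my account balance?",
    "Open a new account",
    "How do I roll over my 401(k)?",
]
direct, needs_native = triage(commands, UNTRANSLATABLE_TERMS)
print("Translate directly:", direct)
print("Collect native data for:", needs_native)
```

The second list is the useful output for planning: it shows where translation alone will not cover the use case and native data may need to be collected.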

Set style guides early

When starting a translation project for your multilingual AI tech, it's a good idea to choose a translation partner who will work with you to set up style and rule guides that will produce the type of data you want. Doing this at the beginning of the process will result in cleaner datasets that can smoothly feed back into your AI.

A quality data translation vendor will establish this style guide with you as you begin to define the scope of the project. The optimal translation process for your data will look different depending on your AI training model, so it's a good idea to carefully talk the process through with your translation partner to make sure their processes will help you get the data and data quality you need.

Embrace the human element (human-in-the-loop)

For any kind of data translation project, the best process is one that keeps humans in the loop. Supervised data translation and quality assurance, rather than crowdsourcing, is the best way to get reliable and clean multilingual datasets that are suitable for test and training situations.
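To make the idea concrete, here is a small Python sketch of how automated checks might support human reviewers by deciding what they look at first. It does not replace human review. The `Segment` record, the `{placeholder}` convention, and the specific checks are assumptions chosen for illustration, not a description of any particular vendor's process.

```python
import re
from typing import List, NamedTuple, Tuple

# Assumed convention: variables in the text look like {name} or {amount}.
PLACEHOLDER = re.compile(r"\{[A-Za-z_]+\}")


class Segment(NamedTuple):
    """A source sentence and its translation (hypothetical layout)."""
    source: str
    target: str


def needs_priority_review(seg: Segment) -> bool:
    """Flag segments with obvious mechanical problems for early review."""
    if not seg.target.strip():
        return True  # translation is missing
    if seg.target.strip() == seg.source.strip():
        return True  # target is identical to source, possibly untranslated
    # Placeholders must survive translation unchanged.
    return sorted(PLACEHOLDER.findall(seg.source)) != sorted(
        PLACEHOLDER.findall(seg.target)
    )


def split_for_review(segments: List[Segment]) -> Tuple[List[Segment], List[Segment]]:
    """Return (priority, routine) queues; both still go to human reviewers."""
    priority = [s for s in segments if needs_priority_review(s)]
    routine = [s for s in segments if not needs_priority_review(s)]
    return priority, routine


batch = [
    Segment("Hello {name}", "Hallo {name}"),
    Segment("Hello {name}", "Hallo"),
    Segment("Open a new account", ""),
]
priority, routine = split_for_review(batch)
print(len(priority), "priority,", len(routine), "routine")
```

Checks like these only catch mechanical errors. Judging whether a translation is accurate, natural, and consistent with the style guide remains the reviewer's job.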

Check out our resources for planning large-scale translation projects or contact us to get started translating your data.
