Write for machine translation

Guidelines to help you write text for machine translation

Use short sentences

Limit your sentences to a maximum of 25 words. Longer sentences are harder for machine translators to analyse, and are more likely to be ambiguous. The developers of Sun Proof found that this rule was the most important one.
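The 25-word limit is easy to check automatically. The following is a minimal sketch in Python, not a finished tool. The function name long_sentences and the simple rule for finding the end of a sentence are assumptions made for this illustration. The rule will split wrongly after abbreviations such as 'e.g.'.

```python
import re

MAX_WORDS = 25  # the limit suggested in the guideline above


def long_sentences(text, max_words=MAX_WORDS):
    """Return (word_count, sentence) pairs for sentences over the limit.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    This is a rough rule and will mis-split text that contains abbreviations.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        count = len(sentence.split())  # words are whitespace-separated tokens
        if count > max_words:
            flagged.append((count, sentence))
    return flagged
```

Run the function on a draft and shorten or split every sentence that it returns.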

Spell check your document

If a spell checker cannot recognise a word, then a machine translator will probably leave the word untranslated. The spell checker will also often pick up mistakes that a human reader would overlook, like a missing space between sentences. The spell checker will not catch mistakes like using 'to' when you mean 'too' or 'two', so you will have to find those mistakes yourself.

Avoid metaphors and jokes

Metaphors and jokes often don't make sense after translation.

Keep pronouns to a minimum

Pronouns are words like 'it', 'him' and 'those ones'. A pronoun is a shorthand that stands in for a noun that appeared earlier in the sentence or in a previous sentence. But different languages use different word orders, so the link between the pronoun and its noun can be lost. Also, unlike English, some languages give different genders to different objects. A machine translating from English to French may translate 'it' as 'il'. 'Il' could mean 'he' or 'it' to a French reader, so it may be unclear what you are referring to.

Spell things out instead of using abbreviations or initials

Machine translators will not understand abbreviations. The initials of organisations, such as the United Nations, are often different in different languages. The United Nations is 'UN' in English but 'ONU' in French.

Keep your adjectives and adverbs near the words they refer to

In complex sentences, adjectives can become separated from their nouns and adverbs separated from their verbs. Keep sentences simple, and this won't happen.

Use correct grammar and punctuation

Use simple grammatical structures. Consult a simple grammar guide, like BBC Skillswise Grammar or The Guide to Grammar and Writing, to make sure that you are writing clear, standard English. In the previous sentence, I wrote 'make sure that you are writing', not 'make sure you are writing'. The version with 'that' is clearer and easier to translate.

Avoid idioms, slang and jargon

The reasons for this are obvious. Such words are either impossible to translate, or may be translated wrongly.

Avoid ambiguous words

This is a difficult rule to follow. We don't notice which words have more than one meaning, because we pick the right meaning for the context. Machine translators don't understand the context, so they may pick the wrong meaning to translate. The word 'right', for instance, can mean 'the opposite of left', or 'correct', or 'privilege', among other meanings. 'Harder' can either mean 'more difficult', or mean 'less soft'. Use a word with a single meaning, such as 'correct', instead of 'right', where you can.
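Because we do not notice ambiguous words ourselves, a word list can help. The following Python sketch flags listed words and suggests a replacement. The list AMBIGUOUS and the function flag_ambiguous are hypothetical names made up for this illustration. The two entries come from the examples above, and you would need to extend the list yourself. Each suggestion fits only one meaning of the word, so a person must still decide whether it applies.

```python
import re

# Hypothetical starter list. Each suggestion fits only ONE meaning of the
# word ('right' as 'correct', 'harder' as 'more difficult').
AMBIGUOUS = {
    "right": "correct",
    "harder": "more difficult",
}


def flag_ambiguous(text, words=AMBIGUOUS):
    """Return (word, suggestion) pairs for each listed word found in text."""
    found = []
    for token in re.findall(r"[A-Za-z']+", text):
        key = token.lower()  # match 'Right' as well as 'right'
        if key in words:
            found.append((key, words[key]))
    return found
```

The function only reports the words. It does not replace them, because the suggestion is wrong whenever the writer meant a different meaning, such as 'right' as the opposite of left.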

Avoid compound verbs

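These are verbs like 'set off', 'head up', 'give over' and 'bring out', which are often called phrasal verbs. The meaning of the whole phrase differs from the meanings of its separate words, so compound verbs are usually mistranslated.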

Use the International Standard date format

Dates can cause several problems. Year, month, and day are written in different orders in different countries. Month names like 'May' and 'March' are also ordinary English words, so a machine may translate them as something other than a month. Fortunately there is a simple solution to all of these problems: the International Standard date format (ISO 8601). This format is all numerical, so it is not translated. The format is year, month, day, written as YYYY-MM-DD. For example, 2004-03-09 is the date that might otherwise be written as 9 March 2004, or March 9 04, or 03/09/04 or 9/3/4. The standard format is always the same length and never ambiguous. When computers sort these standard dates as plain text, the dates fall into the correct order, which is not true of the other formats.
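The following short Python sketch shows both points: how to produce the standard format, and why it sorts correctly as plain text. It uses only the standard datetime module. The function name to_standard_date and the sample dates are made up for this illustration.

```python
from datetime import date


def to_standard_date(year, month, day):
    """Format a date as YYYY-MM-DD (the International Standard format)."""
    return date(year, month, day).isoformat()


# The example date from the text above.
print(to_standard_date(2004, 3, 9))  # prints 2004-03-09

# Standard dates sort correctly as plain text.
standard = ["2004-03-09", "2003-12-25", "2004-01-15"]
print(sorted(standard))  # earliest date first

# The same three dates written day/month/year do NOT sort correctly as
# plain text: the 2003 date ends up last.
day_first = ["09/03/2004", "25/12/2003", "15/01/2004"]
print(sorted(day_first))
```

The standard format sorts correctly because the largest unit, the year, comes first and every field has a fixed width.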

Use a machine translator to translate your text and then translate it back again

If you followed the rules, and your text survives this test with its meaning intact, it is likely that the translation makes sense too. Here is an example using a page from this site. But remember that there are some problems that this test will not show up. If a word cannot be translated, then it passes through the first translation unchanged and reappears in the re-translation. That looks OK to you, but means nothing to the person reading the first translation. If you use a metaphor, like 'the heart of the problem', then a word like 'heart' will be translated literally both ways, and so looks OK to you. But it may not mean anything in the first translation, just as if you had written 'the liver of the problem'.
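The test, and the first of its blind spots, can be sketched in Python. This is an illustration only. The parameter translate is a hypothetical function that you supply to stand in for whichever translation service you use. The toy_translate function and its tiny word table are made up so that the example runs without any service, and 'blorf' is an invented word that the toy translator does not know.

```python
def round_trip(text, translate, source="en", target="fr"):
    """Translate text into the target language and back again.

    `translate(text, source, target)` is a hypothetical callable supplied
    by the caller. It is not the interface of any real service.
    """
    forward = translate(text, source, target)
    back = translate(forward, target, source)
    return forward, back


# Toy stand-in for a real translator: a word-for-word lookup that leaves
# unknown words unchanged, as the text above describes.
TOY_TABLES = {
    ("en", "fr"): {"the": "le", "cat": "chat"},
    ("fr", "en"): {"le": "the", "chat": "cat"},
}


def toy_translate(text, source, target):
    table = TOY_TABLES[(source, target)]
    return " ".join(table.get(word, word) for word in text.split())


forward, back = round_trip("the cat blorf", toy_translate)
print(forward)  # prints: le chat blorf
print(back)     # prints: the cat blorf
```

The re-translation matches the original exactly, so the test passes. But the first translation still contains 'blorf', which means nothing to a French reader. Always read the first translation for leftover English words if you can.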