Internationalization: strings with variables
The topic of internationalization (abbreviated I18N because there are 18 letters between 'I' and 'N') is rather extensive and complex. It is the process of preparing content, especially source code, to be translated for other locations or locales. The process of actually making content suitable for other locations is aptly named localization (L10N for short). The part of the process most people are already familiar with is translation, where the text is changed from one natural language, e.g., English, to another, e.g., Spanish. Localization is more than just translation because it also covers cultural differences between countries (idioms, taboos, etc.), numeric, date, and monetary formats, and even the way colors and graphical symbols are used. In the United States, red means stop, yellow means caution, and green means go. These meanings come from the colors used in traffic signals. In other countries, these colors may or may not have the same connotations.
If you want to make just one small change in how you write source code, regardless of the programming language, there is one thing that will pay big dividends when it is time to internationalize and localize your code. In fact, this technique makes I18N part of the code construction and doesn't leave it for later when it is more expensive and risky to change existing, working code.
When coding, you frequently need to form a string with values from one or more variables. Take the following sentence, for example, where the file name comes from a variable.
The file, Example.txt, is missing.
The classic (and bad) way to create the sentence in code is:
"The file, " + strFileName + ", is missing."
This, however, might literally need to be translated as:
Is missing, the Example.txt file.
Because the sentence is built by concatenation, it cannot be translated satisfactorily. The translator sees only two separate fragments, "The file, " and ", is missing.", and has no way to change where the file name appears.
Avoid Concatenating Sentences
Use a string Format or printf function when supported by the programming language.
"The file, %s, is missing." (C)
"The file, {0}, is missing." (C#, VB.NET)
This way, the translator has the entire sentence as one piece to translate.
For JavaScript, include the following code that implements a simple 'format' function.
// adapted from http://community.hdri.net/blogs/ray_blog/archive/2006/02/27/5.aspx
String.format = function()
{
    // e.g., String.format("hello {0}", "world")
    if (0 == arguments.length)
    {
        return "";
    }
    var str = arguments[0];
    for (var i = 1; i < arguments.length; i++)
    {
        // match "{n}" only when it is not followed by another "}"
        var re = new RegExp("\\{" + (i-1) + "\\}(?!\\})","gm");
        // use a function so "$" sequences in the argument are inserted literally
        var arg = arguments[i];
        str = str.replace(re, function() { return arg; });
    }
    return str;
};
Restructure the Sentence
Alternatively, if a format function is not available, restructure the sentence so that it is in one string and the variable follows it.
Missing file: Example.txt
"Missing file: " + strFileName
The key is to keep the sentence or phrase as a unit and not concatenate words to make a sentence.
The wording is perhaps not as elegant, but it is understandable. More is to be gained by having informative messages, like the one below, than by having polished wording.
The file is missing. Check that the file exists and that the name is correctly typed.
File: Example.txt
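More Than One Variable
Multiple variables are supported better by .NET than by C because .NET placeholders are numbered, so the translator can change the order in which the variables appear. Standard C printf specifiers such as %s consume their arguments strictly in order (some platforms, such as POSIX, add numbered specifiers like %1$s).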
"'{0}' cannot have the value '{1}'."
might be literally translated as
"Value '{1}' is prohibited in '{0}'."
Even if the language does not support reordering, a format string is still better than concatenation.
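As a rough sketch of how reordering might look in practice, the JavaScript below keeps each complete sentence in a per-locale table and fills it in with a small helper. The format helper, the messages table, the badValueMessage function, and the locale key "literal" are all assumptions made for illustration. The "literal" entry simply reuses the literal translation above as a stand-in for a real translation with a different word order.

```javascript
// Hypothetical helper: replaces {0}, {1}, ... with the matching argument.
function format(template) {
    var args = Array.prototype.slice.call(arguments, 1);
    return template.replace(/\{(\d+)\}/g, function (match, digits) {
        var index = Number(digits);
        // Leave unknown placeholders untouched rather than printing "undefined".
        return index < args.length ? String(args[index]) : match;
    });
}

// Hypothetical message table: one complete sentence per locale.
// "literal" stands in for a translation that needs a different word order.
var messages = {
    "en":      { badValue: "'{0}' cannot have the value '{1}'." },
    "literal": { badValue: "Value '{1}' is prohibited in '{0}'." }
};

function badValueMessage(locale, field, value) {
    // The calling code always passes the arguments in the same order;
    // only the translated sentence decides where each one lands.
    return format(messages[locale].badValue, field, value);
}

console.log(badValueMessage("en", "Width", "-1"));
console.log(badValueMessage("literal", "Width", "-1"));
```

The calling code never changes between locales, which is the point: the translator controls word order by moving the numbered placeholders inside the sentence.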
Numbers and Plurality
Numbers add another level of complexity because of plurality. Consider these sentences for example.
There are no people in the room.
There is one person in the room.
There are two people in the room.
There are 11 people in the room.
A simple format string might be:
"There are {0} people in the room."
But this, of course, does not handle the singular case very well.
There are 1 people in the room.
This format string seeks to handle plurality, but is awkward and unruly.
"There is/are {0} person/people in the room."
There is/are 1 person/people in the room.
Code can choose between multiple format strings by checking the value of the number. For example:
If the value is 1, use "There is one person in the room."
Otherwise, use "There are {0} people in the room."
The results are probably acceptable in most languages.
There are 0 people in the room.
There is one person in the room.
There are 2 people in the room.
There are 3 people in the room.
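The selection described above could be sketched in JavaScript roughly as follows. The names roomMessages and peopleInRoom are hypothetical, and the two-way split is an assumption that holds for English only.

```javascript
// Hypothetical format strings; the translator sees each one as a whole sentence.
var roomMessages = {
    one:   "There is one person in the room.",
    other: "There are {0} people in the room."
};

function peopleInRoom(count) {
    // English needs only two cases: exactly 1, and everything else.
    var template = (count === 1) ? roomMessages.one : roomMessages.other;
    // The singular string has no placeholder, so replace() leaves it unchanged.
    return template.replace("{0}", String(count));
}

for (var n = 0; n <= 3; n++) {
    console.log(peopleInRoom(n));
}
```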
In English, count is limited to singular (1) and plural (not 1). Other languages have more choices, for example, a dual (2, as in a 'pair'). Some, such as Polish, are even more complex: the Polish form depends on whether the number ends in 2, 3, or 4 (other than 12, 13, and 14). Even in English, the concept exists. For instance, we have seen how the words "no", "one", "two", etc. are preferred to "0", "1", and "2" for small numbers. Another example is the "st", "nd", "rd", and "th" ending, as in:
1st, 2nd, 3rd, 4th, 5th, ..., 12th, ..., 22nd, and so on.
The rule to determine the ending is much more complex than just singular and plural. I know of no good way to handle this case without writing code that examines the count and selects the string accordingly.
case n = 0:
case n = 1:
case n = 2:
case n = 3:
case n mod 10 = 1 and n > 19 and n mod 100 <> 11:
case n mod 10 = 2 and n > 19 and n mod 100 <> 12:
case n mod 10 = 3 and n > 19 and n mod 100 <> 13:
otherwise:
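For the English ordinal endings specifically, the cases above might be collapsed into a short function like this one. It is only a sketch: ordinalSuffix and ordinal are hypothetical names, the input is assumed to be a non-negative integer, and the rule is English-only. Numbers whose last two digits are 11, 12, or 13 (including 111, 112, and 113) take "th".

```javascript
// Returns the English ordinal ending for a non-negative integer.
function ordinalSuffix(n) {
    var lastTwo = n % 100;
    var last = n % 10;
    // 11th, 12th, 13th (and 111th, 212th, ...) are exceptions to the last-digit rule.
    if (lastTwo >= 11 && lastTwo <= 13) return "th";
    if (last === 1) return "st";
    if (last === 2) return "nd";
    if (last === 3) return "rd";
    return "th";
}

function ordinal(n) {
    return String(n) + ordinalSuffix(n);
}

console.log([1, 2, 3, 4, 5, 12, 22].map(ordinal).join(", "));
```

A translated product would still need a separate rule, or a separate set of strings, for each locale.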
You will need to know the requirements of the locales in which your content will be viewed and make the best compromise between cost and quality of translation.
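To give a feel for what such requirements look like in code, here is a minimal sketch of choosing a plural category for Polish whole numbers, as mentioned earlier. The function name polishPluralCategory is hypothetical, the category names "one", "few", and "many" follow the naming used by the Unicode CLDR plural rules, and fractions and negative numbers are deliberately ignored.

```javascript
// Picks a plural category for a non-negative whole number in Polish.
// A message table would then hold one translated sentence per category.
function polishPluralCategory(n) {
    var last = n % 10;
    var lastTwo = n % 100;
    if (n === 1) {
        return "one";
    }
    // Numbers ending in 2, 3, or 4 take "few", except those ending in 12, 13, or 14.
    if (last >= 2 && last <= 4 && !(lastTwo >= 12 && lastTwo <= 14)) {
        return "few";
    }
    return "many";
}

[0, 1, 2, 5, 12, 22].forEach(function (n) {
    console.log(n + " -> " + polishPluralCategory(n));
});
```

In JavaScript environments that provide it, the built-in Intl.PluralRules object performs this kind of category lookup for many locales, so hand-written rules like these may not be needed.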
Instead of complex coding and multiple strings to translate, simply structure the sentence to avoid plurality.
"Number of people in the room: " + nNumPeople
Number of people in the room: 1
Conclusion
Use a format function rather than concatenating strings to form sentences, or at least structure the sentence so that the variable follows the complete sentence.