Language Identification

or, "Using IETF Language Subtags to identify every language, dialect, and variant in the entire world."

Languages in the unfoldingWord digital publishing system are identified using Internet Engineering Task Force (IETF) language tags, as defined in BCP 47. An IETF tag is a short code that is backward compatible with ISO 639 language codes. It also gives a standardized way to record additional information, such as the script, the region, and language variants.

In the IETF standard, the primary language subtag is the two-letter ISO 639-1 code when one exists. These codes cover widely used languages, some of which are macrolanguages. All other languages use the three-letter ISO 639-3 code (the "Ethnologue code") where such a code exists. A language tag is composed of subtags separated by hyphens. The IETF standard also provides a flexible way to add new language variants: the singleton "x" (as in "-x-") introduces private-use subtags, which are not in the official registry.
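The rule for choosing the primary language subtag can be sketched in Python. This is a minimal illustration that assumes a hypothetical `ISO639_3_TO_1` table holding only a few entries; a real system would load the complete ISO 639-3 code table, which lists the matching ISO 639-1 code where one exists. The function name `primary_language_subtag` is also hypothetical.

```python
# Hypothetical excerpt of an ISO 639-3 -> ISO 639-1 mapping. Only a few
# entries are shown; a real system would load the complete code table.
ISO639_3_TO_1 = {
    "hin": "hi",  # Hindi
    "eng": "en",  # English
    "aze": "az",  # Azerbaijani (Azeri)
}


def primary_language_subtag(iso639_3: str) -> str:
    """Return the preferred primary language subtag for an ISO 639-3 code.

    The two-letter ISO 639-1 code is used when one exists; otherwise the
    three-letter ISO 639-3 code itself is used.
    """
    code = iso639_3.lower()
    return ISO639_3_TO_1.get(code, code)


print(primary_language_subtag("hin"))  # hi
print(primary_language_subtag("aaa"))  # aaa (Ghotuo has no two-letter code)
```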

These are examples of language tags:

  • hi: Hindi language
  • aaa: Ghotuo language
  • en-AU: English language, as written and spoken in Australia
  • az-Latn-IR: Azeri language, written in the Latin script, as used in Iran
  • ttt-x-ismai: Tat language, Ismaili variant (for private use only)
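The structure of these examples (a primary language, then an optional script, region, and variants, and finally a private-use section after "x") can be illustrated with a simplified parser. This is a sketch only: it checks the shape of each subtag as described in BCP 47 (RFC 5646) but does not look subtags up in the official registry, and it rejects extended language subtags and extensions other than private use. The `LanguageTag` and `parse_tag` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def _is_alpha(s: str) -> bool:
    # Restrict to ASCII letters; str.isalpha() alone accepts any Unicode letter.
    return s.isascii() and s.isalpha()


def _is_alnum(s: str) -> bool:
    return s.isascii() and s.isalnum()


@dataclass
class LanguageTag:
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    variants: List[str] = field(default_factory=list)
    private_use: List[str] = field(default_factory=list)


def parse_tag(tag: str) -> LanguageTag:
    """Split a tag such as "az-Latn-IR" into its subtags (simplified)."""
    subtags = tag.split("-")
    primary = subtags[0]
    # Primary language: 2 letters (ISO 639-1) or 3 letters (ISO 639-3).
    if not (_is_alpha(primary) and len(primary) in (2, 3)):
        raise ValueError(f"invalid primary language subtag in {tag!r}")
    result = LanguageTag(language=primary.lower())
    i = 1
    # Script: exactly four letters, conventionally title case ("Latn").
    if i < len(subtags) and len(subtags[i]) == 4 and _is_alpha(subtags[i]):
        result.script = subtags[i].title()
        i += 1
    # Region: two letters ("AU") or three ASCII digits ("419"), upper case.
    if i < len(subtags) and (
        (len(subtags[i]) == 2 and _is_alpha(subtags[i]))
        or (len(subtags[i]) == 3 and subtags[i].isascii() and subtags[i].isdigit())
    ):
        result.region = subtags[i].upper()
        i += 1
    # Variants: 5-8 alphanumerics, or 4 characters starting with a digit.
    while i < len(subtags) and subtags[i].lower() != "x":
        s = subtags[i]
        if not (_is_alnum(s) and (5 <= len(s) <= 8 or (len(s) == 4 and s[0].isdigit()))):
            raise ValueError(f"unsupported subtag {s!r} in {tag!r}")
        result.variants.append(s.lower())
        i += 1
    # Private use: the singleton "x" followed by one or more 1-8 character subtags.
    if i < len(subtags):
        private = subtags[i + 1:]
        if not private or not all(_is_alnum(p) and len(p) <= 8 for p in private):
            raise ValueError(f"invalid private-use section in {tag!r}")
        result.private_use = [p.lower() for p in private]
    return result


for example in ["hi", "aaa", "en-AU", "az-Latn-IR", "ttt-x-ismai"]:
    print(example, parse_tag(example))
```

Normalizing case (lower-case language, title-case script, upper-case region) follows the conventions recommended by BCP 47; the tags themselves are case-insensitive.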

IETF language tags are used in many protocols. In HTTP, the browser can state the user's language preferences to the server in the Accept-Language request header, and the server can state the language and script of the content it serves in the Content-Language response header. In XML, language tags appear in the xml:lang attribute.
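On the server side, a common task is picking the best available language for a request from its Accept-Language header. The sketch below assumes a simplified header grammar (only the `q` weight parameter is read) and uses a "lookup" strategy in the spirit of RFC 4647: each requested tag is shortened one subtag at a time until it matches a language the server offers. The function names, the list of available languages, and the default are hypothetical.

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (tag, q) pairs sorted by descending q; q defaults to 1.0."""
    ranges = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((tag.strip(), q))
    # sorted() is stable, so ranges with equal q keep their header order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def lookup(header: str, available: List[str], default: str) -> str:
    """Pick the best available tag by truncating requested tags from the end."""
    available_lower = {a.lower(): a for a in available}
    for tag, q in parse_accept_language(header):
        if q <= 0 or tag == "*":
            continue  # q=0 means "not acceptable"; "*" is left to the default
        subtags = tag.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in available_lower:
                return available_lower[candidate]
            subtags.pop()
            # Also drop a dangling single-character subtag such as "x".
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


print(lookup("az-Latn-IR, en;q=0.5", ["az", "en", "hi"], "en"))  # az
```

The chosen tag would then typically be sent back to the browser in the Content-Language response header.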

More information:

Software libraries: