The W3C provides this very long guide on choosing language tags/subtags.
The important bits:
Language tag syntax is defined by the IETF's BCP 47. In the past it was necessary to consult lists of codes in various ISO standards to find the right subtags, but now you only need to look in the IANA Language Subtag Registry. We will describe the new registry below.
This article provides advice on how to choose the components of a language tag. For an overview of the concepts defined in BCP 47, see Language tags in HTML and XML.
...
There are tools available which provide additional help while searching the registry, such as Richard Ishida's Language Subtag Lookup tool.
...
Ensure you have the right language. Sometimes, it pays to check a few alternatives. Mark Davis, co-author of BCP47, writes "Often it is not clear which language identifier to use. For example, what most people call Punjabi in Pakistan actually has the code 'lah', and formal name 'Lahnda'. There are many other cases where the same name is used for different languages, or where the name that people search for is not listed in the IANA registry."
You could look up language information in the SIL Ethnologue and cross-reference that information with Wikipedia. The Ethnologue uses the same three-letter codes as BCP47, but you'll need to convert BCP47 2-letter codes to their ISO 639-3 counterpart to look up a language by code. (Richard Ishida's tool does this for you.)
There are a small number of cases where different language codes are available for what many people would regard as the same language, eg. Filipino and Tagalog, or Twi and Akan. There is no indication in the registry as to which you should use, but you should try to ensure that within a single application or context you are consistent.
(Emphasis mine.)
It should be noted that the IANA Language Subtag Registry is kinda hard to use. With the exception of grandfathered-in tags (like en-GB-oed), you have to look up the primary language subtag and the region/variant subtags separately, because the entries are organized by type rather than by hierarchy. So just save yourself the time and trouble and use Richard Ishida's awesome lookup tool.
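If you're curious what "look up each subtag separately" means in practice, here is a minimal Python sketch. It assumes the registry's plain-text layout of records separated by `%%` lines with `Type`, `Subtag` (or `Tag`) and `Description` fields. The sample text is a hand-copied, abbreviated excerpt rather than the live file, and the helper names (`parse_records`, `build_index`, `describe`) are made up for this illustration. It only handles language and two-letter region subtags; scripts, variants, extensions and multi-line fields are ignored.

```python
# Abbreviated, hand-copied sample in the registry's record format.
# The real registry has many more records and more fields per record.
SAMPLE_REGISTRY = """\
%%
Type: language
Subtag: en
Description: English
%%
Type: language
Subtag: lah
Description: Lahnda
%%
Type: region
Subtag: GB
Description: United Kingdom
%%
Type: region
Subtag: PK
Description: Pakistan
%%
Type: grandfathered
Tag: en-GB-oed
Description: English, Oxford English Dictionary spelling
"""


def parse_records(text):
    """Split the text on '%%' separators and read 'Key: value' lines."""
    records = []
    for chunk in text.split("%%"):
        fields = {}
        for line in chunk.strip().splitlines():
            key, _, value = line.partition(": ")
            fields[key] = value
        if fields:
            records.append(fields)
    return records


def build_index(records):
    """Index descriptions by (type, subtag), mirroring the by-type layout."""
    index = {}
    for rec in records:
        name = rec.get("Subtag") or rec.get("Tag")
        index[(rec["Type"], name.lower())] = rec["Description"]
    return index


def describe(tag, index):
    """Describe a tag: whole-tag match if grandfathered, else per subtag."""
    whole = index.get(("grandfathered", tag.lower()))
    if whole is not None:
        return [whole]
    parts = tag.split("-")
    # Simplification: the first subtag is the language, and any later
    # two-letter subtag is treated as a region.
    result = [index.get(("language", parts[0].lower()), "?")]
    for part in parts[1:]:
        if len(part) == 2:
            result.append(index.get(("region", part.lower()), "?"))
    return result


INDEX = build_index(parse_records(SAMPLE_REGISTRY))
print(describe("en-GB", INDEX))
print(describe("lah-PK", INDEX))
print(describe("en-GB-oed", INDEX))
```

Note how `en-GB` needs two separate lookups (one under `language`, one under `region`), while `en-GB-oed` is matched as a single whole tag. That's the extra legwork the lookup tool saves you.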