When working with HTML5, correctly declaring the language of your content is crucial for accessibility, search engine optimization, and overall user experience. For Chinese-language content, the choice often comes down to using zh-CN or zh-Hans. This article breaks down the differences, explains the standards, and provides clear recommendations.
Understanding Language Tags in HTML
The lang attribute in HTML specifies the language of the element's content. It helps browsers, search engines, and assistive technologies such as screen readers process text correctly. The value of this attribute is a language tag as defined by BCP 47 (currently RFC 5646, together with RFC 4647 on matching).
A language tag can consist of several subtags:
- Language Subtag: The primary language code (e.g., zh for Chinese).
- Extended Language Subtag (extlang): Specifies dialects (e.g., yue for Cantonese).
- Script Subtag: Defines the writing script (e.g., Hans for Simplified Chinese, Hant for Traditional Chinese).
- Region Subtag: Indicates the country or region (e.g., CN for Mainland China, TW for Taiwan).
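To make the subtag structure concrete, here is a minimal Python sketch that splits a tag into the four subtag types listed above. The pattern and the split_tag name are assumptions made for this illustration. It is deliberately simplified and ignores the variants, extensions, and private-use subtags that full BCP 47 also allows.

```python
import re

# Simplified pattern covering only the four subtag types described above.
# Subtags are told apart by position and length:
#   language = 2-3 letters, extlang = 3 letters,
#   script = 4 letters, region = 2 letters or 3 digits.
TAG_PATTERN = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<extlang>[A-Za-z]{3}))?"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_tag(tag):
    """Return a dict of the subtags present in `tag`, or None if it does not match."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    # Drop the subtag types that are absent from this particular tag.
    return {name: value for name, value in match.groupdict().items() if value}


print(split_tag("zh-Hans"))     # {'language': 'zh', 'script': 'Hans'}
print(split_tag("zh-CN"))       # {'language': 'zh', 'region': 'CN'}
print(split_tag("zh-Hant-TW"))  # {'language': 'zh', 'script': 'Hant', 'region': 'TW'}
```

Because each subtag type has a distinct length, the parser never has to guess whether CN is a script or Hans is a region.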
Decoding the Chinese Language Subtags
Let's examine the specific subtags relevant to marking Chinese content.
Primary Language: zh
The foundation of any Chinese language tag is zh, which is the ISO 639-1 code for the Chinese macrolanguage. A macrolanguage encompasses several related languages that are often considered dialects in a broader cultural context. Using zh is the standard and widely recognized way to declare that your content is in Chinese.
Script Subtags: Hans and Hant
The script subtag is critical for distinguishing between the two primary writing systems used for Chinese.
- Hans: Denotes Simplified Chinese characters, used predominantly in Mainland China, Malaysia, and Singapore.
- Hant: Denotes Traditional Chinese characters, used predominantly in Taiwan, Hong Kong, and Macau.
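By convention, a script subtag is written with the first letter capitalized. Language tags themselves are case-insensitive, so this casing aids readability rather than changing the meaning.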
Region Subtags: CN, TW, HK, etc.
Region subtags specify a geographical variant. Common region codes for Chinese include:
- CN: Mainland China
- TW: Taiwan
- HK: Hong Kong
- MO: Macau
- SG: Singapore
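The casing used throughout this article (lowercase language, title-case script, uppercase region) is a convention, since language tags are case-insensitive. The following Python sketch applies that convention to a tag. The format_tag helper is a hypothetical name for this example, and it assumes the tag contains only language, extlang, script, and region subtags.

```python
def format_tag(tag):
    """Apply conventional casing to a hyphen-separated language tag.

    Assumes the tag has no extension or private-use subtags, where the
    length-based rules below would not apply.
    """
    parts = tag.split("-")
    formatted = [parts[0].lower()]            # primary language: lowercase
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            formatted.append(part.title())    # script: first letter capitalized
        elif len(part) == 2 and part.isalpha():
            formatted.append(part.upper())    # region: uppercase
        else:
            formatted.append(part.lower())    # anything else: lowercase
    return "-".join(formatted)


print(format_tag("ZH-hans"))     # zh-Hans
print(format_tag("zh-cn"))       # zh-CN
print(format_tag("zh-hant-tw"))  # zh-Hant-TW
```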
Extended Language Subtags: Dialects
BCP 47 also includes codes for various Chinese dialects, such as yue (Yue Chinese/Cantonese) and wuu (Wu Chinese/Shanghainese). These can be used as extended language subtags after zh (for example, zh-yue), although RFC 5646 prefers the shorter form in which the same code stands alone as the primary language subtag (yue). Either way, their usage is highly specific and often unnecessary for general web content, as they pertain more to spoken language than written text.
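RFC 5646 treats a tag such as zh-yue as equivalent to the shorter yue, and prefers the shorter form. As a rough sketch, that rewrite could look like the following in Python. The KNOWN_EXTLANGS set contains only the two codes mentioned in this article (the real registry has more), and prefer_primary_form is a hypothetical helper name.

```python
# Only the extlang codes mentioned in this article; the registry lists more.
KNOWN_EXTLANGS = {"yue", "wuu"}


def prefer_primary_form(tag):
    """Rewrite zh-<extlang>[-...] to <extlang>[-...]; return other tags unchanged."""
    parts = tag.split("-")
    if (
        len(parts) >= 2
        and parts[0].lower() == "zh"
        and parts[1].lower() in KNOWN_EXTLANGS
    ):
        # Promote the extlang to primary language and keep any later subtags.
        return "-".join([parts[1].lower()] + parts[2:])
    return tag


print(prefer_primary_form("zh-yue"))       # yue
print(prefer_primary_form("zh-yue-Hant"))  # yue-Hant
print(prefer_primary_form("zh-Hans"))      # zh-Hans
```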
zh-CN vs. zh-Hans: What's the Difference?
This is the core of the confusion. The difference is one of specificity and purpose.
- zh-CN: This tag means "Chinese as used in Mainland China." It implies both a region (CN) and, by strong convention, the use of the Simplified Chinese script (Hans). However, technically, the script is not explicitly defined by this tag.
- zh-Hans: This tag means "Chinese written in the Simplified Chinese script." It is explicitly clear about the writing system without being tied to a specific geographical region. This is the modern, precise way to denote Simplified Chinese content that could be intended for users in Mainland China, Singapore, or a global audience.
The key distinction is that zh-Hans defines the script, while zh-CN defines the region. For most web content, the script is the most important piece of information for rendering and processing text.
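One way to see the distinction is to make the implied script explicit. The Python sketch below maps region-only Chinese tags to the script conventionally associated with each region named in this article. The CONVENTIONAL_SCRIPT table and the make_script_explicit helper are assumptions written for this example, not data from any library, and the rewrite deliberately drops the region.

```python
# Illustrative mapping from region subtag to the script conventionally used
# there, following the regions named in this article. Hand-written for the
# example; not loaded from a standard data source.
CONVENTIONAL_SCRIPT = {
    "CN": "Hans",
    "SG": "Hans",
    "TW": "Hant",
    "HK": "Hant",
    "MO": "Hant",
}


def make_script_explicit(tag):
    """Turn a region-only tag such as zh-CN into a script tag such as zh-Hans.

    Tags that are not of the form zh-<known region> are returned unchanged.
    """
    parts = tag.split("-")
    if len(parts) == 2 and parts[0].lower() == "zh":
        script = CONVENTIONAL_SCRIPT.get(parts[1].upper())
        if script is not None:
            return "zh-" + script
    return tag


print(make_script_explicit("zh-CN"))    # zh-Hans
print(make_script_explicit("zh-TW"))    # zh-Hant
print(make_script_explicit("zh-Hans"))  # zh-Hans
```

Note that zh-CN and zh-SG both collapse to zh-Hans here, which is exactly the sense in which the script tag is not tied to one region.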
Best Practices and Recommendations
Following international standards ensures maximum compatibility and clarity.
1. Prioritize Script Over Region
The World Wide Web Consortium (W3C) recommends keeping language tags as short as possible while conveying the necessary information. For specifying Simplified Chinese, the preferred tag is zh-Hans. It is precise, unambiguous, and not limited by geopolitical boundaries. It tells browsers and assistive tools exactly how the text is written.
2. Keep It Simple
Avoid overcomplicating the tag. For the vast majority of websites, zh or zh-Hans is perfectly sufficient. The extended language subtags for dialects are rarely needed unless your content is specifically aimed at a spoken dialect, which is uncommon for written web pages.
3. Ensure Broad Compatibility
While zh-Hans is the recommended tag, zh-CN remains widely supported due to its long history of use. Both are valid BCP 47 tags; they simply describe different things. Using zh-Hans is the forward-looking and semantically precise choice, and you can confidently use it without worrying about compatibility issues with modern browsers and systems.
Frequently Asked Questions
Q: Can I use just zh without any subtags?
A: Yes, using zh is valid and means "Chinese language." However, it doesn't specify the script. If your page contains a mix of Simplified and Traditional characters, zh is appropriate. Otherwise, specifying the script (zh-Hans or zh-Hant) is more precise and better for accessibility.
Q: When should I use a region subtag like CN or TW?
A: Use a region subtag only when the content is specific to that region beyond just the written script. This could include localized idioms, formatting for dates/currency, or other cultural references. For most general content, the script subtag is sufficient.
Q: What is the correct tag for Traditional Chinese?
A: The recommended tag for Traditional Chinese is zh-Hant. You can also use region-specific tags like zh-TW (Taiwan) or zh-HK (Hong Kong) if the content is specifically tailored for those regions.
Q: My website serves both Simplified and Traditional Chinese. How should I handle the lang attribute?
A: The lang attribute should be set on each element that contains a different language. You can set the overall page language to zh and then use the lang attribute on specific sections to override with zh-Hans or zh-Hant as needed.
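As an illustration of how such overrides resolve, the Python sketch below parses a small sample page and reports the effective language of each piece of text, following the rule that an element inherits lang from its parent unless it declares its own. The sample markup and the LangResolver class are hypothetical, written for this example. The resolver assumes well-formed markup without void elements such as br or img.

```python
from html.parser import HTMLParser

# Hypothetical page: zh on the root element, overridden per section.
SAMPLE_PAGE = """
<html lang="zh">
  <body>
    <h1>中文</h1>
    <section lang="zh-Hans"><p>简体中文</p></section>
    <section lang="zh-Hant"><p>繁體中文</p></section>
  </body>
</html>
"""


class LangResolver(HTMLParser):
    """Record the effective language of each non-blank text node."""

    def __init__(self):
        super().__init__()
        self.lang_stack = [""]  # "" means no language declared yet
        self.results = []       # (text, effective language) pairs

    def handle_starttag(self, tag, attrs):
        # An element inherits its parent's language unless it sets lang itself.
        declared = dict(attrs).get("lang")
        self.lang_stack.append(declared or self.lang_stack[-1])

    def handle_endtag(self, tag):
        # Leaving an element restores the parent's language.
        if len(self.lang_stack) > 1:
            self.lang_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.results.append((text, self.lang_stack[-1]))


resolver = LangResolver()
resolver.feed(SAMPLE_PAGE)
resolver.close()
for text, lang in resolver.results:
    print(lang, text)
```

For the sample page, the heading resolves to zh while the two sections resolve to zh-Hans and zh-Hant respectively.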
Q: Are there any SEO implications for choosing one tag over another?
A: Declaring the language correctly can help search engines and other tools identify which audience your content is meant for. A precise tag such as zh-Hans makes it unambiguous that the page contains Simplified Chinese.
Q: What about other Chinese dialects like Cantonese?
A: For written content, Cantonese is almost always written using standard Chinese characters (either Simplified or Traditional). The dialect subtag (e.g., yue) is primarily relevant for audio content or linguistic specificity and is not commonly needed for general websites.