
regex for language codes does not accept es-419

Open · DanielVelaJ opened this issue 1 year ago · 4 comments


Describe the bug

When using Chainlit, setting Google Chrome's language to "español (Latinoamérica)" causes the application to fail with a 422 Unprocessable Entity error. The issue arises because Chainlit's language validation pattern does not accept the language code es-419, which corresponds to "español (Latinoamérica)". As a result, the application is unable to load translations and settings, preventing it from functioning properly.

To Reproduce

Steps to reproduce the behavior:

  1. Set Google Chrome Language to "español (Latinoamérica)":

    • Open Google Chrome.
    • Click on the three dots in the upper-right corner and select Configuración (Settings).
    • Scroll down and click on Configuración avanzada (Advanced) to expand advanced settings.
    • Under Idiomas (Languages), click on Idioma (Language).
    • Click Añadir idiomas (Add languages) and select "Español (Latinoamérica)".
    • Click on the three dots next to "Español (Latinoamérica)" and select Mover al principio (Move to the top) to make it the default language.
    • Restart Chrome to apply the changes.
  2. Run a Chainlit Application:

    • Open a terminal and run chainlit hello to start a basic Chainlit application.
  3. Open the Application:

    • In Google Chrome, navigate to http://localhost:8000.
  4. Observe the Error:

    • The application fails to load properly.
    • Open Chrome's developer console (press F12 or right-click and select Inspeccionar (Inspect), then go to the Console tab).
    • Notice multiple 422 Unprocessable Entity errors related to requests to /project/translations and /project/settings with the query parameter language=es-419.

Expected behavior

Chainlit should accept the es-419 language code corresponding to "español (Latinoamérica)" and load the appropriate translations if available. If translations for es-419 are not available, the application should gracefully fall back to a default language (e.g., es for general Spanish or en for English) without causing errors. The application should load normally and be fully functional regardless of the browser's language settings.
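The fallback described above can be sketched in Python as a truncation lookup. This is a simplified version of the "Lookup" scheme in RFC 4647: try the full tag, then drop subtags from the right (es-419 → es), and finally use a default. It is a minimal sketch that assumes translations are identified by codes such as es or en, matching file names like es.json. The `resolve_language` helper and its parameters are hypothetical and are not part of Chainlit's API.

```python
def resolve_language(requested: str, available: set[str], default: str = "en") -> str:
    """Pick the best available translation for a language tag (hypothetical helper).

    Tries the full tag first, then progressively shorter prefixes
    (e.g. "es-419" -> "es"), and finally falls back to `default`.
    """
    # Language tags are case-insensitive, so compare lowercased codes
    # but return the code exactly as it appears in `available`.
    lookup = {code.lower(): code for code in available}
    subtags = requested.split("-")
    # Drop subtags from the right until a supported code is found.
    while subtags:
        candidate = "-".join(subtags).lower()
        if candidate in lookup:
            return lookup[candidate]
        subtags.pop()
    return default


if __name__ == "__main__":
    print(resolve_language("es-419", {"en", "es"}))  # no es-419 file, falls back to es
    print(resolve_language("pt-BR", {"en", "es"}))   # nothing matches, uses the default
```

This only helps once the request passes validation. With the current pattern, es-419 is rejected with a 422 before any translation lookup happens.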

Screenshots

[Screenshot: browser console error]

Desktop:

  • OS: Windows 10
  • Browser: Google Chrome
  • Version: 128.0.6613.138 (Official Build) (64-bit)

Smartphone:

Not applicable.

Additional context

  • Error Details:

    The server returns the following error message:

    {
        "detail": [
            {
                "type": "string_pattern_mismatch",
                "loc": [
                    "query",
                    "language"
                ],
                "msg": "String should match pattern '^[a-zA-Z]{2,3}(-[a-zA-Z]{2,3})?(-[a-zA-Z]{2,8})?(-x-[a-zA-Z0-9]{1,8})?$'",
                "input": "es-419",
                "ctx": {
                    "pattern": "^[a-zA-Z]{2,3}(-[a-zA-Z]{2,3})?(-[a-zA-Z]{2,8})?(-x-[a-zA-Z0-9]{1,8})?$"
                }
            }
        ]
    }
    
  • Cause of the Issue:

    The error occurs because Chainlit's validation regex for the language query parameter does not accept numeric region codes like 419. The regex pattern only allows alphabetic characters in the region and variant parts, so es-419 (which corresponds to "español (Latinoamérica)") is rejected.

  • Impact:

    Users with Google Chrome set to "español (Latinoamérica)" cannot load Chainlit applications properly, affecting accessibility for Spanish-speaking users in Latin America and the Caribbean.

  • Workaround:

    Changing Chrome's language setting to general Spanish (es) or Spanish (Spain) (es-ES) allows the application to load correctly. However, this is not an ideal solution for end-users who prefer "español (Latinoamérica)".

  • Suggested Fix:

    • Modify the Validation Regex:

      Update the regex pattern in Chainlit's code to accept numeric region codes. For example:

      ^[a-zA-Z]{2,3}(-[a-zA-Z0-9]{2,3})?(-[a-zA-Z0-9]{2,8})?(-x-[a-zA-Z0-9]{1,8})?$

      This change allows numeric values in the region and variant parts, accommodating language codes like es-419.

    • Graceful Fallback:

      Implement logic to default to a base language (e.g., es) if a specific regional variant is not supported. If es-419 translations are not available, Chainlit should use es.json or en.json without causing errors.

  • References:

    • Chainlit Documentation:

      The documentation mentions that translation files are named after the language code and that the language is dynamically set based on the browser's language. However, it does not specify limitations regarding numeric region codes.

    • IETF Language Tags:

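      According to the IETF BCP 47 standard (RFC 5646), a region subtag can be either two letters (an ISO 3166-1 country code) or three digits (a UN M.49 area code). 419 is the UN M.49 code for Latin America and the Caribbean, so es-419 is a valid, commonly used tag for regional Spanish.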

  • Additional Notes:

    • Reproducing the Issue: The issue was observed exclusively on Google Chrome with the language set to "español (Latinoamérica)". Other browsers were not tested.

    • Translation Files: Attempting to add an es-419.json translation file did not resolve the issue due to the validation pattern rejecting the es-419 code.
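The failure described under Cause of the Issue, and the regex proposed under Suggested Fix, can be checked directly with Python's re module. This standalone sketch only exercises the two patterns quoted in this report and does not import or modify Chainlit.

```python
import re

# Pattern currently enforced on the `language` query parameter (copied from the 422 response).
CURRENT_PATTERN = re.compile(
    r"^[a-zA-Z]{2,3}(-[a-zA-Z]{2,3})?(-[a-zA-Z]{2,8})?(-x-[a-zA-Z0-9]{1,8})?$"
)
# Pattern proposed under "Suggested Fix": digits allowed in the region and variant parts.
PROPOSED_PATTERN = re.compile(
    r"^[a-zA-Z]{2,3}(-[a-zA-Z0-9]{2,3})?(-[a-zA-Z0-9]{2,8})?(-x-[a-zA-Z0-9]{1,8})?$"
)

TAGS = ["en", "es", "es-ES", "en-US", "es-419"]

if __name__ == "__main__":
    for tag in TAGS:
        # fullmatch avoids `$` also matching before a trailing newline
        current_ok = CURRENT_PATTERN.fullmatch(tag) is not None
        proposed_ok = PROPOSED_PATTERN.fullmatch(tag) is not None
        print(f"{tag:8} current={current_ok} proposed={proposed_ok}")
```

Only es-419 behaves differently: the current pattern rejects it, while the proposed one accepts it. The other tags pass both patterns, so the change does not break existing codes like en-US or es-ES.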


DanielVelaJ · Sep 17 '24 17:09