
bcp47_language_tag doesn't fail on some non-BCP47 tags

Open bfabio opened this issue 1 year ago • 5 comments

  • [x] I have looked at the documentation here first?
  • [x] I have looked at the examples provided that may showcase my question here?

Package version eg. v9, v10:

v10

Issue, Question or Enhancement:

When using bcp47_language_tag for validation, some non-BCP47 tags such as "eng" or "en_US" are passing as valid.

isBCP47LanguageTag() uses golang.org/x/text/language's Parse and its documentation says:

[snip] It accepts tags in the BCP 47 format and extensions to this standard defined in https://www.unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers.

Code sample, to showcase or reproduce:

I expect both of these to fail, but they don't:

package main

import (
	"fmt"
	"github.com/go-playground/validator/v10"
)

func main() {
	validate := validator.New()

	err := validate.Var("en_US", "bcp47_language_tag")
	if err != nil {
		fmt.Println(err.Error())
		return
	}

	err = validate.Var("eng", "bcp47_language_tag")
	if err != nil {
		fmt.Println(err.Error())
		return
	}
}

bfabio avatar Feb 08 '24 20:02 bfabio

I think golang.org/x/text/language's Parse is based on the Unicode Language and Locale Identifiers defined in the Unicode Locale Data Markup Language (LDML). These are based on BCP 47, but the two are not strictly the same. For example, Unicode Language and Locale Identifiers allow the underscore _ as a separator:

sep = [-_] ;

BCP 47 does not; its langtag grammar allows only the hyphen:

 langtag       = language
                 ["-" script]
                 ["-" region]
                 *("-" variant)
                 *("-" extension)
                 ["-" privateuse]
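To make the difference concrete, here is a minimal sketch of a syntax-only check against the langtag grammar above, using just the standard library. The function name isWellFormedLangtag is hypothetical and is not part of validator or golang.org/x/text. The subtag lengths are taken from RFC 5646 as I understand them, and privateuse-only and grandfathered tags are deliberately left out. This only tests well-formedness: it rejects "en_US" because of the underscore, but it still accepts "eng", since rejecting that would need a lookup in the IANA subtag registry.

```go
package main

import (
	"fmt"
	"regexp"
)

// langtagRe follows the langtag production: language, then optional script,
// region, variants, extensions and private use, each separated by "-" only.
// Assumption: subtag lengths are as given in RFC 5646.
var langtagRe = regexp.MustCompile(`(?i)^` +
	`(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})` + // language (with extlang)
	`(?:-[a-z]{4})?` + // script
	`(?:-(?:[a-z]{2}|[0-9]{3}))?` + // region
	`(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*` + // variants
	`(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*` + // extensions
	`(?:-x(?:-[a-z0-9]{1,8})+)?` + // private use
	`$`)

// isWellFormedLangtag (hypothetical helper) reports whether tag matches the
// langtag syntax. It does not check subtags against the IANA registry.
func isWellFormedLangtag(tag string) bool {
	return langtagRe.MatchString(tag)
}

func main() {
	for _, tag := range []string{"en-US", "en_US", "eng", "zh-Hant-TW"} {
		fmt.Printf("%q well-formed: %v\n", tag, isWellFormedLangtag(tag))
	}
}
```

A check like this could run before language.Parse to reject the underscore separator, while Parse still handles everything else.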

The LDML specification has a section called BCP 47 Conformance, which reads:

It allows certain syntax for backwards compatibility (not BCP 47-compatible):

  • The "_" character for field separator characters, as well as the "-" used in

shihanng avatar Mar 06 '24 12:03 shihanng