msbuild icon indicating copy to clipboard operation
msbuild copied to clipboard

Cultures aliased by ICU cannot be used for resource localization on non-Windows environments

Open tarekgh opened this issue 5 years ago • 52 comments

From @CodingDinosaur on October 12, 2018 3:30

When building or running under .NET Core on a Unix-based environment, certain cultures cannot be utilized for resource localization, such as getting localized strings. This impacts both the build process (e.g., identifying and processing resource files) and the lookup of resources at runtime. These cultures can be used as expected when building or running under Windows.

The affected cultures are those which are aliased by ICU -- that is, to save on DB space for certain cases, ICU defines some locales as an "alias" of another. There are 42 locale aliases in ICU 57, of those two of the most common are zh-TW and zh-CN. For a full list of affected locales, see: ICU Aliased Locale List - CultureIssueDemonstration Readme.

This platform-inconsistent behavior when trying to localize certain resources, necessitates using special code and workarounds both at build and deploy time when developing cross-platform applications.

See a demo of this issue in CodingDinosaur/CultureIssueDemonstration

Symptoms

  • Resource files for the affected locales will not be generated into resource assemblies if building in a Unix-based environment.
    • For example: MyStrings.zh-TW.resx will not have a resource assembly created if the build occurs on a Unix-based environment, but will if built under Windows
  • Resources for the affected locales will not be utilized even if the resource files are present, falling back to the default resources.
    • For example: Consider MyStrings.resx and MyStrings.zh-TW.resx. If a custom build step is used to generate a copy the resource assembly for MyStrings.zh-TW.resx to workaround the first issue, requesting a resource utilizing culture zh-TW will still return the resource from the default MyStrings.resx resources.
  • Affected cultures are missing when running under Unix-based environments and calling CultureInfo.GetCultures
  • Some culture data for affected cultures is platform inconsistent, notably the parent locale

Expected behavior

  • Primarily, that whatever the "correct" behaviors for these locales are (as it relates to resources and CultureInfo) they be consistent between Windows and Unix-like environments.
  • Secondarily, that aliased locales would "just work" in .NET Core for resource localization - e.g., that we could have resources defined as zh-TW, just as we can on Windows today, and properly retrieve the expected resources.

Brief Analysis

Most of the above symptoms boil down to uloc_getAvailable in ICU's C API.

For example, the zh-TW resource files do not get copied during build, because during the task SplitResourcesByCulture, the culture is validated against a cache based ultimately on CultureInfo.GetCultures, which in turn, on Unix, ultimately relies on ICU. A diagnostic MSBuild log shows why the file is missing:

Removed Item(s): 
  _MixedResourceWithNoCulture=
    Resources/MyNetCoreProject.MyResources.zh-TW.resx
        OriginalItemSpec=Resources/MyNetCoreProject.MyResources.zh-TW.resx
        TargetPath=Resources/MyNetCoreProject.MyResources.zh-TW.resx
        WithCulture=false

From which we can follow back to the offending path:

Microsoft/msbuild/src/Tasks/Microsoft.Common.CurrentVersion.targets - SplitResourceByCulture -> Microsoft.Build.Tasks.AssignCulture.Execute -> Microsoft.Build.Tasks.Culture.GetItemCultureInfo -> Microsoft.Build.Tasks.CultureInfoCache.IsValidCultureString -> Microsoft.Build.Shared.AssemblyUtilities.GetAllCultures -> CultureInfo.GetCultures -> CultureData.GetCultures -> CultureData(Unix).EnumCultures -> System.Globalization.Native/locale.cpp:GlobalizationNative_GetLocales https://github.com/dotnet/coreclr/blob/8ba838fb54d6c07271d026b2d77bedcb9e2a786a/src/corefx/System.Globalization.Native/locale.cpp#L162-L171

ICU does not return aliases when getting a list of locales -- whether with uloc_getAvailable or Locale::getAvailableLocales (and uloc_countAvailable does not include them in its count).

That ICU does not return the aliases in this manner appears to be intentional, both based on the numerous references to a lack of alias mapping in the uloc documentation, and the following bug:

https://unicode-org.atlassian.net/browse/ICU-4309

uloc_getAvailable returns sr_YU, even though it is an %%ALIAS locale. None of the other %%ALIAS locales are returned.

TracBot made changes - 01/Jul/18 1:59 PM
Resolution Fixed [ 10004 ]
Status Done [ 10002 ] Done [ 10002 ]

ICU-4309 was fixed via: https://github.com/unicode-org/icu/commit/ab68bb319601bc467784dcbdcc6d52131a2863d2 Which seems to further indicate that ICU not returning aliases when calling uloc_getAvailable is intentional.

In-Depth Analysis

A full analysis can be seen in the test repo README: CodingDinosaur/CultureIssueDemonstration

Test Repos

I have two test repos that help demonstrate this issue:

CultureIssueDemonstration

  • https://github.com/CodingDinosaur/CultureIssueDemonstration
  • Demonstrates the symptoms described at both build-time and run-time
  • Running the provided test scripts will allow for running the test code under your current platform, or on Linux via Docker. Thus is is recommended to run under Windows with Docker installed to compare the results under both Windows and Linux.
  • Contains a README file that goes into more detail on the issue and the apparent cause

CultureIcuTest

  • https://github.com/CodingDinosaur/CultureIcuTest
  • Demonstrates the results of using the method utilized in mscorlib to get all locales along with similar methods in ICU directly

Copied from original issue: dotnet/coreclr#20388

tarekgh avatar Oct 31 '18 17:10 tarekgh

Thanks @CodingDinosaur for reporting the issue and listing the details. this is very helpful. we'll take a look.

tarekgh avatar Oct 31 '18 17:10 tarekgh

@cdmihai it is wrong to have msbuild depends on only the list returned from CultureInfo.GetCultures. this was ok before Windows 10 and Linux support but now it is not valid approach.

https://github.com/Microsoft/msbuild/blob/e70a3159d64f9ed6ec3b60253ef863fa883a99b1/src/Shared/AssemblyUtilities.cs#L105

for example, in Windows 10, if you try to create any culture which Windows doesn't have data for, the operation still succeed and the culture can be created as long as the culture name conform to BCP-47 spec. I understand may be msbuild doing that for perf reason which can be kept but will need to add extra case when failing finding any culture in the list, try to call CultureInfo.GetCultureInfo and find out if can create the culture.

We'll try to look how we can enhance the support for aliased culture as this issue suggested but whatever we do here, msbuild will need to do something more. do you want me open a new issue in msbuild to track that?

tarekgh avatar Oct 31 '18 17:10 tarekgh

From @cdmihai on October 15, 2018 18:23

@tarekgh would this be a suitable issue? https://github.com/Microsoft/msbuild/issues/1454

Regarding removing valid locale checks, the biggest issue with this is that it would be a breaking change for MSBuild. The msbuild repo build logic itself has the assumption that non-existing locales are rejected, so a lot of strings are put in Microsoft.shared.resx. If we remove the locale check in the SplitResourceByCulture, then the repo fails building with some invalid locale error (fuzzy memory from ~2 years ago). I have no data on this, but this could also break a lot of other existing repos.

FYI @rainersigwald

tarekgh avatar Oct 31 '18 17:10 tarekgh

Regarding removing valid locale checks, the biggest issue with this is that it would be a breaking change for MSBuild. The msbuild repo build logic itself has the assumption that non-existing locales are rejected, so a lot of strings are put in Microsoft.shared.resx. If we remove the locale check in the SplitResourceByCulture, then the repo fails building with some invalid locale error (fuzzy memory from ~2 years ago). I have no data on this, but this could also break a lot of other existing repos.

I am not sure I understand the breaking scenario here. The scenario you are describing can occur today. for example build some project with some culture introduced in Windows 10 and then run on down-level platform don't have this culture.

tarekgh avatar Oct 31 '18 17:10 tarekgh

From @cdmihai on October 15, 2018 20:40

True, but that's the class of valid locales introduced in different windows versions. The new breaking scenario is for the class of always invalid locales, that were never meant to be locales, and users expect msbuild to not treat them as locales.

tarekgh avatar Oct 31 '18 17:10 tarekgh

The new breaking scenario is for the class of always invalid locales, that were never meant to be locales, and users expect msbuild to not treat them as locales.

I am not sure I agree with that. If the OS/.Net can create a culture for these, then those should be a valid locales to use. Why you think msbuild should reject such cultures?

tarekgh avatar Oct 31 '18 17:10 tarekgh

From @cdmihai on October 15, 2018 23:16

Personally I agree that msbuild should not care and just do what the OS does, but I fear there are actual customers who depend on this behaviour, and changing this might break them. But this is just gut feeling based on the fact that the msbuild repo itself is doing it, and I don't have actual data on it. Alternatively we can only enable it in .net core msbuild, and then customers will have to opt-in to the break by transitioning to .net core. But it's not nice to diverge behaviour.

tarekgh avatar Oct 31 '18 17:10 tarekgh

The only breaking scenario I can think of is when we allow creating resources with a culture not returned by CultureInfo.GetCultures and then move this resources to other machine which cannot understand the used culture. This scenario can happen today anyway. do you have any breaking scenario you can think of?

Microsoft/msbuild#1454 is specific to custom cultures. I would suggest updating it to include the other supported system cultures which is not enumerated by CultureInfo.GetCultures

tarekgh avatar Oct 31 '18 17:10 tarekgh

After looking at this issue, it looks ICU not enumerating the aliased cultures for good reasons. I believe the framework should follow that too and not enumerate such aliased cultures. The framework still can create such aliased culture if anyone want to use them. e.g.:

new CultureInfo("zh-TW"); 

will work fine.

Considering that, I believe the resource issue should be fixed from msbuild side. msbuild should not depend on the enumerated list only not because of aliased cultures only but also for supporting the behavior of Windows 10 which can create any culture as long as the used name is conforming to the BCP-47 specs.

I am going to move this issue to msbuild repo.

@CodingDinosaur thanks again for reporting this issue.

tarekgh avatar Oct 31 '18 17:10 tarekgh

Is there any update to this issue, or at least a workaround?

My understanding is, that it's currently not possible to have an ASP.NET Core application localized with zh- cultures on Linux, which seems like a pretty common use case.

mbp avatar Jul 23 '19 09:07 mbp

Any update?

EdiWang avatar Mar 31 '21 02:03 EdiWang

Team Triage: The design of resource localization is that we infer based on names whether the resource is culture specific. That means we need to be able to take a string and ask if it's a culture. Before the Windows 10 changes to how cultures worked, we just used GetCultures on Windows. The first draft of the Linux support was a hardcoded list of strings. Then we switched over to GetCultures when it's available.

@tarekgh is there a multi-platform check for cultures that doesn't accept dramatically more strings as cultures on windows? Based on @Forgind 's testing it looks like if we just call new CultureInfo() we would incorrectly classify existing user resources as locale specific.

/cc: @wli3 for loc

benvillalobos avatar May 05 '21 15:05 benvillalobos

@BenVillalobos the PR https://github.com/dotnet/msbuild/pull/6003 had the detailed discussion. Please let me know if there is any question not answered there, I can help answering here. My comment https://github.com/dotnet/msbuild/pull/6003#issuecomment-763909780 is answering your posted question, I guess.

tarekgh avatar May 05 '21 16:05 tarekgh

Any updates? Because this issue, I have to build my project on GitHub Action Windows runner, but on Windows runner, I can't build Linux image due to this issue: Can not run Linux docker on Windows runner

So I can't build my asp.net core project to Linux image, this is unacceptable.

garyzhuosim avatar Nov 12 '21 06:11 garyzhuosim

CC @rainersigwald

tarekgh avatar Nov 12 '21 17:11 tarekgh

@BenVillalobos, you've been looking at this, right?

rainersigwald avatar Nov 12 '21 18:11 rainersigwald

Yeah, this issue is blocked on upgrading our projects from netstandard2.0. Though the PR is out of date, I've been using a local branch and making decent progress. Once that's covered, I'll jump on this.

benvillalobos avatar Nov 12 '21 18:11 benvillalobos

Any updates? There have different behaviors between msbuild and dotnet cli on windows, it bothers me a lot.

jiaming0708 avatar Nov 17 '21 05:11 jiaming0708

Any updates? There have different behaviors between msbuild and dotnet cli on windows, it bothers me a lot.

me too.

wyc730 avatar Nov 17 '21 05:11 wyc730

Problem also exists on Windows 10 + .NET 6 SDK with zh-CN & zh-TW

Repo to reproduce: https://github.com/BornToBeRoot/NETworkManager

BornToBeRoot avatar Nov 30 '21 01:11 BornToBeRoot

@BornToBeRoot this is already discussed before in this thread. Please have a look at the comment https://github.com/dotnet/msbuild/issues/3897#issuecomment-434787301

tarekgh avatar Nov 30 '21 01:11 tarekgh

Any progress?

chaoyebugao avatar Jan 20 '22 10:01 chaoyebugao

@chaoyebugao Yes! This is currently blocked by https://github.com/dotnet/msbuild/pull/6148, which is being worked on.

benvillalobos avatar Jan 20 '22 18:01 benvillalobos

ku is alias to ckb but could be ku is not enumerated when getting the whole list of cultures (through CultureInfo.GetCultureInfo). If you try to create culture object for ku this still work. The ICU ticket talking about similar issue here.

msbuild depends on the list returned from CultureInfo.GetCultureInfo. So, if you use ku for tagging a resource, this may fail during the build time. This is tracked by the issue.

tarekgh avatar Feb 09 '22 23:02 tarekgh

Status update: https://github.com/dotnet/msbuild/pull/6148 is awaiting review/merge 🤞

benvillalobos avatar Feb 24 '22 16:02 benvillalobos

What's the current status of this? We have an ASP.NET Core application with zh-CN and zh-TW resource which we need to start deploying on Linux. So far we've identified that CultureInfo.GetCultures(CultureTypes.AllCultures) was not including either of those cultures when running on Linux.

  • As it stands, can we expect zh-CN/zh-TW resources to be compiled and loaded at runtime under Linux if the culture is set to CultureInfo.GetCultureInfo("zh-TW")?
  • Can we expect CultureInfo.GetCultures(CultureTypes.AllCultures) start returning these cultures on Linux at some point or not?
  • Does it matter whether we build on Windows or Linux or only where we run? Should it matter?

madelson avatar Jul 05 '22 14:07 madelson

As it stands, can we expect zh-CN/zh-TW resources to be compiled and loaded at runtime under Linux if the culture is set to CultureInfo.GetCultureInfo("zh-TW")?

This issue is open to fix this issue. When this issue is resolved, you'll be able to do so at that time.

Can we expect CultureInfo.GetCultures(CultureTypes.AllCultures) start returning these cultures on Linux at some point or not?

No, these names are not standard names. The standard names are zh-Hans-CN and zh-Hant-TW. You still can create the old name cultures as CultureInfo.GetCultureInfo("zh-TW").

Does it matter whether we build on Windows or Linux or only where we run? Should it matter?

It shouldn't matter.

In general, I recommend you create the resources with the cultures zh-Hans and zh-Hant instead. This will be allowed to do so from now, (I mean you don't have to wait for this issue to get resolved). And the resources will work on Windows and Linux. In addition, these resources will work just fine even if you set the UI culture to zh-CN or zh-TW

tarekgh avatar Jul 05 '22 16:07 tarekgh

Thanks for the context @tarekgh !

This issue is open to fix this issue. When this issue is resolved, you'll be able to do so at that time.

Great to hear. Is there a milestone or timeline for this? Is the issue suitable for a community contribution to accelerate?

In general, I recommend you create the resources with the cultures zh-Hans and zh-Hant instead

We'll probably be forced to do this as a workaround, but it will be disruptive to other systems we have in place for managing the translation process since we're talking about a huge number of files across hundreds of applications.

madelson avatar Jul 18 '22 20:07 madelson

Is there a milestone or timeline for this? Is the issue suitable for a community contribution to accelerate?

@BenVillalobos could you please advise regarding that?

We'll probably be forced to do this as a workaround, but it will be disruptive to other systems we have in place for managing the translation process since we're talking about a huge number of files across hundreds of applications.

I am not seeing this as a workaround, but it is the right thing to do for localizing Chinese resources. Doing that will make your localization work with other Chinese cultures for free (e.g. zh-HK for instance). I understand this will be disruptive and I can imagine the work needs to be done here, but I recommend it considering the benefits we will get from it.

tarekgh avatar Jul 18 '22 20:07 tarekgh

Is there a milestone or timeline for this? Is the issue suitable for a community contribution to accelerate?

We hear you, There's no current timeline for it, but in the coming days, we plan to spend some time designing a solution we feel comfortable opening up to a community contribution.

benvillalobos avatar Jul 18 '22 23:07 benvillalobos