globalize

Month suffixes in dateFormatter double in Japanese

Open SlexAxton opened this issue 6 years ago • 7 comments

I don't know the exact source, but here's roughly a reduced test case:

// Setup
const Globalize = require('globalize');
Globalize.load( require( "cldr-data" ).entireSupplemental() );
Globalize.load( require( "cldr-data" ).entireMainFor('ja') );
Globalize.locale('ja-JP');

const formatterOptions = {
  skeleton: 'MMMMEEEEc',
};

// Create a Japanese formatter.
const f = Globalize.dateFormatter(formatterOptions);

// Format a Japanese date
const japaneseDate = f.format(new Date());

// Setup en-US data
Globalize.load( require( "cldr-data" ).entireMainFor('en') );
Globalize.locale('en-US');

// Create an English formatter
const f2 = Globalize.dateFormatter(formatterOptions);
const englishDate = f2.format(new Date());

console.log(japaneseDate);
// 8月月8日(火曜日)

console.log(englishDate);
// Tuesday, August 8

The symbol in Japanese for month is 月, so in the Japanese output we're outputting the month symbol twice.

Expected: 8月8日(火曜日)
Actual: 8月月8日(火曜日)

But you'll note that this is not the case for the English date (probably since there's no concept of a month symbol). I assume any locale with a month symbol would be affected by this.

In the case of Japanese, I can temporarily work around this by using MMMMM instead of MMMM (5 Ms instead of 4 Ms). Five Ms actually selects the narrow ('shorthand') form, even though 5 > 4. In Japanese, 'month' is only a single character, so the 'shorthand' is the same as the 'longhand', and with the skeleton MMMMMEEEEc I actually get the expected value out. This would not work for a language whose month postfix differs between longhand and shorthand, though.

It also means I have to change the skeleton based on the current locale, which defeats the purpose of the generic formatter a little bit.
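For reference, the per-locale switch described above could look roughly like the sketch below. The helper `skeletonForLocale` and the override table are hypothetical names made up for illustration, not part of Globalize, and the sketch assumes Japanese is the only locale that needs the five-M form.

```javascript
// Hypothetical helper, not part of Globalize: pick a skeleton per locale.
// Assumption: only Japanese needs the five-M workaround; every other
// locale keeps the original skeleton.
const DEFAULT_SKELETON = 'MMMMEEEEc';
const SKELETON_OVERRIDES = { ja: 'MMMMMEEEEc' };

function skeletonForLocale(locale) {
  // Use only the language subtag, so 'ja-JP' and 'ja' behave the same.
  const language = locale.split('-')[0].toLowerCase();
  return SKELETON_OVERRIDES[language] || DEFAULT_SKELETON;
}
```

The result would then be passed as the `skeleton` option to `Globalize.dateFormatter`, which is exactly the locale-specific branching a generic formatter is supposed to avoid.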

Hopefully that's a helpful starting place.

Best, Alex

SlexAxton avatar Aug 08 '17 22:08 SlexAxton

Hi @SlexAxton thanks for filing this issue.

The source of the problem is that {skeleton: "MMMMEEEEc"} is wrongly resolved into the pattern MMMM月d日(EEEE). However, I believe the root cause isn't globalize code but CLDR data. I explain it in detail in the CLDR ticket I just filed: http://unicode.org/cldr/trac/ticket/10540
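As a rough illustration of how a pattern like MMMM月d日(EEEE) can arise, the sketch below widens each field of a matched pattern to the width requested in the skeleton, then renders it. It is not Globalize's actual code. The `availableFormats` entry, the month and weekday tables, and the helpers `naiveExpand` and `render` are all assumptions made up for the example.

```javascript
// Illustrative data only: shaped like a CLDR availableFormats entry,
// not copied from cldr-data.
const availableFormats = { MMMEd: 'M月d日(E)' };

// Assumption for this sketch: Japanese month names already carry the 月 suffix.
const monthNames = Array.from({ length: 12 }, (_, i) => `${i + 1}月`);
const weekdays = ['日曜日', '月曜日', '火曜日', '水曜日', '木曜日', '金曜日', '土曜日'];

// Naively widen every field in the pattern to the width the skeleton asked for.
function naiveExpand(pattern, requestedSkeleton) {
  const widths = {};
  for (const run of requestedSkeleton.match(/(.)\1*/g)) {
    widths[run[0]] = run.length;
  }
  return pattern.replace(/([A-Za-z])\1*/g, (run, letter) =>
    widths[letter] ? letter.repeat(widths[letter]) : run);
}

// Minimal renderer that only understands the M, d and E fields.
function render(pattern, date) {
  return pattern.replace(/([A-Za-z])\1*/g, (run, letter) => {
    if (letter === 'M') {
      // Three or more Ms use the month name, fewer use the bare number.
      return run.length >= 3 ? monthNames[date.getMonth()] : String(date.getMonth() + 1);
    }
    if (letter === 'd') return String(date.getDate());
    if (letter === 'E') return weekdays[date.getDay()];
    return run;
  });
}

const expanded = naiveExpand(availableFormats.MMMEd, 'MMMMEEEEd');
console.log(expanded);                                // MMMM月d日(EEEE)
console.log(render(expanded, new Date(2017, 7, 8)));  // 8月月8日(火曜日)
```

The literal 月 in the pattern stays where it is while the widened M field now renders a name that already ends in 月, hence the doubled suffix.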

By the way, a quick workaround would be to amend the CLDR data like this:

// Quick workaround for now.
Globalize.load({
  "main": {
    "ja": {
      "dates": {
        "calendars": {
          "gregorian": {
            "dateTimeFormats": {
              "availableFormats": {
                "MMMEEEEd": "MMMd日EEEE"
              }
            }
          }
        }
      }
    }
  }
});

Globalize('ja').dateFormatter({skeleton: "MMMMEEEEd"})(new Date());
// > '8月9日水曜日'

Globalize('en').dateFormatter({skeleton: "MMMMEEEEd"})(new Date());
// > 'Wednesday, August 9'

Note you should use MMMMEEEEd skeleton (note the d instead of c).

rxaviers avatar Aug 09 '17 14:08 rxaviers

Why d instead of c if I may ask?

SlexAxton avatar Aug 09 '17 15:08 SlexAxton

Sure, I assume you want:

  • MMMM: long month name
  • EEEE: long weekday name
  • d: the numeric day of the month

rxaviers avatar Aug 09 '17 18:08 rxaviers

Looking at your CLDR ticket, would it make sense to add this missing pattern - "MMMM": "M月" to dateTimeFormats instead of adding complete skeletons?

cahuja avatar Aug 29 '17 01:08 cahuja

Note that this also applies to "MMM" dates and to Chinese:

const ianaTzData = require('iana-tz-data');
const Globalize = require("globalize");
const cldrData = require('./vendor/cldr-data.json');
Globalize.loadTimeZone(ianaTzData);
Globalize.load(cldrData);

function printDateTime(locale, skeleton) {
  const dateFormatter = Globalize(locale).dateFormatter({ skeleton: skeleton })
  console.log(locale + '\t', skeleton + '\t', dateFormatter(new Date()))
}

printDateTime('ja', 'yyyyMMMMdjmm');
printDateTime('ja', 'yyyyMMMdjmm');

printDateTime('zh', 'yyyyMMMMdjmm');
printDateTime('zh', 'yyyyMMMdjmm');

// ja       yyyyMMMMdjmm    2017年9月月7日 13:33
// ja       yyyyMMMdjmm     2017年9月月7日 13:33
// zh       yyyyMMMMdjmm    2017年九月月7日 下午1:33
// zh       yyyyMMMdjmm     2017年9月月7日 下午1:33

mattyork avatar Sep 07 '17 20:09 mattyork

According to comments on the unicode ticket that was opened, the data is fine, but the rules of the spec have not been followed by the implementation. Comment 3 indicates clarification was added to the spec.

dinofx avatar May 07 '19 14:05 dinofx

So what's the status of this one? Is it a cldr issue?

stukalin avatar Dec 24 '21 06:12 stukalin