
Here's how to fix SugarJS parsing of dates with Unicode space characters in them

Open jikamens opened this issue 10 months ago • 0 comments

Andrew Plummer appears to have gone radio silent on SugarJS for several years now. It's kind of weird since he's active in other repositories and people are clearly still using SugarJS. I hope everything's OK.

In any case, I just had to deal with a SugarJS date-parsing issue that I suspect an increasing number of people are going to run into over time, so even if nothing is ever going to be done by Andrew about this issue, I thought I should post it here to tell others how to fix it.

If you've arrived here because machine-generated dates you're trying to parse with SugarJS have suddenly started failing to parse, even though they look perfectly fine, it may be because (as you may have already figured out) some of the spaces in the dates are actually Unicode narrow no-break space characters (U+202F, a.k.a. 0x202F or \u202F), which SugarJS doesn't understand. This is happening because the Unicode standards for generating human-readable dates have changed, and various JavaScript platforms are switching to the new standards over time. See, e.g., https://github.com/nodejs/node/issues/45938 and https://github.com/nodejs/node/issues/45171.
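
If you're not sure whether this is what you're hitting, a quick check along these lines may help. It's only a sketch: the sample string is hard-coded (what your platform actually generates may differ), `findOddSpaces` is a made-up helper name, and the `asciiOnly` pattern is a simplified stand-in for the kind of regex the unpatched code builds, not SugarJS's real one.

```javascript
// Hard-coded sample: the space before "PM" is U+202F, not ASCII 0x20.
const generated = '9/5/2023, 5:09:00\u202FPM';

// Hypothetical helper: list the code points of every whitespace
// character in the string that is not a plain ASCII space.
function findOddSpaces(str) {
  const found = [];
  for (const ch of str) {
    if (/\s/.test(ch) && ch !== ' ') {
      const hex = ch.codePointAt(0).toString(16).toUpperCase();
      found.push('U+' + hex.padStart(4, '0'));
    }
  }
  return found;
}

console.log(findOddSpaces(generated)); // [ 'U+202F' ]

// Simplified stand-in for a pattern that only allows an optional ASCII space.
const asciiOnly = /^\d+:\d+:\d+ ?(AM|PM)$/;
console.log(asciiOnly.test('5:09:00 PM'));      // true
console.log(asciiOnly.test('5:09:00\u202FPM')); // false
```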

I dug into the innards of the SugarJS date-parsing code, and this is the change you need to make SugarJS understand these dates:

--- a/lib/date.js
+++ b/lib/date.js
@@ -2269,7 +2269,7 @@ function getNewLocale(def) {
       function formatToSrc(str) {
 
         // Make spaces optional
-        str = str.replace(/ /g, ' ?');
+        str = str.replace(/ /g, '\\s*');
 
         str = str.replace(/\{([^,]+?)\}/g, function(match, token) {
           var tokens = token.split('|');

Note that this change actually does two things: makes the code accept all whitespace characters, not just ASCII 32, and makes the code treat multiple adjacent whitespace characters as one. I think this behavior is correct since you never know when people are going to put extra spaces in things, but if you just want the first half of that, then you can do this:

--- a/lib/date.js
+++ b/lib/date.js
@@ -2269,7 +2269,7 @@ function getNewLocale(def) {
       function formatToSrc(str) {
 
         // Make spaces optional
-        str = str.replace(/ /g, ' ?');
+        str = str.replace(/ /g, '\\s?');
 
         str = str.replace(/\{([^,]+?)\}/g, function(match, token) {
           var tokens = token.split('|');

If you want to keep the match-at-most-one-character behavior from before, and you want to be paranoid and match only the specific Unicode character we're talking about, then you can do:

--- a/lib/date.js
+++ b/lib/date.js
@@ -2269,7 +2269,7 @@ function getNewLocale(def) {
       function formatToSrc(str) {
 
         // Make spaces optional
-        str = str.replace(/ /g, ' ?');
+        str = str.replace(/ /g, '[ \u202F]?');
 
         str = str.replace(/\{([^,]+?)\}/g, function(match, token) {
           var tokens = token.split('|');

But I think the first change above is probably reasonable and safe.
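
For comparison, here is a rough sketch of how the original replacement string and the three proposed ones behave on a few inputs. The `buildMatcher` helper and the simplified time pattern are made up for illustration and are not part of SugarJS; only the replacement strings themselves come from the diffs above.

```javascript
// The original replacement string and the three proposed ones.
const variants = {
  original: ' ?',
  anyWhitespaceRun: '\\s*',
  oneWhitespace: '\\s?',
  asciiOrNarrow: '[ \u202F]?',
};

// Hypothetical stand-in for the space-handling step of formatToSrc:
// swap each ASCII space in a simplified pattern for the replacement.
function buildMatcher(replacement) {
  const src = '\\d+:\\d+ (AM|PM)'.replace(/ /g, replacement);
  return new RegExp('^' + src + '$');
}

const samples = {
  asciiSpace: '5:09 PM',
  narrowNoBreak: '5:09\u202FPM',
  noBreak: '5:09\u00A0PM',
  twoSpaces: '5:09  PM',
  noSpace: '5:09PM',
};

// results[variant][sample] is true when that variant accepts that sample.
const results = {};
for (const [name, replacement] of Object.entries(variants)) {
  const matcher = buildMatcher(replacement);
  results[name] = {};
  for (const [label, text] of Object.entries(samples)) {
    results[name][label] = matcher.test(text);
  }
}

console.table(results);
```

In this sketch, only `\\s*` accepts the doubled space, `\\s?` additionally accepts other single whitespace characters such as U+00A0, and `[ \u202F]?` accepts only an ASCII space or U+202F.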

jikamens, Sep 05 '23 17:09