
Add ASCII output support

Open rzhw opened this issue 6 years ago • 53 comments

I've been working on a product using libsass/sassc and have migrated it to Dart Sass. Presently, this product doesn't support UTF-8 characters in stylesheets.

It looks like Dart Sass only supports outputting as UTF-8, with dart-lang/sdk#11744 being the blocker given in the README for why there's no support for more encodings (UTF-16, etc). Dart does however appear to have an AsciiEncoder.

For the time being, we've added an extra step to CSS-escape non-ASCII characters in generated stylesheets. (On a related note, we also remove the @charset 'UTF-8'; at-rule, both because it would be technically incorrect for an ASCII-encoded stylesheet, and because of #567.)

This isn't trivial because of sourcemaps, so we're doing this step with a PostCSS plugin.
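As a rough illustration of the escaping step described above (not the PostCSS plugin the poster used, which isn't shown), here is a minimal Python sketch. The function name `to_ascii_css` is hypothetical, and the sketch assumes the @charset rule, if present, is the very first thing in the file. It does not update source maps, which is the part the poster calls non-trivial.

```python
import re

# Matches a leading @charset rule. Assumption: it only ever appears at the
# very start of the generated stylesheet.
_CHARSET_RE = re.compile(r"""\A@charset\s+["'][^"']*["'];\s*""")


def to_ascii_css(css: str) -> str:
    """Return `css` with the @charset rule dropped and every non-ASCII
    character replaced by a CSS hex escape."""
    css = _CHARSET_RE.sub("", css)
    out = []
    for ch in css:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            # The trailing space terminates the escape so that a following
            # hex digit is not absorbed into it; CSS consumes that space.
            out.append("\\{:x} ".format(ord(ch)))
    return "".join(out)


if __name__ == "__main__":
    source = '@charset "UTF-8";\ndiv::before{content:"\u200b"}'
    print(to_ascii_css(source))
```

Because every escape changes the column offsets of the text after it, any source map generated for the original output would need to be adjusted separately.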

Would adding ASCII-encoded output support be in scope of Dart Sass? I'd imagine when Dart adds other encoders, having this ready would let other encodings be sibling output options alongside UTF-8 and ASCII.

rzhw avatar Jan 11 '19 06:01 rzhw

This is something I could see adding as a command-line flag (--ascii-only or something like that) to serialize Unicode characters as ASCII escapes.

nex3 avatar Jan 17 '19 23:01 nex3

@nex3 will this command-line flag be ready when Ruby sass is deprecated?

bit-wise avatar Feb 18 '19 20:02 bit-wise

Ruby Sass has been deprecated for almost a year now. And no, there's no plan to tie this issue to its release cycle. It's marked as "help wanted", which means it's not a priority for the Sass team, but if an external user wanted to contribute a fix we'd help them land it.

nex3 avatar Feb 18 '19 22:02 nex3

@Awjin When will this problem be solved?

jpuncle avatar Nov 18 '19 11:11 jpuncle

// @source - [@Stephn-R](https://github.com/sass/sass/issues/1395#issuecomment-57483844)
// @description converts 1 or more characters into a unicode
// @markup {scss}
// unicode("e655"); // "\e655"
@function unicode($str){
    @return unquote("\"") + unquote(str-insert($str, "\\", 1)) + unquote("\"");
}

BuptStEve avatar Dec 05 '19 11:12 BuptStEve

So, I'm trying to get a zero-width unicode character to work with SASS. It won't appear without a hex editor because of SASS reinterpreting that. On normal CSS, it's:

div::before {
  content: "\200B";
}

But SASS will rewrite it as:

div::before {
  content: "​";
}

It's a little frustrating trying to debug with an invisible character since SASS wants to rewrite it. A flag would unfortunately be global to everything when it's somewhat of an edge case where you want raw/literal characters to be outputted as a string. I just have these few instances where it's better to not convert to Unicode. I can imagine there's a lot of other characters, both printable and non-printable, that would greatly benefit from not being rewritten as Unicode, such as:

  • https://en.wikipedia.org/wiki/Non-breaking_space
  • https://en.wikipedia.org/wiki/Zero-width_space
  • https://en.wikipedia.org/wiki/Word_joiner
  • https://en.wikipedia.org/wiki/Zero-width_joiner

clshortfuse avatar Jul 28 '20 17:07 clshortfuse

Finding this thread today because of an issue similar to @clshortfuse above.

I learn that Dart Sass is converting my authored:

content: '\00A0/\00A0';

to:

content: ' / ';

…where the spaces written do seem to be non-breaking spaces, but they are prone to rendering in the browser as:

A

No such problem with Libsass, but then Libsass fails at some other stuff.

watershed avatar Jan 29 '21 12:01 watershed

Here’s another example that erratically fails due to the following authored Sass:

content: '\0231F';
transform: rotate(45deg);

…ending up as:

content:"⌟";transform:rotate(45deg);

[Image: what-you-can-do-generated-content-fail]

See also this Twitter thread.

watershed avatar Jan 29 '21 13:01 watershed

@watershed As mentioned earlier in this thread, Sass emits a @charset declaration or a BOM whenever it emits non-ASCII output, which will force browsers to interpret the stylesheet as UTF-8 even if it's served with non-UTF-8 headers. If that's not working, chances are you're doing some sort of post-processing that's incorrectly stripping that extra information.
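To check whether a post-processing step has stripped or displaced that information, one could inspect the first bytes of the served file. The sketch below is an assumption-laden illustration: the function name `declared_encoding` is hypothetical, it only looks at the file itself (HTTP headers are not modeled), and it follows the CSS Syntax rule that a @charset declaration counts only when the file begins with the exact bytes `@charset "`.

```python
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"
CHARSET_PREFIX = b'@charset "'


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding a stylesheet declares in its own first bytes,
    or None if it carries neither a UTF-8 BOM nor a leading @charset."""
    if data.startswith(UTF8_BOM):
        return "utf-8"
    if data.startswith(CHARSET_PREFIX):
        # The declaration has to close with `";` within the first 1024 bytes.
        end = data.find(b'";', len(CHARSET_PREFIX), 1024)
        if end != -1:
            name = data[len(CHARSET_PREFIX):end]
            return name.decode("ascii", "replace").lower()
    return None


if __name__ == "__main__":
    ok = '@charset "UTF-8";\na{content:"\u231f"}'.encode("utf-8")
    moved = b'@import url("x.css");\n@charset "UTF-8";\na{}'
    print(declared_encoding(ok))     # declaration is at the very start
    print(declared_encoding(moved))  # something precedes the declaration
```

Anything that places other bytes ahead of the declaration, even a comment or another at-rule, makes this check return `None`.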

nex3 avatar Jan 29 '21 23:01 nex3

@nex3 -- I have not touched any settings of Angular CLI that (to my knowledge) would affect the inclusion or omission of @charset. In fact, I haven't configured any of the compilation process (it's using defaults). However, I'm encountering this issue.

cbush06 avatar Mar 01 '21 02:03 cbush06

@cbush06 Are you seeing a case where your CSS is being served with @charset (or a UTF-8 BOM) and is still being interpreted as the wrong character set? Otherwise, I'm not sure how Sass can address your issue.

nex3 avatar Mar 04 '21 00:03 nex3

@nex3 -- After reading your posts earlier, I went and checked. What I discovered is that the CSS generated for each Angular component does have the @charset. Other SCSS files (e.g. from my assets folder) do not have it included.

cbush06 avatar Mar 04 '21 00:03 cbush06

Disclaimer: I would have posted this in https://github.com/sass/dart-sass/issues/1219, because I don't feel that these two issues are the same even though they are marked as duplicates, but that issue was closed. While I agree that converting from Unicode to escapes is necessary to output ASCII files and that an explicit flag is needed in this scenario, I also think that the opposite is not always true.

Hi, I'm currently investigating moving a project from node-sass to dart-sass, and the automatic conversion of Unicode escapes leaves me a little confused: I don't see the reason to track and convert them instead of treating them as raw strings if they are semantically the same to the interpreter. Is there some automatic conversion that is hard to explicitly disable or would require a lot of effort? This troubles me with FontAwesome and other "content" properties, which are more clearly identifiable as icons when shown as escapes than as Japanese or non-printable characters.

To clarify: I would expect this kind of behaviour as a means of minification, applied only when "compressed" mode is enabled (without needing an extra flag) and not while producing "expanded" output.

In any case this is only my opinion. Thanks.

IlBaffo avatar Jun 15 '21 17:06 IlBaffo

Sass's internal representation of a string, like just about every programming language's, is just a sequence of characters (in Sass's case, a "character" means a "Unicode code point"). Whether those characters were written as escapes or not, they're all converted into the character in question internally: if you write "\24" that's exactly the same as writing "$". Both of these return a string whose contents are a single character, U+0024 DOLLAR SIGN. This is the same process that happens in JavaScript when you write "\x24" or "$".

This means that when we go to serialize a string to CSS, all we know are the Unicode code points that are the contents of the string. We need to determine how to serialize those without any information about where they came from, and so we serialize them as Unicode rather than escapes so that people writing non-English languages have legible CSS files.
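The point about escapes and literal characters collapsing to the same internal value can be sketched in Python. This is an illustration of the idea only, not Dart Sass's parser; `parse_string_contents` is a hypothetical name, and the sketch ignores backslash-newline line continuations.

```python
import re

# A CSS escape is a backslash followed by 1-6 hex digits and an optional
# single whitespace terminator, or a backslash followed by one other character.
_ESCAPE_RE = re.compile(r"\\(?:([0-9a-fA-F]{1,6})[ \t\n]?|(.))")


def parse_string_contents(raw: str) -> str:
    """Resolve CSS escapes in the text between a string's quotes."""

    def resolve(match: "re.Match[str]") -> str:
        if match.group(1) is not None:
            cp = int(match.group(1), 16)
            # Zero, surrogates, and out-of-range values become U+FFFD.
            if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
                return "\ufffd"
            return chr(cp)
        return match.group(2)

    return _ESCAPE_RE.sub(resolve, raw)


if __name__ == "__main__":
    # Both spellings yield the same one-character string.
    print(parse_string_contents(r"\24") == parse_string_contents("$"))
```

Once both spellings have been resolved to the same code points, nothing in the resulting value records which one the author typed, which is why the serializer has to pick one output form for all strings.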

nex3 avatar Jun 15 '21 20:06 nex3

I'm just curious why that conversion happens at all during the parsing phase and not during the serialization phase keeping it as-is; could the unescaped string be a keyword or another expression like \x24myvariable or \x33 * \x33 ? Wouldn't source maps have a different offset in column number from the physical file?

Again, I'm asking just out of curiosity because I feel that this is a deliberate choice (like keeping down complexity?) rather than a language limitation, I don't mean to criticize.

Again, thanks for the work.

IlBaffo avatar Jun 15 '21 22:06 IlBaffo

> I'm just curious why that conversion happens at all during the parsing phase and not during the serialization phase keeping it as-is; could the unescaped string be a keyword or another expression like \x24myvariable or \x33 * \x33 ?

All the string functions in Sass need to operate on the strings' actual text; if we lazily parsed escape codes, it would make all the functions much less efficient and much more complex. Imagine trying to implement str.slice() when you have to adjust all the indexes to account for escape sequences that might exist. It gets even worse when you start thinking about how strings interact with custom functions; we'd basically have to eagerly resolve escapes as soon as a host language is dealing with a string, which means that no custom functions would ever preserve escapes.

> Wouldn't source maps have a different offset in column number from the physical file?

No, source maps are totally orthogonal. They're tracked on a statement-by-statement basis, not value-by-value.

nex3 avatar Jun 16 '21 00:06 nex3

Ok, I understand that the problem arises when the string gets manipulated, and there is no telling during the parsing phase if or when that will happen. Anyway, how did libsass achieve that? I don't know enough C to navigate their codebase, but there must be some specific pattern implemented there; even Chrome shows both representations in the CSS inspector.

As a wild speculation (and just for fun), could CPU usage in this case be traded for memory by storing both representations in the StringExpression as different fields (only when there is an escaped char)? That would allow an eventual "length" or "indexOf" to read from the unescaped string with zero performance loss, with the common string methods proxied to reflect the changes in the (not always present) raw string.

IlBaffo avatar Jun 16 '21 07:06 IlBaffo

> Ok, I understand that the problem arises when the string gets manipulated, and there is no telling during the parsing phase if or when that will happen. Anyway, how did libsass achieve that? I don't know enough C to navigate their codebase, but there must be some specific pattern implemented there; even Chrome shows both representations in the CSS inspector.

They didn't. In older versions of LibSass, string functions were simply broken—they returned the wrong results for strings with escape sequences. Newer versions work the same as Dart Sass (with the exception that they'll avoid parsing certain property values entirely, causing some escape sequences to be retained if they're written directly in a property value even though they'd be resolved if they were stored in a variable).

> As a wild speculation (and just for fun), could CPU usage in this case be traded for memory by storing both representations in the StringExpression as different fields (only when there is an escaped char)? That would allow an eventual "length" or "indexOf" to read from the unescaped string with zero performance loss, with the common string methods proxied to reflect the changes in the (not always present) raw string.

This would only improve the performance of a few functions—functions like str.slice() and operations like string concatenation would still be very expensive and complicated and also more memory intensive.

nex3 avatar Jun 16 '21 23:06 nex3

As this is a breaking change from node-sass, shouldn't this difference at least be represented in the documentation?

Novynn avatar Jun 21 '21 04:06 Novynn

Node Sass's current behavior is the same as Dart Sass's (except again in edge cases involving unparsed properties). Even in those cases, it's not a breaking change: a literal non-ASCII character has exactly the same behavior as that character's escape code.

nex3 avatar Jun 22 '21 20:06 nex3

@nex3 Apologies for bringing this up again. I've read many threads regarding this issue, but I still have a problem and I don't know if this is due to sass or something else.

Despite sass inserting a @charset declaration at the top (or almost top -- see below) I still have issues with icon fonts, i.e. Fontawesome, intermittently being rendered with the wrong font, usually Times New Roman.

Like I say, this happens intermittently, but it never happened when the generated CSS retained non-ASCII character escape codes.

As for the @charset declaration being almost at the top of the output file -- this happens when I @import Google fonts: the @charset is placed after the @import, which is hoisted to the top of the CSS. I have no idea if this has anything to do with the issue.

Any advice or suggestions you can provide would be greatly appreciated.

beard7 avatar Aug 18 '21 13:08 beard7

@beard7 it sounds like the root issue there is whatever software is hoisting your @imports above your @charsets. That's not a safe transformation to make, and can be expected to break the browser's ability to correctly determine the encoding of your document.

It's worth noting that in Dart Sass 1.38.0, we released a change where characters from Unicode Private Use Areas are now emitted as escapes in expanded mode. That should also mitigate the pain here without breaking the ability to have legible non-English.
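The behavior described for 1.38.0 can be approximated with the following Python sketch. It is not Dart Sass's serializer; `serialize_string` and `is_private_use` are hypothetical names, and the sketch handles only quotes, backslashes, and private-use code points (newlines and control characters are left out for brevity).

```python
# The three Unicode Private Use ranges.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area (Basic Multilingual Plane)
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B
)
_HEX_OR_SPACE = "0123456789abcdefABCDEF \t\n"


def is_private_use(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)


def serialize_string(text: str) -> str:
    """Serialize `text` as a double-quoted CSS string, writing private-use
    code points as hex escapes and leaving other non-ASCII text legible."""
    out = ['"']
    for i, ch in enumerate(text):
        cp = ord(ch)
        if is_private_use(cp):
            out.append("\\{:x}".format(cp))
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # A terminating space is needed only when the next character
            # would otherwise be read as part of the escape.
            if nxt and nxt in _HEX_OR_SPACE:
                out.append(" ")
        elif ch in '"\\':
            out.append("\\" + ch)
        else:
            out.append(ch)
    out.append('"')
    return "".join(out)


if __name__ == "__main__":
    print(serialize_string("\uf101"))  # an icon-font code point
    print(serialize_string("日本語"))   # ordinary non-ASCII text stays readable
```

Icon fonts such as Font Awesome map their glyphs to private-use code points, so this rule keeps those as escapes while ordinary text in any language stays readable.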

nex3 avatar Aug 18 '21 19:08 nex3

FWIW - there is a large issue thread from font-awesome which indicates that while having @charset "utf-8"; and <meta charset="UTF-8"> improve the situation, they do not correct it 100% of the time. It still happens intermittently.

  • https://github.com/FortAwesome/Font-Awesome/issues/17644
  • https://github.com/FortAwesome/Font-Awesome/issues/18775

The only solution found has been something very similar to what @BuptStEve mentioned earlier in this issue (I think simply because it causes dart-sass to ignore the strings).

https://github.com/FortAwesome/Font-Awesome/issues/18775#issuecomment-1073217558

@function fa-content($fa-var) {
  @return unquote("\"") + unquote(str-insert($fa-var, "\\", 1)) + unquote("\"");
}

We just recently switched our scss code over to dart-sass and started experiencing the intermittent issues with font display right after deploying. Switching to this approach "fixes" it for us as well.

jpcamara avatar Jun 14 '22 12:06 jpcamara

@jpcamara Can you provide a reproduction case where a browser doesn't respect @charset "utf-8"? I'll need the specific browser version and a stylesheet with a @charset declaration or BOM that includes two non-ASCII characters, one written in raw UTF-8 and one written in an escape sequence, so I can verify that they render differently.

nex3 avatar Jun 21 '22 20:06 nex3

I am still facing this issue even after modifying the fa-content.

image

When I inspected the source of the bundled css file, this is how it appears ^^.

This happens only in the webpack production-mode bundle.

Could you help, please, @jpcamara?

Versions: "sass": "1.54.3", "sass-loader": "13.0.2",

Font-awesome 5x

cc @logeshpaul

jerryephicacy avatar Aug 08 '22 09:08 jerryephicacy

@jerryephicacy Is that rendering incorrectly in a browser, or is it just in your text editor? Because it's entirely possible that your text editor is just not loading the file as UTF-8.

nex3 avatar Aug 08 '22 20:08 nex3

@nex3 ... randomly it fails in the browser.

But I have made it work now.

Just have to modify the fa-content function and remove the slash ( \ ) symbol from the icon variables.

jerryephicacy avatar Aug 09 '22 19:08 jerryephicacy

To reiterate the above, if you can provide a reproduction where this fails in the browser even with a @charset or UTF-8 BOM, we will reconsider our default output.

nex3 avatar Aug 09 '22 22:08 nex3

I think the assumption that all CSS output by dart-sass will be loaded directly by a browser is not a given. For example, I was running into this problem because my compiled CSS files are loaded by GWT, which doesn't know about the @charset annotation. Thus, you get this while compiling and it swallows the css block.

[WARN] Line 13 column 12: encountered """. Was expecting one of: "}" "+" "-" "," ";" "/" <STRING> <IDENT> <NUMBER> <URL> <PERCENTAGE> <PT> <MM> <CM> <PC> <IN> <PX> <EMS> <EXS> <DEG> <RAD> <GRAD> <MS> <SECOND> <HZ> <KHZ> <DIMEN> <HASH> <IMPORTANT_SYM> <UNICODERANGE> <FUNCTION>

I think the newer versions of GWT that use GSS might not have this problem, but I can't move to that easily. Regardless of my specific situation, my point is that Sass output is used in many kinds of toolchains that aren't the browser.

nmoresco avatar Aug 10 '22 23:08 nmoresco

Sass targets the CSS specification. We'll make exceptions for browser behavior that's contrary to the spec only because browsers are the overwhelming majority of CSS consumers. Any other tool should follow the specification when consuming CSS, and if it doesn't it's pretty clearly a bug in that tool and not in Sass.

nex3 avatar Aug 11 '22 01:08 nex3