tools 📎 Formatter: Number literal separators

Description

Consistently format separators in number literals example

Test Cases

js/literal-numeric-separator/test.js
js/literal/number.js

Sep 30 '22 11:09 MichaReiser

@xunilrj would this be a formatter improvement you're interested in tackling? I know you're good at high performance string manipulation :)

Oct 11 '22 08:10 MichaReiser

This issue is stale because it has been open 14 days with no activity.

Oct 25 '22 12:10 github-actions[bot]

Hi! I wanted to get to know the Rome codebase a bit so I thought I'd look at a formatter issue and then a lint rule, or something like that. I've been trying this one out and made some progress so I think I'll be able to submit a PR for it soon.

From what I understand (from the comments on #2442), we can't use regex like Prettier does (not sure why, binary size maybe? Or because running multiple regex replaces in sequence is too inefficient?), so I'm hand-writing the string manipulation that Prettier does with the regexes, and because you mentioned high performance I'm trying to make sure it

only clones if the number literal actually needs changing,
doesn't do multiple passes through the number literal, and
doesn't do insertions or deletions in the middle of a string or anything like that.

Feels a little bit like I'm optimizing prematurely given that number literals are rarely long, but it's been kind of fun so whatever :sweat_smile:

Let me know if I need to know anything else :)

Nov 01 '22 19:11 jeysal

Feels a little bit like I'm optimizing prematurely given that number literals are rarely long, but it's been kind of fun so whatever 😅

That's awesome that you're enjoying it.

We aren't using RegEx because it is a rather large crate and we have so far been able to implement similar replacement logic without.

You can find a similar "normalisation" in

https://github.com/rome/tools/blob/5e45610dc563a64808fdbf127334226a3141e271/crates/rome_js_formatter/src/utils/string_utils.rs#L336-L361

The main idea is to return a Cow::Borrowed if the original string doesn't change and only allocate a new string if the string differs.

An alternative solution would be to implement an iterator that returns the "cleaned" text and then simply compare the text one by one. Here a rather complicated solution for handling JSX:

use std::borrow::Cow;
use std::iter::Peekable;
use std::str::Chars;


pub fn clean_jsx_text(input: &str) -> Cow<str> {
    let mut cleaned = CleanJsxWhitespace::new(input);
    let mut actual = input.chars();

    let mut pos = 0;

    loop {
        let next_cleaned = cleaned.next();
        let next_actual = actual.next();

        match (next_cleaned, next_actual) {
            (Some(cleaned_char), Some(actual_char)) => {
                if cleaned_char == actual_char {
                    // all good
                    pos += cleaned_char.len_utf8();
                } else {
                    // difference, copy what we have so far, then take everything from cleaned
                    let mut result = String::from(&input[..pos]);

                    result.push(cleaned_char);

                    for cleaned_char in cleaned {
                        result.push(cleaned_char);
                    }

                    return Cow::Owned(result);
                }
            }
            (None, Some(_)) => {
                // anything following in actual now can only be truncated away text
                return Cow::Borrowed(&input[..pos]);
            }
            (Some(_), None) => {
                // This should never happen because cleaned only removes
                unreachable!()
            }
            (None, None) => return Cow::Borrowed(input),
        }
    }
}

#[derive(Debug)]
struct CleanJsxWhitespace<'a> {
    inner: Peekable<Chars<'a>>,
    /// Leading/trailing has the same definition as in the Formatter/Parser
    /// trailing: Any whitespace following a character up to, but not including a new line
    /// leading: Any whitespace before the next token, including all line breaks before the last token.
    /// Flag set to true as soon as the iterator starts processing trailing trivia (
    is_trailing_whitespace: bool,

    /// Tracks if there has been any trailing whitespace
    has_trailing_whitespace: bool,

    /// Tracks if there has been a new line
    has_leading_newline: bool,

    /// Tracks if this is the first line
    first_line: bool,
}

impl<'a> CleanJsxWhitespace<'a> {
    pub fn new(input: &'a str) -> Self {
        Self {
            inner: input.chars().peekable(),
            has_trailing_whitespace: false,
            has_leading_newline: false,
            is_trailing_whitespace: false,
            first_line: true,
        }
    }
}

// * First Line:
//      * Renders single space even if the text is all empty
//      * Render single space before first character.

impl Iterator for CleanJsxWhitespace<'_> {
    type Item = char;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            if let Some(next) = self.inner.peek() {
                match next {
                    ' ' | '\t' => {
                        // Eat over space
                        self.inner.next();

                        if self.first_line {
                            match self.inner.peek() {
                                // `<>    </>` -> `<> </>`
                                None => {
                                    return Some(' ');
                                }
                                // Collapse succeeding whitespace
                                Some(' ' | '\t' | '\n' | '\r') => {}
                                // `<>  a<>` -> `<> a</>`
                                Some(_) => {
                                    return Some(' ');
                                }
                            }
                        }

                        // Track all trailing trivia
                        if self.is_trailing_whitespace {
                            self.has_trailing_whitespace = true;
                        }
                    }

                    '\n' | '\r' => {
                        // Only track the newline if it has been after a non-whitespace character.
                        // (e.g. don't track newlines if the text so far has been empty)
                        if self.is_trailing_whitespace {
                            self.has_leading_newline = true;
                        }

                        self.has_trailing_whitespace = false;
                        self.is_trailing_whitespace = false;
                        self.first_line = false;

                        // Skip new line character
                        self.inner.next();
                    }
                    _ => {
                        // In case there's any whitespace or newline between two characters, print one space
                        // `<>a   a</>` -> `<>a a</>`
                        if self.has_trailing_whitespace || self.has_leading_newline {
                            self.has_trailing_whitespace = false;
                            self.has_leading_newline = false;

                            return Some(' ');
                        }

                        self.is_trailing_whitespace = true;

                        return self.inner.next();
                    }
                }
            } else {
                // `<>a\n b </>` -> `<>a b </>`
                if self.has_trailing_whitespace {
                    self.has_trailing_whitespace = false;
                    return Some(' ');
                }

                return None;
            }
        }
    }
}

#[cfg(test)]
mod tests {
    use crate::clean_jsx_text;

    #[test]
    fn clean_jsx_text_works() {
        assert_eq!("", clean_jsx_text(""));
        assert_eq!(" ", clean_jsx_text(" "));
        assert_eq!("Foo", clean_jsx_text("Foo"));
        assert_eq!(" Foo", clean_jsx_text(" Foo"));
        assert_eq!("Foo", clean_jsx_text("\nFoo"));
        assert_eq!(" Foo", clean_jsx_text("\tFoo"));
        assert_eq!("Foo", clean_jsx_text("\n \t Foo"));
        assert_eq!("Foo", clean_jsx_text("\n \t \n \t\nFoo"));
        assert_eq!(" Foo bar lorem", clean_jsx_text(" Foo bar lorem"));
        assert_eq!("Foo ", clean_jsx_text("Foo "));
        assert_eq!("Foo", clean_jsx_text("Foo\n"));
        assert_eq!("Foo ", clean_jsx_text("Foo\t"));
        assert_eq!("Foo", clean_jsx_text("Foo\t \n "));
        assert_eq!("Foo", clean_jsx_text("Foo\n \t \n \t\n"));
        assert_eq!("Foo Bar", clean_jsx_text("Foo\n \t\t\n \tBar"));
        assert_eq!(
            "Foo Bar",
            clean_jsx_text("\n \t\t\n \tFoo\n \t\t\n \tBar\n \t\t\n \t")
        );
    }
}

and there are probably other ways to do this. Looking forward to see what you come up with!

Nov 02 '22 08:11 MichaReiser

@MichaReiser thanks for the tips! The solution I've started yesterday is roughly the iterator solution but I didn't impl Iterator, just wrote a

pub struct FormatNumberLiteral<'token> {
    token: &'token JsSyntaxToken,
    text: Cow<'token, str>,
}
impl Format<JsFormatContext> for FormatNumberLiteral<'_> {
    fn fmt(&self, f: &mut JsFormatter) -> FormatResult<()> {
}
impl<'token> FormatNumberLiteral<'token> {
    pub fn from_number_literal_token(token: &'token JsSyntaxToken) -> Self {
}

The loop is getting pretty complicated since there are quite a few cases. I'll set aside some more time later today and try to finish it.

Nov 02 '22 11:11 jeysal

tools tools copied to clipboard

📎 Formatter: Number literal separators

Description

Test Cases

tools
tools copied to clipboard