Function to format smithy files

Hey folks,

My team's putting some work into building a polished editor experience:

https://github.com/disneystreaming/smithy-language-server
https://github.com/disneystreaming/vscode-smithy

We were wondering whether there's any existing utility function we could piggyback on to format smithy files. We're aware of the serialiser that outputs Smithy syntax, but it doesn't quite cut it: comments get lost, and it requires a semantically valid model. Ideally, we'd like the formatter to operate at a syntactic level.

Any ideas?

Jan 12 '22 17:01 Baccata

We don't currently have anything in smithy-model to do standalone formatting. You're right that the serialization in smithy-model loses comments and needs a semantically valid model. Syntactic comments have no formal relationship to shapes, and would only show up in things like parse trees, which we don't use.

Where would formatting happen? In VS Code or in the LSP? Or maybe in something like Prettier? I'm wondering where this would be best implemented.

Jan 13 '22 21:01 mtdowling

The LSP has an endpoint to apply formatting. Additionally, if we had access to a tokenizer, we could have much more accurate syntactic colouring, provided at the LSP level.

Jan 14 '22 07:01 Baccata
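For context on the LSP formatting endpoint mentioned above: the Language Server Protocol answers a `textDocument/formatting` request with a list of `TextEdit`s. A minimal sketch, assuming the formatter is just a `String -> String` function, is to reply with one edit that replaces the whole document. `Position`, `Range` and `TextEdit` below are hand-written stand-ins for the LSP structures (not any LSP library's classes), and `WholeDocumentFormatting` is a hypothetical name.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hand-written stand-ins for the LSP Position/Range/TextEdit structures.
// LSP positions are zero-based; by default "character" counts UTF-16 code
// units, which is also what Java's String.length() counts.
record Position(int line, int character) {}
record Range(Position start, Position end) {}
record TextEdit(Range range, String newText) {}

final class WholeDocumentFormatting {
    // Result for a textDocument/formatting request: either no edits (already
    // formatted) or a single edit replacing the whole document.
    static List<TextEdit> format(String document, UnaryOperator<String> formatter) {
        String formatted = formatter.apply(document);
        if (formatted.equals(document)) {
            return List.of();
        }
        Range whole = new Range(new Position(0, 0), endOf(document));
        return List.of(new TextEdit(whole, formatted));
    }

    // Position just after the last character, treating \n, \r\n and \r as line breaks.
    static Position endOf(String text) {
        int line = 0;
        int lineStart = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                i++; // the \n of a \r\n pair belongs to the same line break
            }
            if (c == '\r' || c == '\n') {
                line++;
                lineStart = i + 1;
            }
        }
        return new Position(line, text.length() - lineStart);
    }
}
```

Replacing the whole document keeps the server simple; a server that wants smaller diffs (e.g. to preserve cursor positions better) could compute minimal edits instead, at the cost of more code.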

Implementing formatting in the LSP sounds good.

We don't currently have any kind of lexing phase in Smithy, since we just pull one character at a time from the IDL text to parse it. It wouldn't be difficult to add a lexing phase to the parser, and that would likely make it easier to implement basic error recovery (something I think we need for IDEs). What kinds of tokens would you need for it to be useful? I'd imagine the tokenizer emitting tokens like these: SPACE, BR, COMMA, COMMENT, DOC_COMMENT, AT, STRING, TEXT_BLOCK, COLON, IDENTIFIER, DOT, POUND, DOLLAR, NUMBER, LBRACE, RBRACE, LBRACKET, RBRACKET, LPAREN, RPAREN, EQUAL, EOF, ERROR

Jan 19 '22 22:01 mtdowling
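To make the proposed token list concrete, here is a minimal hand-written lexer sketch. It is not Smithy's implementation: `Kind`, `Token` and `SketchLexer` are hypothetical names, TEXT_BLOCK is left out, and numbers are simplified (optional leading minus, digits, optional fraction, no exponents). Each token keeps its exact source text, so whitespace and comments are still available to a formatter.

```java
import java.util.ArrayList;
import java.util.List;

// Token kinds from the list above; TEXT_BLOCK is omitted to keep the sketch short.
enum Kind {
    SPACE, BR, COMMA, COMMENT, DOC_COMMENT, AT, STRING, COLON, IDENTIFIER, DOT, POUND,
    DOLLAR, NUMBER, LBRACE, RBRACE, LBRACKET, RBRACKET, LPAREN, RPAREN, EQUAL, EOF, ERROR
}

// Each token keeps its exact source text, so nothing (spaces, comments) is lost.
record Token(Kind kind, String text) {}

final class SketchLexer {
    static List<Token> lex(String src) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            int start = i;
            Kind kind;
            if (c == ' ' || c == '\t') {
                while (i < src.length() && (src.charAt(i) == ' ' || src.charAt(i) == '\t')) {
                    i++;
                }
                kind = Kind.SPACE;
            } else if (c == '\n' || c == '\r') {
                // "\r\n" is a single line break.
                boolean crlf = c == '\r' && i + 1 < src.length() && src.charAt(i + 1) == '\n';
                i += crlf ? 2 : 1;
                kind = Kind.BR;
            } else if (src.startsWith("//", i)) {
                // "///" starts a documentation comment; both kinds run to the end of the line.
                kind = src.startsWith("///", i) ? Kind.DOC_COMMENT : Kind.COMMENT;
                while (i < src.length() && src.charAt(i) != '\n' && src.charAt(i) != '\r') {
                    i++;
                }
            } else if (c == '"') {
                i++;
                while (i < src.length() && src.charAt(i) != '"') {
                    i += src.charAt(i) == '\\' ? 2 : 1; // skip the escaped character
                }
                if (i < src.length()) {
                    i++; // closing quote
                    kind = Kind.STRING;
                } else {
                    i = src.length();
                    kind = Kind.ERROR; // unterminated string
                }
            } else if (isIdentStart(c)) {
                while (i < src.length() && isIdentPart(src.charAt(i))) {
                    i++;
                }
                kind = Kind.IDENTIFIER;
            } else if (isDigit(c) || (c == '-' && i + 1 < src.length() && isDigit(src.charAt(i + 1)))) {
                i = skipDigits(src, i + 1);
                if (i + 1 < src.length() && src.charAt(i) == '.' && isDigit(src.charAt(i + 1))) {
                    i = skipDigits(src, i + 1);
                }
                kind = Kind.NUMBER;
            } else {
                i++;
                kind = switch (c) {
                    case ',' -> Kind.COMMA;
                    case '@' -> Kind.AT;
                    case ':' -> Kind.COLON;
                    case '.' -> Kind.DOT;
                    case '#' -> Kind.POUND;
                    case '$' -> Kind.DOLLAR;
                    case '{' -> Kind.LBRACE;
                    case '}' -> Kind.RBRACE;
                    case '[' -> Kind.LBRACKET;
                    case ']' -> Kind.RBRACKET;
                    case '(' -> Kind.LPAREN;
                    case ')' -> Kind.RPAREN;
                    case '=' -> Kind.EQUAL;
                    default -> Kind.ERROR;
                };
            }
            out.add(new Token(kind, src.substring(start, i)));
        }
        out.add(new Token(Kind.EOF, ""));
        return out;
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    private static boolean isIdentStart(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_'; }
    private static boolean isIdentPart(char c) { return isIdentStart(c) || isDigit(c); }

    private static int skipDigits(String s, int i) {
        while (i < s.length() && isDigit(s.charAt(i))) {
            i++;
        }
        return i;
    }
}
```

For example, `SketchLexer.lex("integer integer\n")` yields IDENTIFIER, SPACE, IDENTIFIER, BR, EOF; since the sketch has no keyword kind, `integer true` lexes to the same kinds.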

Having the tokens you listed would already be amazing, but I'm gonna go ahead and ping @keynmol, who's done most of the LSP work, for his opinion.

Jan 20 '22 16:01 Baccata

👋

One thing I'd note is that KEYWORD is missing from that list - is that intentional?

I'm not a formatting/parser person (neither is Oli, or he's hiding it well), so tentatively we think a token stream should be enough for basic formatting. I looked at how Clang formats C/C++, and it seems to operate on token streams only rather than reconstructing the AST.

So if it's possible for C, it should be possible for Smithy :)

Jan 25 '22 09:01 keynmol
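A rough sketch of what formatting directly on a token stream can look like, much simpler than Clang's approach: it only rewrites whitespace (single spaces between tokens, at most one blank line, 4-space indentation per `{` depth) and copies every other token, including comments, unchanged. `TokenKind`, `Tok` and `TokenStreamFormatter` are hypothetical, and the indentation width and blank-line rule are arbitrary choices.

```java
import java.util.List;

// Hypothetical token shape: only the distinctions this sketch needs.
enum TokenKind { SPACE, BR, COMMENT, OTHER }
record Tok(TokenKind kind, String text) {}

final class TokenStreamFormatter {
    // Rewrites whitespace only; all non-whitespace tokens (comments included)
    // are copied through verbatim, so nothing is lost.
    static String format(List<Tok> tokens) {
        StringBuilder out = new StringBuilder();
        int depth = 0;              // current brace nesting
        int pendingBreaks = 0;      // line breaks seen since the last printed token
        boolean pendingSpace = false;
        for (Tok t : tokens) {
            switch (t.kind()) {
                case SPACE -> pendingSpace = true;
                case BR -> {
                    pendingBreaks++;
                    pendingSpace = false;
                }
                default -> {
                    if (t.text().equals("}")) {
                        depth = Math.max(0, depth - 1);
                    }
                    if (out.length() > 0) {
                        if (pendingBreaks > 0) {
                            // Keep at most one blank line, then re-indent.
                            out.append("\n".repeat(Math.min(pendingBreaks, 2)));
                            out.append("    ".repeat(depth));
                        } else if (pendingSpace) {
                            out.append(' '); // collapse runs of spaces to one
                        }
                    }
                    out.append(t.text());
                    if (t.text().equals("{")) {
                        depth++;
                    }
                    pendingBreaks = 0;
                    pendingSpace = false;
                }
            }
        }
        return out.append('\n').toString(); // end with exactly one newline
    }
}
```

Because comments are ordinary tokens here, they survive formatting, which is the property the smithy-model serializer lacks.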

Smithy doesn't really have keywords since any kind of keyword would be contextual (e.g., "false", "true", and "null" are special identifiers but only in node values). For example, the following model is valid:

integer integer

(IDENTIFIER, SPACE, IDENTIFIER, BR)

And so is the following (though it would be horrendous):

integer true

(IDENTIFIER, SPACE, IDENTIFIER, BR)

As for formatting based on tokens -- yeah I was wondering the same thing. I'll take a look at Clang Format (this code I think).

Another option, a kind of middle ground, would be to expose a parse tree that attaches comments to AST nodes but is still lossy for spaces and other formatting. That would be fine for a formatter, since formatters need to be opinionated anyway. This parse tree could also serve as Smithy's IDL parser; it would get further transformed during loading into Smithy's existing semantic model. The only use case it doesn't address is some kind of in-place transformation, but maybe those can also be done using just the lexer (e.g., removing all commas but leaving everything else the same).

Jan 28 '22 01:01 mtdowling
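A minimal sketch of the middle-ground idea above: comments are attached as leading comments to the node that follows them, while blank lines and spacing are discarded. To stay short, it assumes each input line is either a `//` comment or a complete statement, which real Smithy IDL does not guarantee; `Node`, `ParseTree` and `CommentAttacher` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parse-tree node: a statement plus the comments that preceded it.
record Node(String statement, List<String> leadingComments) {}

// Hypothetical tree: the nodes, plus comments that follow the last statement.
record ParseTree(List<Node> nodes, List<String> trailingComments) {}

final class CommentAttacher {
    // Simplifying assumption: one comment or one complete statement per line.
    // Blank lines and indentation are dropped (the lossy part).
    static ParseTree attach(List<String> lines) {
        List<Node> nodes = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("//")) {
                pending.add(line);
            } else {
                nodes.add(new Node(line, List.copyOf(pending)));
                pending.clear();
            }
        }
        return new ParseTree(nodes, List.copyOf(pending));
    }

    // Printing the tree keeps every comment next to the node it was attached to.
    static String print(ParseTree tree) {
        StringBuilder sb = new StringBuilder();
        for (Node n : tree.nodes()) {
            n.leadingComments().forEach(c -> sb.append(c).append('\n'));
            sb.append(n.statement()).append('\n');
        }
        tree.trailingComments().forEach(c -> sb.append(c).append('\n'));
        return sb.toString();
    }
}
```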

To add onto this request: for our team, it'd be ideal if Smithy could be configured to fail the build when the formatting isn't correct, so that developers are forced to format the code before committing and merging it.

Mar 20 '23 21:03 crowecawcaw
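One way to get that behaviour is a check mode: format each file and fail if any output differs from the input. The sketch below assumes the formatter is available as a `UnaryOperator<String>` (for example, a lambda wrapping the smithy-syntax calls shown later in this thread); `FormatCheck` and its methods are hypothetical and not part of the Smithy CLI.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class FormatCheck {
    // Returns the names of files whose contents would change if formatted,
    // sorted for deterministic output. `files` maps file name to contents.
    static List<String> unformatted(Map<String, String> files, UnaryOperator<String> formatter) {
        List<String> bad = new ArrayList<>();
        files.forEach((name, text) -> {
            if (!formatter.apply(text).equals(text)) {
                bad.add(name);
            }
        });
        bad.sort(null); // natural (alphabetical) order
        return bad;
    }

    // Non-zero exit code when anything is unformatted, so the build fails.
    static int exitCode(List<String> unformatted) {
        return unformatted.isEmpty() ? 0 : 1;
    }
}
```

A build step would print the offending file names and then call `System.exit(FormatCheck.exitCode(bad))`.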

We shipped a formatter as part of the Smithy CLI:

❯ smithy format --help
Usage: smithy format [--help | -h]
                     [--debug] [--quiet] [--no-color]
                     [--force-color] [--stacktrace]
                     [--logging LOG_LEVEL] [<MODEL>]

Formats Smithy IDL models.

    --help, -h
        Print this help output.
    --debug
        Display debug information.
    --quiet
        Silence output except errors.
    --no-color
        Disable ANSI colors.
    --force-color
        Force the use of ANSI colors.
    --stacktrace
        Display a stacktrace on error.
    --logging LOG_LEVEL
        Set the log level (defaults to WARNING). Set to one of OFF, SEVERE,
        WARNING, INFO, FINE, ALL.
    <MODEL>
        A single `.smithy` model file or a directory of model files to
        recursively format.

Examples:
   smithy format model-file.smithy
   smithy format model/

You can also format Smithy model files using the smithy-syntax package. For example:

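import software.amazon.smithy.model.loader.IdlTokenizer;
import software.amazon.smithy.syntax.Formatter;
import software.amazon.smithy.syntax.TokenTree;
import software.amazon.smithy.utils.IoUtils;
String model = IoUtils.readUtf8File(filename);
IdlTokenizer tokenizer = IdlTokenizer.create(filename.toString(), model);
TokenTree tree = TokenTree.of(tokenizer);
String formatted = Formatter.format(tree, 120);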

This was built on top of a lexer, but that detail matters less for this issue now that a formatter is provided.

Jul 05 '23 20:07 mtdowling
