Function to format smithy files
Hey folks,
My team's putting some work into having a polished editor experience:
- https://github.com/disneystreaming/smithy-language-server
- https://github.com/disneystreaming/vscode-smithy
We were wondering whether there was any existing utility function we could piggyback on to format smithy files. We are aware of the serialiser that outputs smithy syntax, but it doesn't quite cut it because the comments get lost and it requires a semantically valid model. Ideally we'd like to operate the formatter at a syntactic level.
Any ideas?
We don't currently have anything in smithy-model to do standalone formatting. You're right that the serialization in smithy-model loses comments and needs a semantically valid model. Syntactic comments have no formal relationship to shapes, and would only show up in things like parse trees, which we don't use.
Where would formatting happen? In VS code or in the LSP? Or maybe in something like Prettier? I'm wondering where this would be best implemented.
The LSP has an endpoint to apply formatting. Additionally, if we had access to a tokenizer, we could have much more accurate syntactic colouring, provided at the LSP level.
Implementing formatting in the LSP sounds good.
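For reference, an LSP server answers a `textDocument/formatting` request with a list of `TextEdit`s, so the simplest response is a single edit that replaces the whole document with the formatter's output. The sketch below assumes exactly that. The `Position`, `Range`, and `TextEdit` records are hypothetical stand-ins for the protocol types that an LSP library would normally provide.

```java
import java.util.List;

public class WholeDocumentEdit {
    // Hypothetical stand-ins for the LSP Position, Range and TextEdit types.
    // LSP positions are zero-based, and by default "character" counts UTF-16
    // code units, which is what Java's String.length() and charAt() use.
    record Position(int line, int character) {}
    record Range(Position start, Position end) {}
    record TextEdit(Range range, String newText) {}

    // Position just past the last character of the text. Handles "\n" and
    // "\r\n" line endings; a lone "\r" (also a line break in LSP) is not handled.
    static Position endOf(String text) {
        int line = 0;
        int lineStart = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                line++;
                lineStart = i + 1;
            }
        }
        return new Position(line, text.length() - lineStart);
    }

    // No edits if nothing changed; otherwise one edit that replaces the
    // entire original document with the formatted text.
    static List<TextEdit> formattingEdits(String original, String formatted) {
        if (original.equals(formatted)) {
            return List.of();
        }
        Range whole = new Range(new Position(0, 0), endOf(original));
        return List.of(new TextEdit(whole, formatted));
    }

    public static void main(String[] args) {
        String original = "namespace example\nstring  Foo\n";
        String formatted = "namespace example\n\nstring Foo\n";
        System.out.println(formattingEdits(original, formatted));
    }
}
```

Replacing the whole document is the easiest approach to get right. Computing minimal edits (for example by diffing) tends to disturb cursors and undo history less, but it is more work.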
We don't currently have any kind of lexing phase in Smithy, since we just pull one character at a time from the IDL text to parse it. It wouldn't be difficult to add a lexing phase to the parser, and that would likely make it easier to implement basic error recovery (something I think we need for IDEs). What kind of tokens would you need for it to be useful? The tokenizer I have in mind would emit tokens like these: SPACE, BR, COMMA, COMMENT, DOC_COMMENT, AT, STRING, TEXT_BLOCK, COLON, IDENTIFIER, DOT, POUND, DOLLAR, NUMBER, LBRACE, RBRACE, LBRACKET, RBRACKET, LPAREN, RPAREN, EQUAL, EOF, ERROR.
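To make the proposed token list concrete, here is a minimal Java sketch of a lexer that emits those kinds. Everything in it (`SketchTokenizer`, `Kind`, `Token`) is hypothetical and simplified: number exponents and escapes inside text blocks are not handled, and it does not describe how Smithy's parser actually works.

```java
import java.util.ArrayList;
import java.util.List;

public class SketchTokenizer {
    enum Kind {
        SPACE, BR, COMMA, COMMENT, DOC_COMMENT, AT, STRING, TEXT_BLOCK, COLON,
        IDENTIFIER, DOT, POUND, DOLLAR, NUMBER, LBRACE, RBRACE, LBRACKET,
        RBRACKET, LPAREN, RPAREN, EQUAL, EOF, ERROR
    }

    record Token(Kind kind, String text) {}

    static List<Token> tokenize(String src) {
        List<Token> out = new ArrayList<>();
        int n = src.length();
        int i = 0;
        while (i < n) {
            int start = i;
            char c = src.charAt(i);
            Kind kind;
            if (c == ' ' || c == '\t') {
                // A run of spaces/tabs becomes one SPACE token.
                while (i < n && (src.charAt(i) == ' ' || src.charAt(i) == '\t')) i++;
                kind = Kind.SPACE;
            } else if (c == '\n' || c == '\r') {
                // One BR per line break; "\r\n" counts as a single break.
                i += src.startsWith("\r\n", i) ? 2 : 1;
                kind = Kind.BR;
            } else if (src.startsWith("//", i)) {
                // "///" starts a doc comment; both kinds run to the end of the line.
                kind = src.startsWith("///", i) ? Kind.DOC_COMMENT : Kind.COMMENT;
                while (i < n && src.charAt(i) != '\n' && src.charAt(i) != '\r') i++;
            } else if (src.startsWith("\"\"\"", i)) {
                // Text block: everything up to the next """ (escapes not handled).
                int close = src.indexOf("\"\"\"", i + 3);
                kind = close < 0 ? Kind.ERROR : Kind.TEXT_BLOCK;
                i = close < 0 ? n : close + 3;
            } else if (c == '"') {
                // Quoted string; a backslash skips the character after it.
                i++;
                while (i < n && src.charAt(i) != '"') i += src.charAt(i) == '\\' ? 2 : 1;
                kind = i < n ? Kind.STRING : Kind.ERROR; // ERROR if unterminated
                i = Math.min(i + 1, n);
            } else if (Character.isLetter(c) || c == '_') {
                while (i < n && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) i++;
                kind = Kind.IDENTIFIER;
            } else if (Character.isDigit(c) || (c == '-' && i + 1 < n && Character.isDigit(src.charAt(i + 1)))) {
                // Digits and '.' only; exponents like 1e10 are not handled.
                i++;
                while (i < n && (Character.isDigit(src.charAt(i)) || src.charAt(i) == '.')) i++;
                kind = Kind.NUMBER;
            } else {
                kind = punctuation(c);
                i++;
            }
            out.add(new Token(kind, src.substring(start, i)));
        }
        out.add(new Token(Kind.EOF, ""));
        return out;
    }

    // Single-character tokens; anything unrecognized becomes ERROR.
    static Kind punctuation(char c) {
        return switch (c) {
            case ',' -> Kind.COMMA;
            case '@' -> Kind.AT;
            case ':' -> Kind.COLON;
            case '.' -> Kind.DOT;
            case '#' -> Kind.POUND;
            case '$' -> Kind.DOLLAR;
            case '{' -> Kind.LBRACE;
            case '}' -> Kind.RBRACE;
            case '[' -> Kind.LBRACKET;
            case ']' -> Kind.RBRACKET;
            case '(' -> Kind.LPAREN;
            case ')' -> Kind.RPAREN;
            case '=' -> Kind.EQUAL;
            default -> Kind.ERROR;
        };
    }

    public static void main(String[] args) {
        for (Token t : tokenize("integer integer\n")) {
            System.out.println(t.kind() + " " + t.text().strip());
        }
    }
}
```

On the input `integer integer` followed by a newline, this yields the kinds IDENTIFIER, SPACE, IDENTIFIER, BR, and EOF.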
Having the tokens you listed would be amazing already, but I'm gonna go ahead and ping @keynmol, who's done most of the LSP work, for his opinion.
👋
One thing I'd note is that KEYWORD is missing from that list. Is that intentional?
I'm not a formatting/parser person (neither is Oli, or he's hiding it well), so our tentative view is that a token stream should be enough for basic formatting. I looked at how Clang formats C/C++, and it doesn't seem to reconstruct the ADT; it operates on token streams only.
So if it's possible for C, it should be possible for Smithy :)
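A minimal sketch of what formatting purely over a token stream could look like, in the spirit of the Clang observation above. It is not a description of clang-format's actual algorithm, nor of any Smithy formatter. It assumes hypothetical `Kind`/`Token` types matching the token list proposed earlier, discards the original horizontal spacing, collapses runs of blank lines, and indents by brace/bracket depth.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class TokenStreamFormatter {
    enum Kind {
        SPACE, BR, COMMA, COMMENT, DOC_COMMENT, AT, STRING, TEXT_BLOCK, COLON,
        IDENTIFIER, DOT, POUND, DOLLAR, NUMBER, LBRACE, RBRACE, LBRACKET,
        RBRACKET, LPAREN, RPAREN, EQUAL, EOF, ERROR
    }

    record Token(Kind kind, String text) {}

    // Tokens that glue to whatever follows them, e.g. "@required", "smithy.api".
    static final Set<Kind> NO_SPACE_AFTER =
            EnumSet.of(Kind.AT, Kind.DOT, Kind.POUND, Kind.DOLLAR, Kind.LPAREN, Kind.LBRACKET);
    // Tokens that glue to whatever precedes them, e.g. "member:", "api#String", "length(".
    static final Set<Kind> NO_SPACE_BEFORE =
            EnumSet.of(Kind.COLON, Kind.COMMA, Kind.DOT, Kind.POUND, Kind.LPAREN, Kind.RPAREN, Kind.RBRACKET);

    static boolean needsSpace(Kind prev, Kind cur) {
        if (prev == Kind.COLON && cur == Kind.EQUAL) {
            return false; // keep ":=" together
        }
        return !NO_SPACE_AFTER.contains(prev) && !NO_SPACE_BEFORE.contains(cur);
    }

    static String format(List<Token> tokens) {
        StringBuilder out = new StringBuilder();
        int depth = 0;  // current brace/bracket nesting
        int breaks = 0; // line breaks seen since the last emitted token
        Token prev = null;
        for (Token t : tokens) {
            Kind k = t.kind();
            if (k == Kind.SPACE || k == Kind.EOF) {
                continue; // original horizontal spacing is discarded
            }
            if (k == Kind.BR) {
                breaks++;
                continue;
            }
            if (k == Kind.RBRACE || k == Kind.RBRACKET) {
                depth = Math.max(0, depth - 1); // dedent before emitting the closer
            }
            if (prev != null) {
                if (breaks > 0) {
                    // Keep at most one blank line, then indent 4 spaces per level.
                    out.append(breaks > 1 ? "\n\n" : "\n").append("    ".repeat(depth));
                } else if (needsSpace(prev.kind(), k)) {
                    out.append(' ');
                }
            }
            out.append(t.text());
            if (k == Kind.LBRACE || k == Kind.LBRACKET) {
                depth++;
            }
            prev = t;
            breaks = 0;
        }
        if (prev != null) {
            out.append('\n'); // end the file with exactly one newline
        }
        return out.toString();
    }

    // "structure Foo {\n  bar :String\n}\n" as a token stream.
    static List<Token> demoTokens() {
        return List.of(
                new Token(Kind.IDENTIFIER, "structure"), new Token(Kind.SPACE, " "),
                new Token(Kind.IDENTIFIER, "Foo"), new Token(Kind.SPACE, " "),
                new Token(Kind.LBRACE, "{"), new Token(Kind.BR, "\n"),
                new Token(Kind.SPACE, "  "), new Token(Kind.IDENTIFIER, "bar"),
                new Token(Kind.SPACE, " "), new Token(Kind.COLON, ":"),
                new Token(Kind.IDENTIFIER, "String"), new Token(Kind.BR, "\n"),
                new Token(Kind.RBRACE, "}"), new Token(Kind.BR, "\n"),
                new Token(Kind.EOF, ""));
    }

    public static void main(String[] args) {
        System.out.print(format(demoTokens()));
    }
}
```

A real formatter would also need rules for line width, comment placement, and trait/member layout; the sketch only shows that a useful baseline needs nothing beyond the tokens.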
Smithy doesn't really have keywords since any kind of keyword would be contextual (e.g., "false", "true", and "null" are special identifiers but only in node values). For example, the following model is valid:
```
integer integer
```

(IDENTIFIER, SPACE, IDENTIFIER, BR)
And so is the following (though it would be horrendous):
```
integer true
```

(IDENTIFIER, SPACE, IDENTIFIER, BR)
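Because these words are special only by position, the parser rather than the lexer has to decide what an IDENTIFIER means. The hypothetical sketch below (`ContextualIdentifiers`, `parseShapeStatement`, `nodeValue` are illustrative names) shows that split for the two cases mentioned. In a simple shape statement both words are ordinary identifiers. In node-value position, `true`, `false`, and `null` become literals, and any other bare identifier stays a string holding a shape ID.

```java
public class ContextualIdentifiers {
    // Hypothetical result of parsing a simple shape statement like "integer true".
    record ShapeStatement(String type, String name) {}

    // In shape-statement position both words are plain identifiers: the first
    // is read as the shape type and the second as the shape name, so neither
    // "integer" nor "true" is reserved here.
    static ShapeStatement parseShapeStatement(String line) {
        String[] words = line.trim().split("\\s+");
        if (words.length != 2) {
            throw new IllegalArgumentException("expected '<type> <name>': " + line);
        }
        return new ShapeStatement(words[0], words[1]);
    }

    // Only in node-value position do true/false/null get special meaning;
    // any other bare identifier is kept as a shape ID string in this sketch.
    static Object nodeValue(String identifier) {
        return switch (identifier) {
            case "true" -> Boolean.TRUE;
            case "false" -> Boolean.FALSE;
            case "null" -> null;
            default -> identifier;
        };
    }

    public static void main(String[] args) {
        System.out.println(parseShapeStatement("integer true"));
        System.out.println(nodeValue("true"));
    }
}
```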
As for formatting based on tokens -- yeah I was wondering the same thing. I'll take a look at Clang Format (this code I think).
Another option, a kind of middle ground, would be to expose a parse tree that attaches comments to AST nodes but would still be lossy for spaces and other formatting. That would be fine for a formatter, since formatters need to be opinionated anyway. This parse tree could also serve as Smithy's IDL parser; it would be transformed further during loading into Smithy's existing semantic model. The only use case it doesn't address is some kind of in-place transformation, but maybe those can be done using just the lexer (e.g., removing all commas but leaving everything else the same).
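The comma-removal use case can indeed be handled at the lexical level, because commas are treated as whitespace in the Smithy IDL. A minimal sketch, assuming a hypothetical `removeCommas` helper: it copies comments, quoted strings, and text blocks verbatim and drops every other comma, inserting a space only when deleting the comma would glue two tokens together (as in `[a,b]`). Escapes inside text blocks are not handled.

```java
public class RemoveCommas {
    static String removeCommas(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (src.startsWith("//", i)) {
                // Comment (including /// doc comments): copy to the end of the line.
                int end = i;
                while (end < n && src.charAt(end) != '\n' && src.charAt(end) != '\r') end++;
                out.append(src, i, end);
                i = end;
            } else if (src.startsWith("\"\"\"", i)) {
                // Text block: copy through the closing """ (escapes not handled).
                int close = src.indexOf("\"\"\"", i + 3);
                int end = close < 0 ? n : close + 3;
                out.append(src, i, end);
                i = end;
            } else if (c == '"') {
                // Quoted string: copy through the closing quote, honoring backslash escapes.
                int end = i + 1;
                while (end < n && src.charAt(end) != '"') {
                    end += src.charAt(end) == '\\' ? 2 : 1;
                }
                end = Math.min(end + 1, n);
                out.append(src, i, end);
                i = end;
            } else if (c == ',') {
                // Drop the comma; keep a space only if the next character would
                // otherwise run straight into the previous token.
                char next = i + 1 < n ? src.charAt(i + 1) : ' ';
                if (" \t\r\n,]})".indexOf(next) < 0) {
                    out.append(' ');
                }
                i++;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(removeCommas("list L {\n    member: String,\n}\n@tags([\"a,b\",\"c\"])\n"));
    }
}
```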
To add to this request: for our team, it would be ideal if Smithy could be configured to fail the build when the formatting isn't correct, so that developers are forced to format the code before committing and merging it.
We shipped a formatter as part of the Smithy CLI:
```
❯ smithy format --help
Usage: smithy format [--help | -h]
                     [--debug] [--quiet] [--no-color]
                     [--force-color] [--stacktrace]
                     [--logging LOG_LEVEL] [<MODEL>]

Formats Smithy IDL models.

    --help, -h
        Print this help output.
    --debug
        Display debug information.
    --quiet
        Silence output except errors.
    --no-color
        Disable ANSI colors.
    --force-color
        Force the use of ANSI colors.
    --stacktrace
        Display a stacktrace on error.
    --logging LOG_LEVEL
        Set the log level (defaults to WARNING). Set to one of OFF, SEVERE,
        WARNING, INFO, FINE, ALL.
    <MODEL>
        A single `.smithy` model file or a directory of model files to
        recursively format.

Examples:
    smithy format model-file.smithy
    smithy format model/
```
You can also format Smithy model files using the smithy-syntax package. For example:
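```java
import software.amazon.smithy.model.loader.IdlTokenizer;
import software.amazon.smithy.syntax.Formatter;
import software.amazon.smithy.syntax.TokenTree;
import software.amazon.smithy.utils.IoUtils;
String model = IoUtils.readUtf8File(filename); // filename: a java.nio.file.Path
IdlTokenizer tokenizer = IdlTokenizer.create(filename.toString(), model);
TokenTree tree = TokenTree.of(tokenizer);
String formatted = Formatter.format(tree, 120); // 120 = maximum line width
```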
The formatter is built on top of a lexer, but that no longer really matters for this issue now that a formatter is provided.
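The help output doesn't list a check-only mode. For the build-failing behaviour requested earlier in the thread, one option is to format each file in memory and fail if the result differs from what's on disk. Below is a minimal sketch with a hypothetical `FormatCheck` class. The formatter is passed in as a function so the sketch stays self-contained, and a comment shows how it might be wired to the smithy-syntax calls quoted above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

public class FormatCheck {
    // Returns the files whose contents would change if they were formatted.
    // `formatter` maps (filename, source) to formatted source. With smithy-syntax
    // on the classpath it could be, for example:
    //   (name, src) -> Formatter.format(TokenTree.of(IdlTokenizer.create(name, src)), 120)
    static List<Path> unformatted(List<Path> files, BiFunction<String, String, String> formatter)
            throws IOException {
        List<Path> result = new ArrayList<>();
        for (Path file : files) {
            String source = Files.readString(file); // reads the file as UTF-8
            if (!formatter.apply(file.toString(), source).equals(source)) {
                result.add(file);
            }
        }
        return result;
    }

    // Exit status for a CI step: 0 when everything is formatted, 1 otherwise.
    static int check(List<Path> files, BiFunction<String, String, String> formatter)
            throws IOException {
        List<Path> bad = unformatted(files, formatter);
        for (Path file : bad) {
            System.err.println("Not formatted: " + file);
        }
        return bad.isEmpty() ? 0 : 1;
    }
}
```

Wired into a build (for example, a small `main` that calls `System.exit(check(...))`), this would keep failing until someone runs `smithy format` on the offending files.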