genemichaels
Format incomplete Rust files/snippets (not just full documents)
Using genemichaels with --stdin is useful for formatting parts of files. However, a fragment is not always a valid Rust file on its own, even if it is syntactically valid in its original context.
Can you give me an example?
IIUC, what genemichaels does in order to format code is
1. Use syn to parse the text as a File AST
2. Convert the AST into a formatting structure
3. Generate text from the formatting structure
So if you're getting an error I think it's probably because of 1: genemichaels tells syn to parse the text as a File but what you have isn't a full file, but rather an expression or something?
If it's always a full expression, I might be able to add something like --expr so that syn parses it as an expression rather than a full file. Alternatively, genemichaels-lib also exposes methods to format a string or format an AST, so you could write your own code to do e.g. let ast = syn::parse_str::<Expr>(text) then call genemichaels_lib::format_ast(ast).
You're right that I could make my own formatter! I'll look into that. Just for completeness: Sometimes I'd like to have the following:
match value { // function indentation
0 => { // no indentation
// very long block
}
...
130 => { ... } // still no indentation
} // function indentation
It's just a temporary thing for editing, but obviously formatting the whole file means I have to do it over. It'd be weird, I think, to ask for a feature like "modular indentation", but partial formatting would cover this use case nicely, among perhaps several others.
So yeah, you got it: parsing as an Item, Expr or similar might work in that case. The most flexible would obviously be TokenStream, for which you have to use heuristics to make formatting choices, but genemichaels' deterministic nature probably makes that easy.
What do you think? Is genemichaels only for formatting complete files or would you ideally enable partial formatting?
Are you saying you want to format everything except that part in the middle? There's #71 ~but I think that was more about the opposite, formatting only a specific section while leaving the rest although I think they're probably similar problems.~ I think #71 covers this use case - it's "1." in that issue.
I can add a "--expr" flag but I don't think it'll help do the above (avoid indenting the middle) unless that was a separate use case.
either partial formatting or a way to tell genemichaels "don't format this!"
After looking at how rustfmt does it, I think it'd be best to add genemichaels support for #[rustfmt::skip]. This would also make it easier for existing codebases to switch over to genemichaels.
That's a good point. I'll split #71 maybe to make this more clear. TBH I think not formatting a chunk is easier than the opposite, formatting only a chunk, so I might be able to work on it sooner.
> TBH I think not formatting a chunk is easier than the opposite, formatting only a chunk, so I might be able to work on it sooner.
That was my realization yesterday. I attempted to get to work on it, but could not figure out the code model. I got as far as parsing out the attribute path for items and then forcibly treating the segment as whitespace (after the attributes), then accidentally lost some comments 😅
That's pretty far!
Just to clarify: Partial formatting is different from skipping a formatted section. Instead of manually formatting a piece of text and skipping it, partial formatting is selecting a piece of text and formatting it, even if syntactically invalid. So you were right to keep this issue open in spite of #115.
Yeah, right. I think it's a fairly different problem... I haven't actually tried it so maybe there's a clean way to do it but at first glance it seemed tricky. I feel like you could end up spending a lot of time tweaking this and it's not really a core use case.
I have come up with a solution. Macro invocations are already formatted token by token. Sometimes in my code I do:
formatting! {
0 => { /* ... */ }
1 => { /* ... */ }
}
Once it has been formatted, I remove the macro invocation wrapper.
Proposal: Add a --tokens flag that automatically wraps stdin input in a macro invocation, formats it, then emits only the macro body to stdout. This avoids the complexity of trying to parse incomplete code as specific AST node types, and would solve the partial/incomplete formatting use cases and related issues by leveraging genemichaels' existing token-based macro formatting.
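The wrap/format/unwrap idea in the proposal could be sketched roughly as below. This is only an illustration, not the code from the linked commits: the function names are made up, `format_file` is a stand-in for whatever formats a complete file, and `formatting` is just the wrapper name used in the example above.

```rust
/// Hypothetical wrapper name, taken from the `formatting! { ... }` example.
/// The macro never has to exist; the text only needs to parse.
const WRAPPER: &str = "formatting";

/// Wrap a snippet in `formatting! { ... }` so it is acceptable at the root of a file.
fn wrap_in_macro(snippet: &str) -> String {
    format!("{}! {{\n{}\n}}\n", WRAPPER, snippet)
}

/// Strip the wrapper from formatted text and return only the macro body.
/// Returns `None` if the text does not look like the wrapper any more.
fn unwrap_macro_body(formatted: &str) -> Option<String> {
    let rest = formatted
        .trim()
        .strip_prefix(WRAPPER)?
        .trim_start()
        .strip_prefix('!')?
        .trim_start();
    let body = rest.strip_prefix('{')?.strip_suffix('}')?;
    // Drop the newlines next to the braces, keep everything else untouched.
    Some(body.trim_matches('\n').to_string())
}

/// Format a snippet by wrapping it, running a whole-file formatter
/// (`format_file`, an assumed stand-in), and unwrapping the result.
fn format_tokens<F>(snippet: &str, format_file: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    let formatted = format_file(&wrap_in_macro(snippet))?;
    unwrap_macro_body(&formatted)
}
```

Note that the body returned here still carries whatever indentation the formatter gave it inside the macro braces, so a real implementation would need to deal with indentation separately.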
I wrote a naive implementation that doesn't take indentation into account. Will see what can be done.
I implemented it such that only the input indentation is tracked. This indentation is assumed to be correct, i.e. the input is assumed to start on its own line, and whatever indentation it has gets added to each line of the output. I think this is about the best we can do without more information. I am simply putting the responsibility on the user to ensure proper indentation, while allowing genemichaels to do the best it can with what it is given.
You can take a look at the commits at https://github.com/LnxgpI0k/genemichaels/commit/bd5d92add434172b1b32b1a907b1d17fd30b3baf and https://github.com/LnxgpI0k/genemichaels/commit/43edd47600efd79f894d50b42b96446744ab3cff
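For readers who don't want to dig through the commits, the indentation handling described above might look something like this. It is a minimal sketch under my own assumptions (helper names are hypothetical, and blank lines are left without trailing whitespace), not a copy of the linked implementation.

```rust
/// Leading whitespace of the first non-blank line of the input.
/// This is assumed to be the indentation the user wants to keep.
fn leading_indent(input: &str) -> &str {
    input
        .lines()
        .find(|line| !line.trim().is_empty())
        .map(|line| &line[..line.len() - line.trim_start().len()])
        .unwrap_or("")
}

/// Prefix every non-empty line of the formatted output with `indent`.
fn reindent(formatted: &str, indent: &str) -> String {
    formatted
        .lines()
        .map(|line| {
            if line.is_empty() {
                String::new()
            } else {
                format!("{}{}", indent, line)
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

The idea is to call `leading_indent` on the raw selection before formatting, then pass the formatter's output through `reindent` so the snippet lands back at the depth it came from.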
Oh nice, yeah I see. So basically wrapping it into something that's okay at the root. And that's a good point, macros already try a couple contexts to get reasonable parsing results.
I was thinking about this more and it's also something that's (I think) needed for rust code blocks in markdown if I ever get to that. I think there's fundamentally only two contexts that need to be handled: the root of a file (what's currently parsed) and a function body context (since you can't do let x = or match or etc at the root of a file).
I can add a context parameter maybe that allows forcing function-body context or an "auto" mode that tries both in sequence. I think by default it should only parse the root of a document, because "auto" behavior can be unpredictable and obscure issues (like maybe something gets formatted unexpectedly because it detects the wrong context).
Do you think my method is also too unpredictable? Also splitting it into its own arg switch works because I think language servers have a separate config for partial formatting.
After rereading your last comment, it sounds like the idea is to be more correct about what the formatter is doing. If you want, I can prototype something that tries different contexts, even letting the user select a context. I mean, it's as simple as wrapping the input in a function header or something 😁 Whatever you decide.
Yeah, it was what you were suggesting (more or less) but just without injecting text to fake the context.
I didn't think about editors and partial formatting... I mean, I feel like I remember seeing a setting for "only format changed lines" once in the past, but I guess that'd have to be a different setting. I have no idea how that'd work with indentation though.
What I was thinking of, aside from the config enum, was basically... the current code does https://github.com/andrewbaxter/genemichaels/blob/0ac26108bbc022ad46dfb51fcd77c86ac7faf7d2/crates/genemichaels-lib/src/lib.rs#L545-L548
So the change is that, depending on the enum, it'd do syn::parse2::<Vec<Item>> (although not Vec, it'd be however the content of a function body is defined, probably a delimited list thing, I can't remember the name) instead of parse2::<File>. If there are a couple of contexts we want to try, we can try them in sequence and go with whichever one parses without error first (or else return all the errors).
That's roughly what happens in macro formatting too.
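The "try contexts in sequence" logic can be sketched independently of syn. Everything below is illustrative: the `Context` enum, `parse_first`, and the toy parser are hypothetical names, not genemichaels API. In a real version the `parse` callback would presumably call `syn::parse2::<File>` for the root context, and something like syn's `Block::parse_within` (which yields a `Vec<Stmt>`) for the function-body context.

```rust
/// Hypothetical parse contexts, mirroring the two discussed above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Context {
    FileRoot,
    FunctionBody,
}

/// Try each context in order and return the first successful parse,
/// or every error (tagged with its context) if none of them succeed.
fn parse_first<T, E>(
    text: &str,
    contexts: &[Context],
    parse: impl Fn(Context, &str) -> Result<T, E>,
) -> Result<(Context, T), Vec<(Context, E)>> {
    let mut errors = Vec::new();
    for &ctx in contexts {
        match parse(ctx, text) {
            Ok(ast) => return Ok((ctx, ast)),
            Err(e) => errors.push((ctx, e)),
        }
    }
    Err(errors)
}

/// Toy stand-in for a real parser: the root only accepts `fn` items and
/// a function body only accepts `let` statements.
fn toy_parse(ctx: Context, text: &str) -> Result<String, String> {
    let ok = match ctx {
        Context::FileRoot => text.starts_with("fn "),
        Context::FunctionBody => text.starts_with("let "),
    };
    if ok {
        Ok(text.to_string())
    } else {
        Err(format!("not valid in {:?}", ctx))
    }
}
```

Passing only `Context::FileRoot` gives the proposed default behavior, passing a single other context forces it, and passing both gives the "auto" mode, with the order of the slice deciding which context wins when more than one would parse.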
But I think with editor "only changed lines" formatting, handling whitespace is probably tricky. And I'm not entirely sure it makes sense as a use case here. Because this is deterministic, formatting the file will (by definition) only change the stuff near the commit footprint, so I feel like this adds complexity and failure points (detecting correct indentation, line wrap width, changes that implicate other surrounding code leading to nondeterministic formatting over multiple changes).
If there's a clear use case for this I think it's probably manually formatting snippets for some sort of custom documentation maybe... markdown files, websites, etc.
My use case was actually temporary editing. But it's fine: I can just do the macro wrapping hack in a bash script or something, then use that command in my lsp config. I might do that lol
Still the best formatter for Rust!
What's temporary editing?
And thanks, I'm really glad to hear it's appreciated 😄! Thanks for using it!
> What's temporary editing?
So, I don't use formatters primarily to "prettify" the code: I use them to make editing easier. I like genemichaels because things line up more often than with ~~rustfmt~~ (plus it has the opinions I like). I don't usually care about style; I want to conserve screen space (which is why I use 3 spaces instead of 4 for indent) and present the code as consistently as possible so I can spot repetition.
So, temporary editing... lol kinda a dumb way to phrase it, my bad, but essentially just formatting stuff so it lines up in the same columns, braces don't span multiple lines, and perhaps using less indentation. I usually achieve this by formatting throwaway code made of short expressions with no braces, which gives me a scaffold on which to write code. Afterwards, I normalize the entire document so it's easier to read and audit, and then I can denest things if I want.