treefmt icon indicating copy to clipboard operation
treefmt copied to clipboard

`treefmt --stdin` doesn't work if the underlying formatter cares about the real filename

Open MagicRB opened this issue 1 year ago • 11 comments

Describe the bug

So the default behavior of treefmt, is to to not pass the file path to the downstream formatter if formatting via stdin. But that subtly breaks fourmolu, it fails to pull in the active Haskell extensions. So fourmolu -i -c fat-hs/src/MyLib.hs < fat-hs/src/MyLib.hs works, but fourmolu -i -c < fat-hs/src/MyLib.hs doesn't. I don't quite see from the code, where does the path get omitted, https://github.com/numtide/treefmt/blob/9399cd69f9860bf62fd42479591e0ae39a1b7636/format/formatter.go#L80 I was looking there, and I think it comes from files, which is empty in the case of stdin formatting I guess.

To Reproduce

  1. clone my thesis repo https://git.redalder.org/magic_rb/bsc-thesis
  2. try to format fat-hs/src/MyLib.hs using treefmt, so: treefmt --stdin fat-hs/src/MyLib.hs < fat-hs/src/MyLib.hs

Expected behavior

Formatting is consistent between stdin mode and direct mode

System information

NixOS 25.05, exact versions of everything are pinned in https://git.redalder.org/magic_rb/bsc-thesis Additional context

MagicRB avatar Apr 18 '25 10:04 MagicRB

I haven't checked the code, but I thought treefmt implemented it's --stdin feature by creating a file under the hood. Are you sure fourmolu is actually being passed the file via stdin rather than as a path to a temp file?

jfly avatar Apr 25 '25 23:04 jfly

yeah im pretty sure it does do it by stdin and not tmpfile

MagicRB avatar Apr 26 '25 14:04 MagicRB

@MagicRB, I put together this experiment that shows treefmt creates a tempfile when invoked via --stdin: https://github.com/jfly/2025-04-26-treefmt-stdin-demo.

Note that the treefmt spec doesn't expect formatters to understand how to process files via stdin: https://treefmt.com/main/reference/formatter-spec/.

Is it possible the issue you're running into is that the tempfile that treefmt creates has some odd name, and that is what messes up fourmolu?

jfly avatar Apr 26 '25 22:04 jfly

oh, well, I thought I understood the code. Yeah that won't work with Haskell projects in general. The valid syntax depends on the project the file is in. With fourmolu if --stdin is passed it must be done via stdin and fourmolu pointed to the original file, otherwise it won't work.

MagicRB avatar Apr 28 '25 08:04 MagicRB

I spoke with @MagicRB, and he explained the problem in more detail. I think this is a Haskell/Emacs problem, not a treefmt problem and should be handled in a wrapper script. However, there currently isn't enough context being passed to the formatter to allow for this.

One suggestion, stemming from discussions in #571, is to add a TREEFMT_ARGS environment variable when invoking a formatter.

Normally, invoking treefmt --stdin foo/bar/buzz.hs results in a temp file being created in the tree root and passed downstream to the formatter like formolou temp1232.hs. If we mixed in TREEFMT_ARGS to the environment before executing the formatter, @MagicRB could write a wrapper script which could massage the arguments to keep formolou happy (I think it needs the original path passed to treefmt).

This change could make it easier for other wrapper scripts in future.

If we add this env variable, we can add it to tree-root-cmd too. I can see us extending the env variables in future to provide more context to downstream commands that treefmt is invoking.

Thoughts @jfly @zimbatm @MattSturgeon ?

brianmcgee avatar Apr 28 '25 08:04 brianmcgee

IIUC, this doesn't feel like a Haskell/Emacs problem. I'm not sure I totally grok the specific issue here (is this a monorepo with multiple "subprojects" with different formatting rules?), but it certainly seems reasonable to me that a formatter might need to know the "real" name of the file it's formatting (for enforcing different formatting rules on different files).

IMO this is actually a flaw in treefmt's architecture: the "right way" to handle this would be for us to invoke the formatter by passing the file contents via stdin and the real filename via an arg.

  • As mentioned above, this would require a change to the treefmt spec.
  • This would mean invoking the formatter N times for N files, which could have a performance impact for formatters that have an expensive startup.

jfly avatar Apr 28 '25 09:04 jfly

The TREEFMT_ARGS road means escaping the array of args and pushing them into a string. The all the programs need to understand the escaping algorithm (probably bash escaping) to consume it properly. But it's a pragmatic and simple option.

One thing that stands out to me here is that --stdin original_filepath is the same CLI API as treefmt.

Is that a thing we believe all formatters should support?

Potentially we could publish a --stdin formatter spec, and then add a boolean on the formatter config stdin_spec = true on whenever they support the spec or not.

zimbatm avatar Apr 28 '25 09:04 zimbatm

Is that a thing we believe all formatters should support?

If their behavior could depend on the real filename, then they already support something like this, or they have an issue asking for it, or they aren't really used 😛

Maybe some hard data would help? Would it be helpful for me to go through the list of formatters in treefmt-nix and see if they support stdin?

jfly avatar Apr 28 '25 15:04 jfly

IIUC, this doesn't feel like a Haskell/Emacs problem. I'm not sure I totally grok the specific issue here (is this a monorepo with multiple "subprojects" with different formatting rules?),

As @MagicRB explained to me, in Haskell, you can enable specific language extensions either in the preamble of a file or at the project level (i.e., path-dependent). Hence, Formolou needs to know the original path when it is invoked.

brianmcgee avatar Apr 29 '25 07:04 brianmcgee

@zimbatm, @brianmcgee, nixfmt would benefit from having a stdin spec to follow. We're doing some light bikeshedding over on https://github.com/NixOS/nixfmt/issues/301 ;)

jfly avatar May 05 '25 18:05 jfly

We discussed this today. Here's the plan:

  1. We'll add a "Stdin Spec" to the existing Formatter Spec. In short, it'll be treefmt's cli:
    • If the cli accepts stdin, it must only do so when passed a --stdin flag. When the --stdin flag is used, the cli must take a single positional argument which is the filename of the "virtual file" being formatted.
  2. We'll update treefmt to allow formatters to be marked as supporting the formatter spec. If a formatter supports the stdin spec, then we'll leverage its stdin support when treefmt is invoked with --stdin.
    • TBD if we add a warning if treefmt is invoked with --stdin, but the underlying formatter doesn't support the stdin spec.
  3. On a related note: @brianmcgee mentioned the idea of creating a "acid 3" style formatter test which would check if formatters obey our specs: https://github.com/numtide/treefmt-nix/issues/358

jfly avatar May 13 '25 20:05 jfly