alex icon indicating copy to clipboard operation
alex copied to clipboard

Create output file atomically

Open sol opened this issue 1 year ago • 3 comments

This change is useful if you both (a) run alex on modifications to .x-files and (b) :reload GHCi on modifications to .hs-files. It prevents that your GHCi session will see partial source files.

As things are you will initially get an error of the form:

src/Lexer.hs:1:1: error:
    File name does not match module name:
    Saw     : ‘Main’
    Expected: ‘Lexer’
Failed, no modules loaded.

(as initially the output file is empty)

Considerations:

  • [ ] If we go with this then we will need to drop support for GHC < 6.10.1 (as we don't have compat code for bracket and I'm not eager to burn cycles on that).
  • [ ] We could do the same thing for .info-files (personally however, .hs-files is what I care about).
  • [ ] We could guard this behind a flag (e.g. --atomic-output or something).

Thoughts?

sol avatar Feb 06 '23 09:02 sol

@sol, thanks for the PR!

  • If we go with this then we will need to drop support for GHC < 6.10.1

I don't see a problem with dropping GHC versions < 7 because we anyway do not test them. CI goes back to 7.0 only.

  • We could do the same thing for .info-files (personally however, .hs-files is what I care about).

This could be good for code uniformity. At least if there is similar code in other places, it should be changed in sync.

  • We could guard this behind a flag (e.g. --atomic-output or something).

I don't think that would be necessary.

One question: Can the memory footprint of alex increase significantly by doing the write atomically? (Not sure how relevant this question is, but it came to my mind and you may have an answer to this. It could increase if e.g. the writing was triggering a lazy computation of the output, and this now doesn't work like this anymore.)

andreasabel avatar Feb 06 '23 09:02 andreasabel

One question: Can the memory footprint of alex increase significantly by doing the write atomically?

No, it doesn't. This patch creates a temporary file and later renames it. For that reason, the memory characteristics are unchanged.

The downside of this approach is that we create that temporary file on the file system and trigger corresponding inotify events, which I don't really like. The pros are that this is a common and well understood approach to guarantee atomicity when creating files + it's easy to implement without the risk of unintentional side effects.

An other approach would be to construct a ByteString (or rather a builder I guess) in memory first, and finally write that to a file. While this is not atomic it's likely fast enough so that you can get away with it + you don't get the additional inotify events for the temporary file, which I would prefer. In this case you will need ~0.5mb for the builder (for a parser of the size of what GHC uses; possibly more due to the builder overhead not sure) which I think in itself is not a big issue, but you also run the risk of changing the memory characteristics unintentionally and may need to seq things away.

A third and somewhat half measured approach would be to try do a little bit more work upfront (before opening the output file), so that the output file stays empty for a shorter period of time. This may or may not be enough to get away with it. I haven't looked into this at all.

sol avatar Feb 06 '23 10:02 sol

@andreasabel before moving forward with this, I will dogfeed it for a little bit longer.

sol avatar Feb 09 '23 01:02 sol