treefmt icon indicating copy to clipboard operation
treefmt copied to clipboard

Regression in 2.0.5 -> 2.1.0: Getting Error: failed to open cache: timeout

Open terlar opened this issue 1 year ago • 6 comments

Describe the bug

I just tried to update to 2.1.0 from 2.0.5. But now I start getting

Error: failed to open cache: timeout

I get this issue every time I try to run treefmt via pre-commit hook, if I don't have any cache. I have not been able to reproduce it running treefmt directly. It seems to be because pre-commit is batching calls to treefmt and there is some issue running multiple treefmt in parallel and accessing this database?

Of course it is fixable with --no-cache. But would be nice if I could use the cache when it is available.

To Reproduce

Steps to reproduce the behavior:

  1. Git clone https://github.com/terlar/pre-commit-treefmt-bug.git
  2. Enter project directory
  3. Run nix develop
  4. Run rm -r ~/.cache/treefmt
  5. Run pre-commit run --all-files

Expected behavior

Running without failing with a timeout. Perhaps retry? Or add possibility to extend timeout or find another clever solution to the problem.

System information

Linux

Additional context

Only happens together with pre-commit or potentially also if you batch treefmt calls in other ways. I triggered this error with 100 files on one machine, but had to bump it on another one to trigger it. So it could potentially be that you cannot reproduce this depending on your hardware.

terlar avatar Nov 11 '24 11:11 terlar

I'm pretty sure this was introduced by #409.

terlar avatar Nov 11 '24 11:11 terlar

After some research, perhaps the pre-commit config require_serial: true is what should be used instead. This will leave the parallelization to treefmt. However, could potentially happen in multiple calls, if the filenames cannot fit in one invocation. But perhaps that is not a big issue.

Testing with this configuration the formatting was also faster. So seems relying on pre-commits parallelization slowed things down slightly.

terlar avatar Nov 11 '24 12:11 terlar

Using --no-cache is also considerable faster. So leaving out the cache seems to work well both with serial and non-serial execution. At least in this case. So perhaps when running as a pre-commit hook cache should be disabled?

terlar avatar Nov 11 '24 12:11 terlar

In this instance, --no-cache is the simplest way of guaranteeing that multiple instances of treefmt running at the same time do not conflict. However, on larger repos, the loss of the cache could have quite an impact.

require_serial is a good idea, but as you said, there can still be concurrent invocations.

We could make the connection timeout for the db configurable. It's currently set to 1s. In a pre-commit hook, this could be increased to 10s for example, and would ensure any concurrent invocations are queued.

brianmcgee avatar Nov 12 '24 08:11 brianmcgee

Is it worth discussing how to make parallel usage of treefmt work better? A couple ideas:

  1. I see that bbolt has a sharable, read-only mode: https://github.com/etcd-io/bbolt?tab=readme-ov-file#read-only-mode. We could rework things so treefmt grabs a shared connection to check if files are formatted, then releases the connection, formats, and finally grabs a write connection just to update the db.
  2. My preference: the filesystem is already a good enough db for us! Just create a file with the right "signature" if a given file has been formatted. If it exists, there's nothing to do. If it doesn't exist, then format the file and create the "signature" file.

Both these ideas would create scenarios where parallel treefmt processes could end up doing duplicate work (i.e.: formatting the same file), but I'm pretty sure the use case described above is one where people are "intelligently" using treefmt in parallel (not giving the various treefmt incantations overlapping work).

jfly avatar Nov 14 '24 13:11 jfly

I don't know if parallel execution is a path we should go down 🤔

brianmcgee avatar Nov 14 '24 14:11 brianmcgee