Regression in 2.0.5 -> 2.1.0: Getting Error: failed to open cache: timeout
Describe the bug
I just tried to update to 2.1.0 from 2.0.5. But now I start getting
Error: failed to open cache: timeout
I get this issue every time I try to run treefmt via pre-commit hook, if I don't have any cache. I have not been able to reproduce it running treefmt directly. It seems to be because pre-commit is batching calls to treefmt and there is some issue running multiple treefmt in parallel and accessing this database?
Of course it is fixable with --no-cache. But would be nice if I could use the cache when it is available.
To Reproduce
Steps to reproduce the behavior:
- Git clone
https://github.com/terlar/pre-commit-treefmt-bug.git - Enter project directory
- Run
nix develop - Run
rm -r ~/.cache/treefmt - Run
pre-commit run --all-files
Expected behavior
Running without failing with a timeout. Perhaps retry? Or add possibility to extend timeout or find another clever solution to the problem.
System information
Linux
Additional context
Only happens together with pre-commit or potentially also if you batch treefmt calls in other ways. I triggered this error with 100 files on one machine, but had to bump it on another one to trigger it. So it could potentially be that you cannot reproduce this depending on your hardware.
I'm pretty sure this was introduced by #409.
After some research, perhaps the pre-commit config require_serial: true is what should be used instead. This will leave the parallelization to treefmt. However, could potentially happen in multiple calls, if the filenames cannot fit in one invocation. But perhaps that is not a big issue.
Testing with this configuration the formatting was also faster. So seems relying on pre-commits parallelization slowed things down slightly.
Using --no-cache is also considerable faster. So leaving out the cache seems to work well both with serial and non-serial execution. At least in this case. So perhaps when running as a pre-commit hook cache should be disabled?
In this instance, --no-cache is the simplest way of guaranteeing that multiple instances of treefmt running at the same time do not conflict. However, on larger repos, the loss of the cache could have quite an impact.
require_serial is a good idea, but as you said, there can still be concurrent invocations.
We could make the connection timeout for the db configurable. It's currently set to 1s. In a pre-commit hook, this could be increased to 10s for example, and would ensure any concurrent invocations are queued.
Is it worth discussing how to make parallel usage of treefmt work better? A couple ideas:
- I see that bbolt has a sharable, read-only mode: https://github.com/etcd-io/bbolt?tab=readme-ov-file#read-only-mode. We could rework things so treefmt grabs a shared connection to check if files are formatted, then releases the connection, formats, and finally grabs a write connection just to update the db.
- My preference: the filesystem is already a good enough db for us! Just create a file with the right "signature" if a given file has been formatted. If it exists, there's nothing to do. If it doesn't exist, then format the file and create the "signature" file.
Both these ideas would create scenarios where parallel treefmt processes could end up doing duplicate work (i.e.: formatting the same file), but I'm pretty sure the use case described above is one where people are "intelligently" using treefmt in parallel (not giving the various treefmt incantations overlapping work).
I don't know if parallel execution is a path we should go down 🤔