sqlfluff icon indicating copy to clipboard operation
sqlfluff copied to clipboard

Don't re-fix already fixed files

Open fdw opened this issue 3 years ago • 6 comments

Search before asking

  • [X] I searched the issues and found no similar issues.

Description

Let's say I want to automatically call sqlfluff fix before each commit. I could do so, but then sqlfluff will work on all files in the directory, not only on the ones that changed. Since, ideally, most of the unchanged ones are perfectly linted, this wastes a lot of time.

Instead, I could imagine that sqlfluff keeps a cache somewhere with hashes of already linted files. Then it could skip previously seen ones, making the process much faster.

Use case

I want to speed up sqlfluff fix, for example by ignoring previously linted files.

Dialect

This is independent of the dialect

Are you willing to work on and submit a PR to address the issue?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

fdw avatar May 05 '22 09:05 fdw

A pre-commit hook will only look at changed files. Or at least changed files in that branch only. Similarly CI (e.g. GitHub Actions) can check only changed files.

One concern with your proposal of not checking changed files, is what if SQLFluff itself is changed? E.g. if there's a new version, or you change your .sqlfluff configuration file?

tunetheweb avatar May 05 '22 09:05 tunetheweb

Thanks for that pointer! :) That would help with my scenario.

However, if I don't want to rely on a commit hook (maybe because I want to run it manually in between or when I don't have git), then it doesn't really help me.

Of course, the cache would need to take the config and the sqlfluff version into account. Probably also some --force parameter.

If that's not something you want to support, that's fine :)

fdw avatar May 05 '22 10:05 fdw

@fdw this is how i do it sqlfluff fix $(git diff origin/main... --name-only && git diff --name-only)

This only fixes things that have not been staged, or changes relative to main from your branch, not main's

WittierDinosaur avatar May 05 '22 14:05 WittierDinosaur

We do have a --new-only option for generate_parse_fixture_yml.py. Now that's something easier for them (we check YML timestamps, versus SQL timestamps).

But maybe could write a file, and then check that timestamp in similar manner?

I am concerned about changes to sqfluff versions/config files/dependent files like dbt models...etc. that might make that trickier though. But maybe that's up to user to not use the --new-only flag in those cases?

Either way, while I understand the problem and desire to solve this, there are work arounds described above, and not aware of any other linter having this functionality (correct me if I'm wrong), so don't think it's the highest priority here unless you want to take it on yourself @fdw ?

tunetheweb avatar May 05 '22 14:05 tunetheweb

so don't think it's the highest priority here unless you want to take it on yourself @fdw ?

Totally fair :) However, I'm not deep enough into sqlfluff yet and I'd like to fix some other features/bugs I've opened before I can even think about tackling this one. So, if you don't mind, we can just keep it open :)

fdw avatar May 06 '22 05:05 fdw

This would be excellent, many other linting/formatting frameworks also support file caching. Prettier and ESLint are two I think of immediately.

reteps avatar May 06 '25 04:05 reteps