refurb
refurb copied to clipboard
Enhancement: Compile using mypy
Hi again,
So I've noticed that refurb is pretty slow compared to other tools. I would guess its because its using mypy and is not itself compiled. I would therefore suggest investing some time into setting up a compilation using mypyc
- this shouldnt be hard to do given the fact that codebase is well typed. The annoying and complex part is going to be the github pipeline to test this in different environments and the release of the wheels. But you should see a x2-x4 performance increase out of the box according to mypyc
.
This is on my todo list, and something that I hope to play around with by the end of next week. I'll make sure to report back here on any progress I make!
Looks like there is a long road ahead for mypyc:
$ mypyc refurb/checks/pathlib/with_suffix.py
refurb/checks/pathlib/with_suffix.py:42: error: Match statements are not yet supported
There is currently an open issue in the mypyc repo for this. I wouldn't mind taking a crack at implementing this, might be fun who knows!
At the very least we could mypyc
ify all the non-match related code, though most of the meat of this project lies in the checks.
I'd wait on your shows. Or change the local implementation
Looks like there is a long road ahead for mypyc:
$ mypyc refurb/checks/pathlib/with_suffix.py refurb/checks/pathlib/with_suffix.py:42: error: Match statements are not yet supported
There is currently an open issue in the mypyc repo for this. I wouldn't mind taking a crack at implementing this, might be fun who knows!
At the very least we could
mypyc
ify all the non-match related code, though most of the meat of this project lies in the checks.
So, the problem for me as a user is that refurb is currently extremely slow on large code bases - significantly more than mypy itself, because it's not compiled.
I would suggest improving two points- 1. Compilation and 2. Ignoring all imports in a better way, even those not missing.
Progress is under way! I have an open PR for adding match support to Mypy (see https://github.com/python/mypy/pull/13953), and I recently added a --disable-all
flag which will disable all checks, thus not loading them. In that case, you would have to explicitly enable the checks you want.
Does that help answer your question? Thanks!
Progress is under way! I have an open PR for adding match support to Mypy (see https://github.com/python/mypy/pull/13953), and I recently added a
--disable-all
flag which will disable all checks, thus not loading them. In that case, you would have to explicitly enable the checks you want.Does that help answer your question? Thanks!
Hi, the issue is not the checks, it's the amount of time it takes in traversing types. I would suggest to try and ensure it never enters library code at all, unlike mypy. I don't know if it's possible, but if it is I'm sure it will speed up things.
Hi @Goldziher,
https://github.com/python/mypy/pull/13953 has been merged, and progress is well underway to get Refurb compiling with Mypy! There are some additional roadblocks which will need to be resolved, but for the most part, we are much closer then we where before.
As for your comment about speeding things up by not checking library code, how were you bench marking that? I don't doubt your methods, I'm just curious how you came to that conclusion (as I have yet to look into this for myself). There are probably ways to prevent the library code from being traversed, though I wonder how much type information (if any) would be able to be gathered.
Hi @Goldziher,
python/mypy#13953 has been merged, and progress is well underway to get Refurb compiling with Mypy! There are some additional roadblocks which will need to be resolved, but for the most part, we are much closer then we where before.
As for your comment about speeding things up by not checking library code, how were you bench marking that? I don't doubt your methods, I'm just curious how you came to that conclusion (as I have yet to look into this for myself). There are probably ways to prevent the library code from being traversed, though I wonder how much type information (if any) would be able to be gathered.
Hi, i haven't benchmarked this. I simply noticed that the moment I set the missing dependencies in the "additional_dependencies" list in the pre-commit config, checking became excruciatingly slow.
For now btw I removed refurb from our pipelines - it simply takes too long to run on our code base. I'll certainly give it another go once its faster.
I completely understand removing Refurb for the time being, it is a bit slower then I would like. For the time being I will look into ways to speed things up without using compilation, and add benchmarks so that we can track the speed of Refurb over time!