refurb icon indicating copy to clipboard operation
refurb copied to clipboard

Enhancement: Compile using mypy

Open Goldziher opened this issue 2 years ago • 9 comments

Hi again,

So I've noticed that refurb is pretty slow compared to other tools. I would guess its because its using mypy and is not itself compiled. I would therefore suggest investing some time into setting up a compilation using mypyc - this shouldnt be hard to do given the fact that codebase is well typed. The annoying and complex part is going to be the github pipeline to test this in different environments and the release of the wheels. But you should see a x2-x4 performance increase out of the box according to mypyc.

Goldziher avatar Oct 07 '22 18:10 Goldziher

This is on my todo list, and something that I hope to play around with by the end of next week. I'll make sure to report back here on any progress I make!

dosisod avatar Oct 07 '22 19:10 dosisod

Looks like there is a long road ahead for mypyc:

$ mypyc refurb/checks/pathlib/with_suffix.py
refurb/checks/pathlib/with_suffix.py:42: error: Match statements are not yet supported

There is currently an open issue in the mypyc repo for this. I wouldn't mind taking a crack at implementing this, might be fun who knows!

At the very least we could mypycify all the non-match related code, though most of the meat of this project lies in the checks.

dosisod avatar Oct 08 '22 04:10 dosisod

I'd wait on your shows. Or change the local implementation

Goldziher avatar Oct 08 '22 08:10 Goldziher

Looks like there is a long road ahead for mypyc:

$ mypyc refurb/checks/pathlib/with_suffix.py
refurb/checks/pathlib/with_suffix.py:42: error: Match statements are not yet supported

There is currently an open issue in the mypyc repo for this. I wouldn't mind taking a crack at implementing this, might be fun who knows!

At the very least we could mypycify all the non-match related code, though most of the meat of this project lies in the checks.

So, the problem for me as a user is that refurb is currently extremely slow on large code bases - significantly more than mypy itself, because it's not compiled.

I would suggest improving two points- 1. Compilation and 2. Ignoring all imports in a better way, even those not missing.

Goldziher avatar Oct 30 '22 11:10 Goldziher

Progress is under way! I have an open PR for adding match support to Mypy (see https://github.com/python/mypy/pull/13953), and I recently added a --disable-all flag which will disable all checks, thus not loading them. In that case, you would have to explicitly enable the checks you want.

Does that help answer your question? Thanks!

dosisod avatar Oct 31 '22 01:10 dosisod

Progress is under way! I have an open PR for adding match support to Mypy (see https://github.com/python/mypy/pull/13953), and I recently added a --disable-all flag which will disable all checks, thus not loading them. In that case, you would have to explicitly enable the checks you want.

Does that help answer your question? Thanks!

Hi, the issue is not the checks, it's the amount of time it takes in traversing types. I would suggest to try and ensure it never enters library code at all, unlike mypy. I don't know if it's possible, but if it is I'm sure it will speed up things.

Goldziher avatar Oct 31 '22 08:10 Goldziher

Hi @Goldziher,

https://github.com/python/mypy/pull/13953 has been merged, and progress is well underway to get Refurb compiling with Mypy! There are some additional roadblocks which will need to be resolved, but for the most part, we are much closer then we where before.

As for your comment about speeding things up by not checking library code, how were you bench marking that? I don't doubt your methods, I'm just curious how you came to that conclusion (as I have yet to look into this for myself). There are probably ways to prevent the library code from being traversed, though I wonder how much type information (if any) would be able to be gathered.

dosisod avatar Dec 11 '22 01:12 dosisod

Hi @Goldziher,

python/mypy#13953 has been merged, and progress is well underway to get Refurb compiling with Mypy! There are some additional roadblocks which will need to be resolved, but for the most part, we are much closer then we where before.

As for your comment about speeding things up by not checking library code, how were you bench marking that? I don't doubt your methods, I'm just curious how you came to that conclusion (as I have yet to look into this for myself). There are probably ways to prevent the library code from being traversed, though I wonder how much type information (if any) would be able to be gathered.

Hi, i haven't benchmarked this. I simply noticed that the moment I set the missing dependencies in the "additional_dependencies" list in the pre-commit config, checking became excruciatingly slow.

For now btw I removed refurb from our pipelines - it simply takes too long to run on our code base. I'll certainly give it another go once its faster.

Goldziher avatar Dec 11 '22 07:12 Goldziher

I completely understand removing Refurb for the time being, it is a bit slower then I would like. For the time being I will look into ways to speed things up without using compilation, and add benchmarks so that we can track the speed of Refurb over time!

dosisod avatar Dec 11 '22 23:12 dosisod