
pytype cache is impossible to cache in CI

Open • alexey-medvedchikov opened this issue 2 years ago • 9 comments

Hi,

First of all, let me say a big thanks for your work. After we introduced pytype into our development pipeline, the number of problems dropped dramatically!

One thing we noticed is the execution speed. A clean run of pytype on our code base takes around 3 min. The cache helps a lot: a typical branch run with the cache takes just 5 sec. I tried to store the pytype cache in the CI pipeline, but the problem is that it is a ninja cache under the hood, and ninja is known for poor caching behaviour in this setting: it relies on file timestamps to decide whether artefacts are up to date. So when I restore the cache, instead of checking whether files actually changed (by hash, for example), it assumes that every cached item is stale and reruns everything.

The file that stores the information needed for caching (timestamps, file names, etc.) has a complex format, and working around it to make sure the cache still works after restoration is a hard task. ~~There's no way to create a workaround for this, because user doesn't have control over ctime on Linux, only mtime is available for change.~~ And while 3 min doesn't sound like a lot, it is annoying and expensive: this step is configured with an 8-core build container.
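The core of the complaint is the difference between a timestamp rule and a content rule for deciding what is stale. The sketch below illustrates it without involving pytype or ninja: the file names are hypothetical, the `.pyi` file merely stands in for a cached pytype result, and `mtime_stale` is a simplified timestamp check, not ninja's actual logic. It simulates a CI restore in which the cached output keeps its old timestamp while a fresh checkout gives the unchanged source a newer mtime.

```python
import hashlib
import os
import tempfile
import time
from pathlib import Path


def mtime_stale(source: Path, output: Path) -> bool:
    """Simplified timestamp rule: rebuild if the source is newer than the output."""
    return source.stat().st_mtime > output.stat().st_mtime


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def hash_stale(source: Path, recorded_hash: str) -> bool:
    """Content rule: rebuild only if the source bytes differ from the recorded hash."""
    return sha256_of(source) != recorded_hash


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "module.py")   # hypothetical source file
    out = Path(tmp, "module.pyi")  # stands in for a cached pytype output

    # First CI run: write the source, "analyse" it, and record its hash.
    src.write_text("def f(x): return x\n")
    out.write_text("def f(x) -> Any: ...\n")
    recorded = sha256_of(src)

    # Next CI job: the restored output keeps an old timestamp, while the fresh
    # checkout gives the unchanged source the current time as its mtime.
    now = time.time()
    os.utime(out, (now - 3600, now - 3600))
    os.utime(src, (now, now))

    print("mtime says stale:", mtime_stale(src, out))      # True
    print("hash says stale: ", hash_stale(src, recorded))  # False
```

With the timestamp rule the unchanged file is reported as stale; with the hash rule it is not, which is the behaviour the issue is asking for.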

Can you please give any advice? Maybe someone has already built a better caching wrapper with pytype-single under the hood?

Thanks in advance.

alexey-medvedchikov, Mar 17 '23

Having the same problem here; I would really appreciate more info or a workaround. Using pytype in CI is a major use case, and for large codebases, running without a cache is not an efficient use of resources.

ramonziai, Mar 20 '23

we will explore alternatives to ninja. do you know of any that have better caching behaviour?

martindemello, Mar 20 '23

somewhat clunky method to change a file's ctime: https://stackoverflow.com/questions/16126992/setting-changing-the-ctime-or-change-time-attribute-on-a-file/17066309#17066309

martindemello, Mar 20 '23

> we will explore alternatives to ninja. do you know of any that have better caching behaviour?

Thanks for taking a look at this! I don't have much practical experience here, but from a bit of research, SCons, Please and waf seem to be good candidates. I found them in this Reddit thread.

ramonziai, Mar 21 '23

Ninja uses mtime (poke around in https://github.com/ninja-build/ninja). Regardless, hacking timestamps to appease a build system that was not designed for a shared cache on CI doesn't sound like a sustainable approach.

gpshead, Mar 21 '23

> we will explore alternatives to ninja. do you know of any that have better caching behaviour?

@martindemello Any news on this, or ways in which we can help?

Just had a look at the ninja repo, and they have open PRs dating all the way back to 2014, so I wouldn't be too optimistic that things will change in ninja itself any time soon...

ramonziai, May 03 '23

bazel/blaze also use hashes instead of mtime

mcmanustfj, Mar 27 '24

@ramonziai sorry, i managed to miss your comment from 2023! if you want to help, what we need is a replacement for ninja that uses hashes rather than timestamps, and whose makefile equivalents are easy to generate from cmake. unfortunately the team has a lot on its plate right now, and we do not have the bandwidth to explore the alternatives ourselves or to write the code generation in cmake.

@mcmanustfj bazel would be an option too, but again it would need the work to scan a project and generate bazel files to run pytype over it

we can help with advice and pointers if anyone wants to volunteer to do the work, but we will not be able to do it ourselves.

martindemello, Mar 27 '24
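To make the request for a hash-based replacement concrete, here is a minimal sketch of content-hash invalidation that a caching wrapper around pytype-single could build on. It is an illustration under stated assumptions, not part of pytype: the function names and the JSON manifest are hypothetical, the import graph `deps` is assumed to be known and acyclic, and nothing here runs pytype or restores its generated .pyi files. Each file's key covers its own bytes plus the keys of its imports, so editing a module also invalidates the modules that depend on it, and timestamps play no role.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def compute_keys(files: Dict[str, bytes], deps: Dict[str, List[str]]) -> Dict[str, str]:
    """Key = SHA-256 of a file's bytes plus the keys of the files it imports.

    Assumes `deps` is acyclic and only names files present in `files`;
    a change anywhere upstream therefore changes every downstream key.
    """
    keys: Dict[str, str] = {}

    def key(name: str) -> str:
        if name not in keys:
            h = hashlib.sha256(files[name])
            for dep in sorted(deps.get(name, [])):
                h.update(key(dep).encode())
            keys[name] = h.hexdigest()
        return keys[name]

    for name in files:
        key(name)
    return keys


def files_to_recheck(keys: Dict[str, str], manifest: Dict[str, str]) -> List[str]:
    """Files whose key is new or differs from the manifest saved by the last run."""
    return sorted(name for name, k in keys.items() if manifest.get(name) != k)


def load_manifest(path: Path) -> Dict[str, str]:
    """Hypothetical manifest restored from the CI cache; empty on a cold run."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_manifest(path: Path, keys: Dict[str, str]) -> None:
    path.write_text(json.dumps(keys, indent=2, sort_keys=True))


if __name__ == "__main__":
    # Hypothetical two-module project where b.py imports a.py.
    files = {"a.py": b"X = 1\n", "b.py": b"import a\n"}
    deps = {"b.py": ["a.py"]}
    previous = compute_keys(files, deps)  # what the last CI run would have saved

    files["a.py"] = b"X = 2\n"  # edit only a.py
    print(files_to_recheck(compute_keys(files, deps), previous))  # ['a.py', 'b.py']
```

Because the manifest stores only content-derived keys, it survives a cache save and restore unchanged; a wrapper would rerun analysis only for the listed files and reuse cached outputs for the rest.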