Radon struggles to collect statistics from large files

Open • Sam152 opened this issue 2 years ago • 2 comments

I have some large files that radon struggles to analyse. I created an example to demonstrate the problem: https://gist.githubusercontent.com/Sam152/50e8ef27cceb899084b42a069237a7b8/raw/bb21870395df86a0062c22353b532b45d31bd3f5/sample.py (~800 lines)

In my case, running radon raw big-package takes 28.38s. The real module I'm trying to analyse has ~5000 lines, with a similar number of AST nodes per line.

If I double my 800-line example, the script takes roughly 115.50s to run, about four times as long, so my feeling is that something scales worse than O(n) in the size of the AST.
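The two timings can be turned into a rough scaling exponent: if the run time grows like n^k, doubling n multiplies the time by 2^k, so k = log(t2/t1) / log(n2/n1). A minimal sketch using the numbers above, assuming the doubled file is ~1600 lines (a two-point estimate, so only indicative):

```python
import math

def scaling_exponent(n1, t1, n2, t2):
    """Estimate k in t ~ c * n**k from two (size, time) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)

# Timings from the report: ~800 lines took 28.38s, the doubled file 115.50s.
k = scaling_exponent(800, 28.38, 1600, 115.50)
print(f"estimated exponent: {k:.2f}")
```

This prints `estimated exponent: 2.02`, which is what quadratic behaviour would produce.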

Any pointers on whether something here could be optimised, or is the nature of the analysis such that speeding it up is simply not possible?

Thanks in advance, if anyone can share their experience.

Cheers, Sam


On a side note, while researching this issue, I found radon cited in an academic paper, which I thought was interesting and worth sharing (https://arxiv.org/pdf/2007.08978.pdf).

Sam152, Aug 10 '21 09:08

Hi Sam, thanks for sharing the example. Indeed, it's quite surprising to see such a long run time for such a simple file.

The raw command is definitely the slowest, because it does not use the ast module to parse the file; instead, it uses tokenize. The latter is written in pure Python rather than C, which already slows it down. Moreover, when working with the AST we can use efficient techniques like the visitor pattern, which are not available with the tokenize module.
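To illustrate the difference, here is a minimal sketch (not Radon's code; the class and function names are hypothetical) that gathers the same statistic both ways: an `ast.NodeVisitor` subclass, whose dispatch to `visit_<NodeType>` methods is the visitor pattern, and a scan over the flat token stream from `tokenize.generate_tokens`, where any structure has to be inferred by hand.

```python
import ast
import io
import tokenize

class FunctionCounter(ast.NodeVisitor):
    """Visitor pattern: NodeVisitor dispatches each node to visit_<ClassName>."""

    def __init__(self):
        self.count = 0

    def visit_FunctionDef(self, node):
        self.count += 1
        self.generic_visit(node)  # keep walking into nested definitions

    # Async functions are a separate node type; reuse the same handler.
    visit_AsyncFunctionDef = visit_FunctionDef


def count_functions_ast(source):
    counter = FunctionCounter()
    counter.visit(ast.parse(source))
    return counter.count


def count_functions_tokens(source):
    # tokenize yields a flat token stream; we can only look for the keyword.
    readline = io.StringIO(source).readline
    return sum(1 for tok in tokenize.generate_tokens(readline)
               if tok.type == tokenize.NAME and tok.string == "def")
```

Both functions return 3 for a source containing a nested `def` and an `async def`, but only the AST version knows which definition encloses which.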

However, the superlinear complexity definitely comes from Radon's own code. It performs some complicated operations to count logical lines, and I suspect that's where most of the time goes. I think your example highlights one of the inefficiencies particularly well.
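For comparison, a basic logical-line count can be done in a single linear pass, because tokenize emits a NEWLINE token at the end of every logical line and NL for other line breaks (blank lines, comment-only lines, breaks inside brackets). This is a minimal sketch under that assumption, not how Radon computes LLOC; Radon's definition may differ in details, for example in how it treats several statements on one physical line.

```python
import io
import tokenize

def count_logical_lines(source):
    """Count logical lines in one pass: each one ends with a NEWLINE token."""
    readline = io.StringIO(source).readline
    return sum(1 for tok in tokenize.generate_tokens(readline)
               if tok.type == tokenize.NEWLINE)
```

For `"x = (1,\n     2)\n\n# comment\ny = 3\n"` this returns 2: the bracketed statement spans two physical lines but is one logical line, and the blank and comment lines produce NL rather than NEWLINE.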

The next step would be to profile the code. A flamegraph should already give some very useful hints. I'll try to investigate this when I've got time.
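A minimal profiling sketch using the standard library's `cProfile` and `pstats`; the helper name `profile_call` is hypothetical. It sorts by cumulative time so the most expensive call chains appear first.

```python
import cProfile
import io
import pstats

def profile_call(func, *args, limit=15):
    """Run func(*args) under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    buf = io.StringIO()
    # Cumulative time highlights the call chains where most time is spent.
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(limit)
    return result, buf.getvalue()
```

Assuming Radon is installed and its Python API exposes `radon.raw.analyze(source)`, something like `profile_call(analyze, source)` on the contents of the sample file would show where the time goes. For an actual flamegraph, a sampling profiler such as py-spy can record the CLI run directly, e.g. `py-spy record -o profile.svg -- radon raw big-package`.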

rubik, Aug 26 '21 08:08

Thanks for the info, that's really helpful!

Sam152, Sep 03 '21 03:09