PyCG
<Performance Bug>: It costs too much time to analyze DL packages (e.g., numpy)
Hello, it costs me too much time to analyze deep learning packages (numpy, tensorflow, etc.), which is unacceptable. Do you have any ideas on how to optimize PyCG to reduce its running time?
As you mentioned here, it seems possible to improve the complexity. Can you give me some suggestions on how to optimize `DefinitionManager.complete_definitions()`? (I don't mind sacrificing some precision in exchange for significantly less execution time.)
I'm looking forward to receiving your constructive suggestions. Thanks!
Optimizing this part is a work in progress. It basically implements a transitive closure of the assignment graph. However, this can be implemented in a lazy manner -- i.e. whenever we look for the functions that can be pointed to by a certain identifier, we can update the assignment graph with new edges towards the results.
I have also added the `--max-iterations` CLI argument, which limits the fix-point computation to a certain number of iterations. The quickest way to improve performance with a very small sacrifice in precision & recall would be to use this argument with a small numerical value (e.g. `--max-iterations 1`).
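For illustration only, here is a minimal sketch of the two ideas above: a fix-point transitive closure over a points-to/assignment map that can be capped by a `max_iterations` argument, and a lazy variant that only resolves an identifier when it is looked up and caches the newly discovered edges. This is not PyCG's actual implementation; all names (`points_to`, `complete_definitions_fixpoint`, `resolve_lazy`) are hypothetical.

```python
def complete_definitions_fixpoint(points_to, max_iterations=None):
    """Fix-point transitive closure over an assignment graph (sketch).

    `points_to` maps each identifier to the set of names it may point to.
    Edges are propagated until nothing changes, or until `max_iterations`
    passes have been performed (trading precision/recall for speed).
    """
    iteration = 0
    changed = True
    while changed and (max_iterations is None or iteration < max_iterations):
        changed = False
        iteration += 1
        for name, targets in list(points_to.items()):
            # Pull in everything the current targets point to.
            new_targets = set(targets)
            for target in targets:
                new_targets |= points_to.get(target, set())
            if new_targets != targets:
                points_to[name] = new_targets
                changed = True
    return points_to


def resolve_lazy(points_to, name, _visiting=None):
    """Lazy alternative: resolve a single identifier on demand and cache the
    discovered edges, instead of closing the whole graph up front.
    (Cycle handling is deliberately simplified for this sketch.)"""
    if _visiting is None:
        _visiting = set()
    if name in _visiting:
        return set()
    _visiting.add(name)
    resolved = set()
    for target in points_to.get(name, set()):
        resolved.add(target)
        resolved |= resolve_lazy(points_to, target, _visiting)
    points_to[name] = resolved  # cache the new edges for later lookups
    return resolved
```

With a cap of one iteration the closure may stop early on long assignment chains, which corresponds to the small loss in precision & recall mentioned above.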
Thanks a lot! Your reply is quick and quite useful!
Hi,
When I set `--max-iterations` to 1, the execution time is much lower, and I appreciate it very much!
However, consider the following code:
```python
from sklearn.preprocessing import LabelBinarizer

label_binarizer = LabelBinarizer()
image_labels = label_binarizer.fit_transform(label_list)
```
PyCG can only extract `sklearn.preprocessing.LabelBinarizer`, while `LabelBinarizer.fit_transform` cannot be extracted.
I'm curious about the reason: does the small value of `--max-iterations` contribute to this, or is PyCG unable to extract this call chain? @vitsalis
Closing due to archival of repository.