scancode-toolkit
Slow index creation on Python 3.13 under some circumstances
Description
I am currently trying to understand some performance issues in my own wrapper of SCTK. They seem to be limited to the wrapper, but somehow originate from licensedcode.cache.
In this specific case, I use the SCTK API to retrieve the copyrights of a directory. To speed things up, I rely on joblib to spawn four parallel workers, each working on a separate file. This has been working without issues up to and including Python 3.12, but on Python 3.13 I see much slower execution times the first time SCTK is called after installation, i.e. most likely when the license index is being built.
During my testing, I observed that Python 3.12 would take about 12-13 seconds for the whole execution, while on Python 3.13 the first call would take nearly 4 minutes (and the second call 12 seconds). In some cases (involving a subprocess call) I would even run into lock file timeouts on Python 3.13 during index generation, i.e. the index would take more than 6 minutes to build.
How To Reproduce
Unfortunately, I cannot give a standalone example here, but only a reference to https://github.com/stefan6419846/license_tools/tree/parallel which contains the offending code.
System configuration
- What OS are you running on? (Windows/MacOS/Linux) - Linux
- What version of scancode-toolkit was used to generate the scan file? - 32.2.1
- What installation method was used to install/run scancode? (pip/source download/other) - pip
@stefan6419846 thank you for reporting this issue.
According to https://www.python.org/downloads/, Python 3.13 still seems to be in prerelease, which is why we have not yet started testing SCTK with Python 3.13. It is also not a supported Python version for now.
It's nice to see that you're at least able to use SCTK with Python 3.13 without failures, though. We will update this issue once there are stable releases of Python 3.13 and we start testing SCTK with them. Usually it takes a while before we can release SCTK archives, as we also need all our dependencies (specifically pyahocorasick, lxml, intbitset etc., which are not pure Python) to start building wheels for Python 3.13.
I am aware that Python 3.13 is still only available as a release candidate, but the release managers recommend that library/package maintainers start testing compatibility at the latest once the first RC is available. As I am maintaining a package/library built upon SCTK, and SCTK has a library interface as well, I have already tested basic compatibility. I started doing so in earlier stages, but waited with reporting this here until I had been able to verify it on the RC as well. From my experience, installing SCTK on a mostly basic Ubuntu 22.04 with Python 3.13 works without any real issues, including compiling the binary dependencies.
@stefan6419846 now that we explicitly support Python 3.13 with https://github.com/aboutcode-org/scancode-toolkit/pull/4430, and all the native dependencies have also been released and are in use, could you check if this issue still exists?
@AyanSinhaMahapatra Yes, I am still seeing issues: Python 3.13 takes about 4 minutes, Python 3.12 about 1.5 minutes, to run the full test suite in this specific case.
@stefan6419846 I will look into it soon. At first glance, this does not seem to be reflected directly in our CI on any of the supported OSes, see for example https://dev.azure.com/nexB/scancode-toolkit/_build/results?buildId=16563&view=logs&jobId=98269aaf-545e-5e24-84bd-2aa29b2a994f&j=b2e9c6d6-ac68-5921-c2b9-f4d9ac82ff5e. This could be something specific to your usage, so I might have to look into the details there.
@AyanSinhaMahapatra Okay, after adapting my code and tests to SCTK 32.4.0, it seems like Python 3.13 now runs fast, while Python 3.9 has become noticeably slower. For me, this issue is resolved nevertheless, as Python 3.9 will become EOL in about four months anyway.
@stefan6419846 the issue seems to be the availability of Python version specific wheels IMHO. We're currently missing Python 3.9 wheels at https://pypi.org/project/scancode-toolkit/32.4.0/#files because of a release issue, and we're updating this manually from the artifacts. Previously we didn't have Python 3.13 wheels because that version was unsupported and no wheel had been released for it. If there are no wheels for your Python version, the package is built from source, and this possibly takes more time.
Updating the issue title and closing, please reopen as necessary.