utils.py modularization
Anvi'o has a problem with its anvio/utils.py. Over the years it has grown to house over 150 functions and classes, and at the time of this PR it was over 5,000 lines of code. Having everything in one file is a bad idea, not only for performance, but also for the maintainability of the codebase.
This PR splits anvio/utils.py into multiple modules under the directory anvio/utils/, which now contains the following files,
__init__.py
algorithms.py
anviohelp.py
commandline.py
database.py
debug.py
fasta.py
files.py
hmm.py
misc.py
network.py
phylogenetics.py
sequences.py
statistics.py
system.py
validation.py
visualization.py
as defined by the Python file utils_migration_map.py, which also describes which functions are in which modules now. This file is temporarily in the root of the repository, but we will remove it once everything is in master and all major branches are synchronized with this change.
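To give a rough idea of what such a map makes possible, here is a minimal sketch. It is not the content of utils_migration_map.py: the dictionary layout, the names MIGRATION_MAP, build_reverse_index and new_import_line, and the `from anvio.utils.<module> import <name>` import style are all assumptions made for illustration. The only placement taken from this PR is run_functional_enrichment_stats living in statistics.py; the other entry is a placeholder.

```python
# Hypothetical excerpt; the real utils_migration_map.py may be organized differently.
MIGRATION_MAP = {
    # module under anvio/utils/ -> functions that moved there
    "statistics": ["run_functional_enrichment_stats"],  # placement stated in this PR
    "fasta": ["example_fasta_helper"],  # placeholder, not a real anvi'o function
}


def build_reverse_index(migration_map):
    """Return {function_name: module_name}, refusing duplicate entries."""
    index = {}
    for module, functions in migration_map.items():
        for name in functions:
            if name in index:
                raise ValueError(
                    f"{name} is listed in both {index[name]} and {module}"
                )
            index[name] = module
    return index


FUNCTION_TO_MODULE = build_reverse_index(MIGRATION_MAP)


def new_import_line(function_name):
    """Build an import statement for a function's new home (assumed style)."""
    module = FUNCTION_TO_MODULE[function_name]
    return f"from anvio.utils.{module} import {function_name}"


print(new_import_line("run_functional_enrichment_stats"))
```

Rejecting duplicates while building the reverse index is a cheap sanity check: a function listed under two modules would make any automated rewrite ambiguous.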
Many small changes in this PR were made manually to set the stage. But commit 3f5bca57f7008cf4c8a78c26f65714f4548147e0, which contains the most comprehensive set of changes and replaces all utils imports with their proper versions, was produced by the program refactor_utils_changes.py.
Once this PR is merged to master, we will have to go through our major branches, merge master into them, fix conflicts, and once all conflicts are resolved, run the following command to fix the remaining utils imports:
cd ~/github/anvio
python refactor_utils_changes.py anvio
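For readers wondering what this kind of automated rewrite involves, the sketch below shows one naive way to do it. It is not how refactor_utils_changes.py works; the mapping, the function name rewrite_source, and the assumption that old code calls functions as `utils.<name>` and new code imports them with `from anvio.utils.<module> import <name>` are all hypothetical. It is also regex-based, so it would happily touch matches inside strings and comments, which a real tool would have to handle (for example by working on the syntax tree).

```python
import re

# Hypothetical mapping of function name -> new module under anvio/utils/.
FUNCTION_TO_MODULE = {
    "run_functional_enrichment_stats": "statistics",  # placement stated in this PR
    "example_fasta_helper": "fasta",  # placeholder, not a real anvi'o function
}

# Match `utils.<name>` but not `anvio.utils.<name>` or `myutils.<name>`.
UTILS_ATTRIBUTE = re.compile(r"(?<![\w.])utils\.([A-Za-z_][A-Za-z0-9_]*)")


def rewrite_source(text, function_to_module=FUNCTION_TO_MODULE):
    """Return (import_lines, rewritten_text) for one source file's text."""
    used = {}

    def replace(match):
        name = match.group(1)
        module = function_to_module.get(name)
        if module is None:
            # Unknown name: leave it untouched so a human can review it.
            return match.group(0)
        used.setdefault(module, set()).add(name)
        return name

    body = UTILS_ATTRIBUTE.sub(replace, text)
    imports = [
        f"from anvio.utils.{module} import {', '.join(sorted(names))}"
        for module, names in sorted(used.items())
    ]
    return imports, body


old = "result = utils.run_functional_enrichment_stats(args)\n"
import_lines, new = rewrite_source(old)
print(import_lines)
print(new)
```

Leaving unknown names untouched keeps the rewrite conservative: anything the map does not cover stays visible as a leftover `utils.` reference instead of being silently broken.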
This was a lot of effort, but I think it was really necessary :/
I suggest we wait for v9 before merging this branch, and do it only after a new release is out.
Self note: It will be EXTREMELY IMPORTANT to carefully carry every change made to utils.py after 2025-08-04 over into the new utils modules by hand.
This is necessary due to the following issue: after moving the functions in utils.py to their new resting places within modules in the branch utils-py-modularization, we lost our ability to track changes to utils.py in master through any of the git mechanisms that merge changes or identify and report conflicts.
This means the task must be done right before merging this branch into master, by going through the changes manually, line by line, and carrying the updated functionality into the matching module functions.
For instance, the changes Iva made to the utils.py function run_functional_enrichment_stats in master on 2025-08-08 (i.e., after 2025-08-04) are not in the same function that is now defined in utils/statistics.py:
This kind of stuff :)
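One way to collect the changes that need to be carried over by hand is to ask git for every commit that touched the old file after the cutoff. The sketch below only builds and runs a standard `git log` command; the helper names utils_history_command and show_utils_history are made up for illustration, and it assumes the branch is called master and the file still lives at anvio/utils.py there. Note that `--since` filters by commit date, so the output is a starting point for the line-by-line review, not a replacement for it.

```python
import subprocess


def utils_history_command(since="2025-08-04", branch="master", path="anvio/utils.py"):
    """Build a `git log` command listing patches to `path` on `branch`, oldest first."""
    return [
        "git", "log",
        "--patch",           # show the diff of each commit
        "--reverse",         # oldest first, the order they should be carried over
        f"--since={since}",  # only commits more recent than the cutoff date
        branch,
        "--",
        path,
    ]


def show_utils_history(repo_dir=".", **kwargs):
    """Run the command inside `repo_dir` and return git's output as text."""
    result = subprocess.run(
        utils_history_command(**kwargs),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(" ".join(utils_history_command()))
```

Calling show_utils_history("/path/to/anvio") from a clone of the repository would return the patches to review, one commit at a time.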
Sorry for making your life harder @meren :)
Oh no, not at all. Better code > Meren's life.