Broccoli
Python 3.8 global variable not defined
When I attempted to run Broccoli with Python 3.8 I got this error:
Traceback (most recent call last):
  File "broccoli.py", line 145, in <module>
    broccoli_step1.step1_kmer_clustering(directory, extension, length_kmer, min_aa, nb_threads)
  File "/Users/5tl/Downloads/Broccoli-1.2/scripts/broccoli_step1.py", line 55, in step1_kmer_clustering
    results_2 = tmp_res.get()
  File "/Users/5tl/anaconda3/envs/broccoli/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
NameError: name 'list_files' is not defined
It appears the hack below stopped working somewhere between Python 3.6 and 3.8:
# convert the parameters to global variables (horrible hack)
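One possible explanation, assuming the run was on macOS (the /Users/ paths in the traceback suggest this): Python 3.8 changed the default multiprocessing start method on macOS from fork to spawn. With spawn, each worker starts from a fresh interpreter and re-imports the modules, so a global assigned at runtime inside a function in the parent process does not exist in the workers. The sketch below only reports which start method is in effect; the helper name start_method_report is made up for illustration and is not part of Broccoli.

```python
import multiprocessing
import sys


def start_method_report():
    # Hypothetical helper: returns the (major, minor) Python version and the
    # start method that new pools will use by default in this interpreter.
    return sys.version_info[:2], multiprocessing.get_start_method()


if __name__ == "__main__":
    version, method = start_method_report()
    print(version, method)
```

If this reports spawn, workers cannot see globals that the parent set at runtime, whereas passing the values to the worker as explicit arguments works under any start method.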
I was able to fix this for the first step by zipping together all of the arguments for the process_file function:
files_start = zip(list_files, list_start, list(range(len(list_files))), itertools.repeat(directory),
                  itertools.repeat(length_kmer), itertools.repeat(min_aa), itertools.repeat(out_dir))
converting the multiprocessing call from map_async to starmap_async:
tmp_res = pool.starmap_async(process_file, files_start, chunksize=1)
and updating the definition of the process_file function and its variable assignments to reflect the new starmap_async call:
def process_file(filename, counter, index, directory, length_kmer, min_aa, out_dir):
    ## get back info
    #index = list_files.index(filename)
    #counter = list_start[index]
It seems a similar fix is required for each step of the pipeline to make it compatible with the latest version of Python.
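For reference, the pattern described above can be put together as a small self-contained sketch. This is not the actual Broccoli code: the body of process_file is a stand-in that only echoes its arguments, build_tasks and run_step are hypothetical helper names, and the file names and parameter values are made up.

```python
import itertools
import multiprocessing


def process_file(filename, counter, index, directory, length_kmer, min_aa, out_dir):
    # Stand-in for the real worker: every value it needs arrives as an
    # explicit argument, so no module-level globals are required.
    return (index, filename, counter, length_kmer)


def build_tasks(list_files, list_start, directory, length_kmer, min_aa, out_dir):
    # One tuple per input file; the shared parameters are repeated for each task.
    # zip stops at the shortest input, so the infinite repeat() iterators are safe.
    return list(zip(list_files, list_start, range(len(list_files)),
                    itertools.repeat(directory), itertools.repeat(length_kmer),
                    itertools.repeat(min_aa), itertools.repeat(out_dir)))


def run_step(tasks, nb_threads):
    # starmap_async unpacks each tuple into the worker's positional arguments.
    with multiprocessing.Pool(nb_threads) as pool:
        tmp_res = pool.starmap_async(process_file, tasks, chunksize=1)
        return tmp_res.get()


if __name__ == "__main__":
    # Made-up inputs, for illustration only.
    tasks = build_tasks(["a.fasta", "b.fasta"], [0, 100], "proteomes", 100, 20, "out")
    print(run_step(tasks, nb_threads=2))
```

Because each task carries its own copy of the parameters, the worker no longer depends on how the child processes are started.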
Thanks for pointing out this issue. I've only used Python 3.6 and wasn't aware of this bug. I will change the README file to point to Python 3.6 / 3.7 (instead of 3.6+) and will correct the code when I have some time to do it.
I can always submit a pull request with the fixes sometime next week if that would work for you.