ganon
parallelise parse_seqids
parse_seqids is called multiple times and is very slow. On our system it takes more than 30 min for 57,000 unique sequence headers.
This might be avoidable if
- the for loop over all input files parsed multiple files simultaneously
- parse_seqids weren't called twice, once for the sequence names and later for the sequence lengths
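
To illustrate both points, here is a minimal sketch, not ganon's actual code. It assumes plain, uncompressed FASTA input; `parse_fasta_once`, `parse_all` and the `workers` / `executor_cls` parameters are hypothetical names made up for this example. Each file is read once, collecting the sequence ID and its length in the same pass, and the files are spread over a worker pool.

```python
from concurrent.futures import ProcessPoolExecutor


def parse_fasta_once(path):
    """Return {seqid: length} for one FASTA file, reading it a single time."""
    lengths = {}
    seqid = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Assumption: the sequence ID is the header text up to the
                # first whitespace character.
                fields = line[1:].split(None, 1)
                seqid = fields[0] if fields else ""
                lengths[seqid] = 0
            elif seqid is not None:
                # Sequence lines add to the length of the current record.
                lengths[seqid] += len(line)
    return lengths


def parse_all(paths, workers=4, executor_cls=ProcessPoolExecutor):
    """Parse several FASTA files in parallel and merge the results."""
    merged = {}
    with executor_cls(max_workers=workers) as pool:
        # One task per input file; map() yields results in input order.
        for result in pool.map(parse_fasta_once, paths):
            merged.update(result)
    return merged
```

Calling `parse_all(list_of_fasta_files, workers=8)` would then return names and lengths together, so no second pass is needed. On platforms that spawn worker processes, the call should sit under an `if __name__ == "__main__":` guard.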
You are right on those points. Thanks for the suggestions, @oliverdrechsel. I will try to implement them for the next release.