ont_fast5_api
single_to_multi_fast5 does not collect all the single-read files if the input folder contains mixed types of fast5 files.
I have a dataset that contains thousands of fast5 files, a mix of multi-read and single-read files, in a non-homogeneous folder structure. I want to convert all of them to multi-read fast5 files.
My current approach is to first convert all multi fast5 files to single. The multi_to_single_fast5 command picks up only the multi fast5 files and writes their reads as single fast5 files in a new folder:
orig_path=mixed
save_path=multi
single_path=single
multi_to_single_fast5 -i $orig_path/ -s $single_path/ --recursive
The above command collects all the reads that exist in any multi fast5 files as single fast5 files in $single_path. Then I can convert them all back to multi and make sure I am not missing any read:
single_to_multi_fast5 -i $single_path/ -s $save_path/ --filename_base $output_name --batch_size 1000 --recursive
The above command works fine too. Now I want to run single_to_multi_fast5 directly on the folder that contains both multi and single fast5 files ($orig_path), and I expect it to collect all the reads from the single files in $orig_path into multi fast5 files:
single_to_multi_fast5 -i $orig_path/ -s $save_path/ --filename_base $output_name --batch_size 1000 --recursive
But I don't get all the reads from the single fast5 files: some reads are missing from the output folder. The same command works fine on a folder that contains only single fast5 files.
Nothing is overwritten; I am testing these steps on a few files in a separate folder. Is there a solution to this problem other than checking whether each file is multi or single? My dataset is huge, and checking individual files would take ages.
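For reference, the per-file check I am trying to avoid could look roughly like the sketch below. It is not part of ont_fast5_api. The helper name classify_fast5_listing is made up for illustration, and the approach assumes that the h5ls tool from the HDF5 command-line utilities is installed and that multi-read files store each read in a top-level group named read_<id>, while single-read files do not.

```shell
# Hypothetical helper, not part of ont_fast5_api.
# Reads the top-level `h5ls` listing of one fast5 file on stdin and prints:
#   multi   - at least one top-level entry starts with "read_"
#   single  - entries exist, but none starts with "read_"
#   unknown - the listing is empty (e.g. h5ls failed on the file)
classify_fast5_listing() {
    listing=$(cat)
    if [ -z "$listing" ]; then
        echo unknown
    elif printf '%s\n' "$listing" | grep -q '^read_'; then
        echo multi
    else
        echo single
    fi
}

# Usage sketch (assumes h5ls is on PATH; prints "<type><TAB><path>" per file):
# find "$orig_path" -type f -name '*.fast5' | while read -r f; do
#     printf '%s\t%s\n' "$(h5ls "$f" 2>/dev/null | classify_fast5_listing)" "$f"
# done
```

This still opens every file once, so on a dataset of this size it is exactly the slow step I would like the tool to handle for me.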