pod5-file-format
pod5 merge hangs indefinitely at 99-100% (the last 20 pod5 files have not been merged)
Issue Description
I am using pod5 merge to merge 3320 pod5 files. It seems to have stopped before processing the last 20 files. However, nohup reported that the job had finished, and there were no errors.
Logs
This is input group.
This is ip group.
Here is my pod5 merge code:
Here is the size of merge_pod5 and multi_pod5:
It seems that the last 20 pod5 files have not been merged.
Specifications
- Pod5 Version: 0.3.10
- Python Version: Python 3.8.17
- Platform: CentOS 7
Interesting. Is this running in a conda environment or python environment? We occasionally see issues when running in conda.
Are you able to merge the remaining 20 files into the ip_merge.pod5 file?
Hi, @HalfPhoton
We occasionally see issues when running in conda.
I ran this in a conda environment.
Are you able to merge the remaining 20 files into the ip_merge.pod5 file?
I don't know how pod5 merge orders the input files. Does it go test_0.pod5, test_1.pod5, ..., test_20.pod5? If so, I will try to merge the remaining files.
Best wishes, Kirito
Ah, I see.
In this case, please use pod5 view to list the read ids in all of the inputs and in the first merged output, then work out which ids are missing:
# get read ids
pod5 view -IH input_data/ -o input.ids
pod5 view -IH merged.pod5 -o merged.ids
# Sort the files (comm requires sorted files)
sort input.ids > input.ids.sorted
sort merged.ids > merged.ids.sorted
# Find ids in input that are not in merged file
comm -23 input.ids.sorted merged.ids.sorted > missing.ids
# Get a pod5 file of only missing ids
pod5 filter input_data/ --ids missing.ids -o missing.pod5
# Merge in missing ids
pod5 merge merged.pod5 missing.pod5 -o merged.final.pod5
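If sort and comm are not convenient, the comparison step above can also be sketched in plain Python. This is only an illustration, not part of pod5: the function names find_missing_ids and write_ids are hypothetical, and it assumes input.ids and merged.ids were written by pod5 view -IH as shown above, with one read id per line and no header.

```python
from pathlib import Path


def read_ids(path):
    """Return the set of non-empty lines in a file (assumed: one read id per line)."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def find_missing_ids(input_ids_path, merged_ids_path):
    """Return, in sorted order, the ids listed in the input file but not in the merged file."""
    return sorted(read_ids(input_ids_path) - read_ids(merged_ids_path))


def write_ids(ids, out_path):
    """Write one id per line, in the plain-text layout that the --ids step above reads."""
    Path(out_path).write_text("".join(f"{read_id}\n" for read_id in ids))
```

Calling write_ids(find_missing_ids("input.ids", "merged.ids"), "missing.ids") should produce the same missing.ids as the comm step, provided each read id appears only once per file. An empty result suggests that the merge actually completed and only the progress display stalled.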
I recommend using a python virtual environment instead of a conda environment:
python3.10 -m venv venv --prompt=pod5
source venv/bin/activate
pip install -U pip pod5
pod5 --version
Just for the record, the same thing happens to me, but all the files are actually processed and there are no missing reads. So it's probably an issue with the progress bar.