pod5 merge hangs indefinitely at 99-100% (the last 20 pod5 files have not been merged)

Open kir1to455 opened this issue 1 year ago • 5 comments

Issue Description

I am using pod5 merge to merge 3320 pod5 files. It appears to have stopped before processing the last 20 pod5 files. However, nohup reported that the job finished, and there were no errors.

Logs

This is the input group: [image]

This is the ip group: [image] [image]

Here is my pod5 merge code: [image]

Here are the sizes of merge_pod5 and multi_pod5: [image] [image]

It seems that the last 20 pod5 files have not been merged.

Specifications

  • Pod5 Version: 0.3.10
  • Python Version: Python 3.8.17
  • Platform: CentOS 7

kir1to455 avatar Jun 09 '24 13:06 kir1to455

Interesting. Is this running in a conda environment or python environment? We occasionally see issues when running in conda.

Are you able to merge the remaining 20 files into the ip_merge.pod5 file?

HalfPhoton avatar Jun 25 '24 10:06 HalfPhoton

Hi, @HalfPhoton

We occasionally see issues when running in conda.

I am running this code in a conda environment. [image]

Are you able to merge the remaining 20 files into the ip_merge.pod5 file?

I don't know how pod5 merge handles the order of files. Does it go test_0.pod5, test_1.pod5, ... test_20.pod5? If so, I will try to merge the remaining files.

Best wishes, Kirito

kir1to455 avatar Jun 25 '24 11:06 kir1to455

Ah, I see.

In this case, please use pod5 view to list the read ids in all inputs and in the first merged output, then compare the two lists to find the ids that are missing from the merged file.

# get read ids
pod5 view -IH input_data/ -o input.ids
pod5 view -IH merged.pod5 -o merged.ids

# Sort the files (comm requires sorted files)
sort input.ids > input.ids.sorted
sort merged.ids > merged.ids.sorted

# Find ids in input that are not in merged file
comm -23 input.ids.sorted merged.ids.sorted > missing.ids

# Get a pod5 file of only missing ids
pod5 filter input_data/ --ids missing.ids -o missing.pod5

# Merge in missing ids
pod5 merge merged.pod5 missing.pod5 -o merged.final.pod5
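
To see what the comparison step produces before running it on real data, here is a minimal sketch of `comm -23` on toy input. The read ids and the temporary directory are made up for illustration; real id lists would come from `pod5 view -IH`. Pinning `LC_ALL=C` is an extra precaution that is not part of the commands above: it is assumed here only to keep `sort` and `comm` on the same collation order.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy stand-ins for the id lists written by `pod5 view -IH`
# (one read id per line). These ids are hypothetical.
workdir=$(mktemp -d)
printf '%s\n' read-c read-a read-d read-b > "$workdir/input.ids"
printf '%s\n' read-b read-a > "$workdir/merged.ids"

# comm requires both inputs to be sorted in the same collation order,
# so the locale is pinned for sort and comm alike.
LC_ALL=C sort "$workdir/input.ids" > "$workdir/input.ids.sorted"
LC_ALL=C sort "$workdir/merged.ids" > "$workdir/merged.ids.sorted"

# -2 suppresses lines found only in the second file and -3 suppresses
# lines common to both, leaving ids present in the input but absent
# from the merged output.
LC_ALL=C comm -23 "$workdir/input.ids.sorted" "$workdir/merged.ids.sorted" \
    > "$workdir/missing.ids"

cat "$workdir/missing.ids"
```

With this toy data, `missing.ids` holds `read-c` and `read-d`. An empty `missing.ids` on real data would suggest that every input read already made it into the merged file.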

HalfPhoton avatar Jun 25 '24 12:06 HalfPhoton

I recommend using a python virtual environment instead of a conda environment:

python3.10 -m venv venv --prompt=pod5
source venv/bin/activate
pip install -U pip pod5
pod5 --version

HalfPhoton avatar Jun 25 '24 12:06 HalfPhoton

Just for the record, the same thing happens to me, but all the files are actually processed and there are no missing reads. So it's probably a problem with the progress bar rather than with the merge itself.

arturotorreso avatar Oct 21 '24 22:10 arturotorreso