Discrepancy Between Parsed Read IDs and Calculated Transfers in pod5 filter Command

Open Bhavesh-Tiwarekar opened this issue 1 year ago • 0 comments

Description: I am using the following command to filter data from .pod5 files based on read IDs in a text file:

pod5 filter folder1 folder2 --output filtered.pod5 --ids read_ids.txt

When I run the command, I observe the following output:

Parsed 9240344 read_ids from: read_ids.txt
Found 25678389 read_ids from 230 inputs
Calculated 9215377 transfers

I am trying to understand the cause of the difference in numbers between the parsed read_ids and the calculated transfers.

Workflow:

1. Nanopore Sequencing: I have used nanopore sequencing to generate .pod5 and .fastq files.
2. Extracted Read IDs: I have extracted the read IDs from the .fastq file.
3. Filtering: I am using these extracted read IDs to filter data from the .pod5 files using the pod5 filter command.
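
The gap reported above is 9240344 - 9215377 = 24967 read IDs. One possible way to narrow it down is to check whether the ID list contains duplicates or IDs that do not occur in the inputs at all. The sketch below is only a diagnostic aid under those assumptions, not a description of how pod5 filter computes its transfers; the function name `summarize_id_gap` and the toy IDs are hypothetical.

```python
from collections import Counter


def summarize_id_gap(requested_ids, available_ids):
    """Compare a requested ID list against the IDs actually available.

    requested_ids: iterable of strings, e.g. the lines of read_ids.txt
    available_ids: iterable of strings, e.g. read IDs collected from the inputs
    (hypothetical helper for illustration; not part of the pod5 package)
    """
    # Drop blank lines and surrounding whitespace before counting.
    requested = [rid.strip() for rid in requested_ids if rid.strip()]
    counts = Counter(requested)
    unique_requested = set(counts)
    available = set(available_ids)

    return {
        "requested_lines": len(requested),
        "unique_requested": len(unique_requested),
        # IDs listed more than once in the request file
        "duplicates": sorted(rid for rid, n in counts.items() if n > 1),
        # IDs requested but absent from the inputs
        "missing": sorted(unique_requested - available),
        "matched": len(unique_requested & available),
    }


# Toy data standing in for read_ids.txt and the IDs found in the inputs.
report = summarize_id_gap(["a", "b", "b", "c", ""], ["a", "b", "x"])
print(report)
```

With real data, `requested_ids` would be the lines of read_ids.txt and `available_ids` would have to be collected separately from the .pod5 inputs; how that is done is outside this sketch. If `matched` equals the number of calculated transfers, the difference would be explained by the `duplicates` and `missing` entries.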

Specifications

  • Pod5 Version: 0.3.23
  • Python Version: 3.11.5

Bhavesh-Tiwarekar, Nov 23 '24 12:11