
promethion good pairs: 0

Open bef22 opened this issue 2 years ago • 3 comments

Hi, I'm using duplex_tools filter_pairs (duplex-tools version 0.3.2) on fastq files created on a PromethION, and out of 2759916 duplex pairs none are reported as good. I found the issue where installing into a new virtual environment fixed this problem, but that didn't work for me. I also gunzipped all fastq files, and still no good pairs are reported. The PromethION run was created with Guppy 6.4.6 on an R10 flow cell. Any ideas what else I could try? Bettina

filter_pairs_minLen1000_gunzip.log

bef22, Apr 18 '23 14:04

Hi @bef22! Thanks for the question.

Can you try this again with the additional flag --debug and see if there's a specific reason why reads are skipped? It may be the case that you need to adjust the length settings.

There are four subtly different reasons a read may be skipped, so it would be good to know which one applies here.

https://git.oxfordnanolabs.local/research/duplex-tools/-/blob/dev/duplex_tools/filter_pairs.py#L236

If you send the command you used together with a short description of the folder/file structure, it may also help in the next step.

Thanks!

ollenordesjo, Apr 18 '23 15:04

Thanks for your suggestion. I have now traced the problem, which could be a bug or me misunderstanding the options. I was originally running this:

duplex_tools filter_pairs --min_length 1000 pair_ids.txt pathTo/fastq_pass

This gave me no good pairs, with the message "seq1 or seq2 not in requested length range", even though I know that I have read pairs in which both reads are >1 kb long.

I then ran this, as you suggested:

duplex_tools filter_pairs --debug pair_ids.txt pathTo/fastq_pass

This reported "Aligning 2759916 pairs" and I did get "Good pairs: 1045256".

So I thought that I might have to specify both --min_length and --max_length, and tried:

duplex_tools filter_pairs --debug --min_length 1000 --max_length 1000000 pair_ids.txt pathTo/fastq_pass

This again failed to give any good pairs.

The last few rows of the debug report are:

[14:09:30 - AlignPairs] Skipped 0ca6731f-8fa9-4423-b162-a71ccf24aafd: sequence missing.
[14:09:30 - AlignPairs] Skipped Pandas(Index=2759909, first='5ef6ee2a-f3bb-4d03-99c0-b175f3c9b1ba', second='9fde27d0-e3fe-436c-b86b-4ecbb127e970'), seq1 or seq2 not in requested length range
[14:09:30 - AlignPairs] Skipped Pandas(Index=2759910, first='0fa786fc-8c5e-5001-a123-e447fdf1a275', second='c7866df3-49e6-5da4-b741-229d08705590'), seq1 or seq2 not in requested length range
[14:09:30 - AlignPairs] Skipped Pandas(Index=2759911, first='e7f8063a-fd7c-53b7-b51d-bca798b9791d', second='ac9ec84f-856d-5af3-a89f-877a394f6bfd'), seq1 or seq2 not in requested length range
[14:09:30 - AlignPairs] Skipped 5eda2b6c-6e63-583d-b673-5e713efc23df: sequence missing.
[14:09:30 - AlignPairs] Skipped Pandas(Index=2759913, first='7aca931a-3032-500c-a1c9-adcd10718047', second='ed5f1f35-7f8b-5656-ad1b-d7fa8433bf61'), seq1 or seq2 not in requested length range
[14:09:30 - AlignPairs] Skipped Pandas(Index=2759914, first='0deda2e7-b49a-50a7-86aa-dec7b9f0a613', second='f5531828-1b4b-5dc5-8dba-d5ca8b1b0b6a'), seq1 or seq2 not in requested length range
[14:09:30 - AlignPairs] Skipped 37738eb2-ca22-5eb3-80fb-455bba5fba29: sequence missing.
[14:09:30 - AlignPairs] Good pairs: 0
[14:09:30 - AlignPairs] defaultdict(<class 'int'>, {'skipped': 2759916, 'read1 missing': 56998, 'read0 missing': 179962, 'good': 0})
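With millions of skipped pairs, it can help to count the skip reasons across the whole debug log rather than reading only the tail. The following is a minimal sketch, not part of duplex-tools: the function name tally_skip_reasons is hypothetical, and the two patterns are assumptions based only on the message shapes visible in the debug output above.

```python
import re
from collections import Counter

# Assumed message shapes, taken from the debug lines quoted in this thread:
#   "Skipped Pandas(Index=..., first='...', second='...'), <reason>"
#   "Skipped <read id>: <reason>."
PAIR_SKIP = re.compile(r"Skipped Pandas\(.*?\), (?P<reason>.+)$")
READ_SKIP = re.compile(r"Skipped (?P<read_id>[0-9a-f-]+): (?P<reason>.+?)\.?$")


def tally_skip_reasons(lines):
    """Count how often each skip reason occurs in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        match = PAIR_SKIP.search(line) or READ_SKIP.search(line)
        if match:
            counts[match.group("reason").strip()] += 1
    return counts


# Small demonstration on lines copied from the log above.
sample = [
    "[14:09:30 - AlignPairs] Skipped 0ca6731f-8fa9-4423-b162-a71ccf24aafd: sequence missing.",
    "[14:09:30 - AlignPairs] Skipped Pandas(Index=2759909, "
    "first='5ef6ee2a-f3bb-4d03-99c0-b175f3c9b1ba', "
    "second='9fde27d0-e3fe-436c-b86b-4ecbb127e970'), "
    "seq1 or seq2 not in requested length range",
    "[14:09:30 - AlignPairs] Good pairs: 0",
]
for reason, n in tally_skip_reasons(sample).most_common():
    print(n, reason)
```

To run it on a real log, pass an open file handle instead of the sample list, since the function accepts any iterable of lines.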

I don't have to filter by size at this stage so could continue with all good pairs, but I would like to understand if I used the --min_length argument correctly.
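To check independently of duplex_tools how many pairs have both reads inside a length window, something like the sketch below could be used. It is an illustration under stated assumptions, not the tool's own logic: the function names are hypothetical, the FASTQ is assumed to use plain four-line records with the read ID as the first token after "@", and the pairs are assumed to be already loaded as (first, second) ID tuples, because the exact layout of pair_ids.txt is not shown in this thread.

```python
def read_lengths(fastq_lines):
    """Map read ID -> sequence length, assuming four-line FASTQ records."""
    lengths = {}
    lines = iter(fastq_lines)
    for header in lines:
        seq = next(lines, "")
        next(lines, None)  # '+' separator line
        next(lines, None)  # quality line
        fields = header[1:].split() if header.startswith("@") else []
        if fields:
            lengths[fields[0]] = len(seq.strip())
    return lengths


def count_pairs_in_range(pairs, lengths, min_length, max_length):
    """Classify each (first, second) pair by whether both reads fit the window."""
    counts = {"in range": 0, "out of range": 0, "missing": 0}
    for first, second in pairs:
        if first not in lengths or second not in lengths:
            counts["missing"] += 1
        elif all(min_length <= lengths[r] <= max_length for r in (first, second)):
            counts["in range"] += 1
        else:
            counts["out of range"] += 1
    return counts


# Tiny made-up example: read IDs and sequences are illustrative only.
example_fastq = [
    "@read_a extra=1\n", "ACGT" * 300 + "\n", "+\n", "I" * 1200 + "\n",
    "@read_b\n", "ACGT" * 275 + "\n", "+\n", "I" * 1100 + "\n",
    "@read_c\n", "ACGT" * 100 + "\n", "+\n", "I" * 400 + "\n",
]
example_pairs = [("read_a", "read_b"), ("read_a", "read_c"), ("read_a", "read_x")]
print(count_pairs_in_range(example_pairs, read_lengths(example_fastq), 1000, 1000000))
```

If a count like this shows many pairs in range while filter_pairs reports none, that points towards how the length options are handled rather than towards the data.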

Thanks for your help.

Bettina

bef22, Apr 19 '23 13:04

Hi @bef22, sorry for taking a while to respond. Is there any chance you can print out the length of the sequences (or even the sequences themselves) at this location in the code?

https://github.com/nanoporetech/duplex-tools/blob/master/duplex_tools/filter_pairs.py#L237

It may be easiest to add another logger.debug(... line for printing this information.
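As a rough idea of what such a debug line might look like, here is a standalone sketch. It is not the code in filter_pairs.py: the function lengths_in_range, its parameters, and the logger name are hypothetical stand-ins, and the real variable names at that location may differ.

```python
import logging

# Logger name chosen to mirror the "AlignPairs" prefix seen in the log;
# this is an assumption, not necessarily how duplex-tools sets it up.
logger = logging.getLogger("AlignPairs")


def lengths_in_range(seq1, seq2, min_length, max_length, pair_id="?"):
    """Hypothetical length check that logs what is being compared."""
    len1, len2 = len(seq1), len(seq2)
    # %r shows the thresholds exactly as received, next to the read lengths.
    logger.debug(
        "pair %s: len(seq1)=%d len(seq2)=%d min_length=%r max_length=%r",
        pair_id, len1, len2, min_length, max_length,
    )
    return (min_length <= len1 <= max_length
            and min_length <= len2 <= max_length)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    lengths_in_range("A" * 1500, "C" * 1200, 1000, 1000000, pair_id="example")
```

Printing the thresholds alongside the lengths makes it easier to see which side of the comparison fails for a given pair.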

Cheers

ollenordesjo, Apr 27 '23 15:04