
tombo preprocess annotate_raw_with_fastqs

Open AzlanNI opened this issue 2 years ago • 16 comments

Hello Everyone,

I am currently using Tombo version 1.5 on our university HPC to analyze bacterial DNA modifications. Previously, our FAST5 data included the basecalls, so I could start directly with the resquiggle step and everything worked fine.

But our updated software now separates the FASTQs from the FAST5s, and the FASTQs are gzipped. So I decompressed the FASTQs and tried to annotate the FAST5s with the FASTQs from the same barcode (run).

I always get the following error:

Preparing reads and extracting read identifiers.
****** WARNING ****** Basecalls exsit in specified slot for some reads. Set --overwrite option to overwrite these basecalls.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 348/348 [00:39<00:00, 8.85it/s]
[19:31:46] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [01:03, ?it/s]
[19:32:50] Added sequences to a total of 0 reads.
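The second warning says the FASTQ read identifiers were not found, so one thing that could be checked is whether the identifiers in the FASTQs appear in the run's sequencing summary at all. The following is only a minimal sketch under some assumptions: the FASTQs use plain 4-line records, the sequencing summary is tab-separated with a `read_id` column, and the function names (`fastq_read_ids`, `summary_read_ids`, `compare_read_ids`) are made up for illustration and are not part of Tombo. It only shows whether the FASTQs belong to the same run; it does not show whether Tombo can read those identifiers from the FAST5 files.

```python
import csv
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def fastq_read_ids(path):
    """Yield the read identifier of each record (assumes 4-line FASTQ records)."""
    with open_text(path) as handle:
        for line_number, line in enumerate(handle):
            # Header lines are every 4th line, starting with the first one.
            if line_number % 4 == 0 and line.startswith("@"):
                tokens = line[1:].split()
                if tokens:
                    # The identifier is the first token after '@'.
                    yield tokens[0]


def summary_read_ids(path):
    """Return the set of read_id values from a tab-separated sequencing summary."""
    with open_text(path) as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if not reader.fieldnames or "read_id" not in reader.fieldnames:
            raise ValueError("no 'read_id' column found in " + str(path))
        return {row["read_id"] for row in reader}


def compare_read_ids(fastq_paths, summary_path):
    """Return (ids found in the summary, ids missing from the summary)."""
    fastq_ids = set()
    for fastq_path in fastq_paths:
        fastq_ids.update(fastq_read_ids(fastq_path))
    known_ids = summary_read_ids(summary_path)
    return fastq_ids & known_ids, fastq_ids - known_ids
```

For example, `compare_read_ids(Path("/fastqs").glob("*.fastq"), "sequencing_summary_FAT59921_0aa23499_14dc5276.txt")` would return the two sets for the files mentioned in this post, assuming the summary file sits in the working directory. Because `open_text` handles `.gz` files, the gzipped FASTQs could be checked without decompressing them first.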

I looked this problem up but could not find a solution that worked for me. We are now rebasecalling the data, but I would like to know if anyone knows how this problem can be solved.

The command I am using is:

tombo preprocess annotate_raw_with_fastqs --overwrite --fast5-basedir /fast5s/ --fastq-filenames /fastqs/*.fastq

Is there a problem with having multi-read versus single-read FAST5s? Or should I look more closely at the sequencing settings to solve this? Also, the resquiggle command just gives me an error saying that basecalls are missing from my FAST5 data.
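To tell which kind of FAST5 a file is, its top-level HDF5 group names can be inspected. This is a rough sketch, assuming the usual layouts: multi-read files hold one top-level group per read named `read_<read id>`, while single-read files have top-level groups such as `Raw` and `UniqueGlobalKey`. The function name `classify_fast5_layout` is hypothetical, and the sketch takes the group names as input rather than opening the file itself.

```python
def classify_fast5_layout(top_level_keys):
    """Guess the FAST5 layout from the names of the top-level HDF5 groups.

    Returns "multi", "single", or "unknown". This is a heuristic based on
    the assumed layouts described above, not an official format check.
    """
    keys = list(top_level_keys)
    # Multi-read files: one group per read, named "read_<read id>".
    if any(key.startswith("read_") for key in keys):
        return "multi"
    # Single-read files: raw signal and metadata sit directly at the top level.
    if "Raw" in keys or "UniqueGlobalKey" in keys:
        return "single"
    return "unknown"
```

If `h5py` is installed, the group names of one file could be obtained with `list(h5py.File(path, "r").keys())` and passed to this function. If the result is "multi", the files would likely need converting to single-read FAST5s before Tombo can use them, for instance with the `multi_to_single_fast5` tool from `ont_fast5_api`.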

I also checked the final_summary file, and basecalling was enabled, so I don't understand why Tombo says that basecalls are missing from my FAST5s.

instrument=MN39041
position=
flow_cell_id=FAT59921
sample_id=Mho_4518_PG21
protocol_group_id=Mho_4518_PG21
protocol=sequencing/sequencing_MIN112_DNA_SQK-Q20EA:FLO-MIN112:SQK-NBD112-24
protocol_run_id=0aa23499-3d48-47f8-ac08-4ebd74be0aa5
acquisition_run_id=14dc527603366f0c24d86d62d46496a806336743
started=2023-02-10T16:09:52.073115+01:00
acquisition_stopped=2023-02-13T16:10:51.626632+01:00
processing_stopped=2023-02-13T16:11:32.102936+01:00
basecalling_enabled=1
sequencing_summary_file=sequencing_summary_FAT59921_0aa23499_14dc5276.txt
fast5_files_in_final_dest=732
fast5_files_in_fallback=0
fastq_files_in_final_dest=755
fastq_files_in_fallback=0

If anyone has an idea how I could solve this problem, or if I should provide any further information, please let me know.

Kind regards,

Azlan

AzlanNI, Feb 14 '23 18:02