
annotate_raw_with_fastqs issues

Open pc395 opened this issue 3 years ago • 7 comments

I was trying to use the tombo preprocess annotate_raw_with_fastqs option to append the sequencing reads. My command is: % tombo preprocess annotate_raw_with_fastqs --fast5-basedir <path_to_fast5s.fast5> --fastq-filenames <filenames.fastq>

This is the output:

[13:36:03] Preparing reads and extracting read identifiers.
****** WARNING ****** Basecalls exsit in specified slot for some reads. Set --overwrite option to overwrite these basecalls.
100%|███████████████████████████████████████████████████████████████████████████████████| 262/262 [00:00<00:00, 518.82it/s]
[13:36:03] Annotating FAST5s with sequence from FASTQs.
Process Process-4:
Traceback (most recent call last):
  File "/Users/poonamchitale/opt/miniconda3/envs/tombo_test/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/Users/poonamchitale/opt/miniconda3/envs/tombo_test/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/poonamchitale/opt/miniconda3/envs/tombo_test/lib/python3.6/site-packages/tombo/_preprocess.py", line 148, in _feed_seq_records_worker
    fastq_rec = list(islice(fastq_fp, 4))
  File "/Users/poonamchitale/opt/miniconda3/envs/tombo_test/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

The program doesn't exit after this error; it just hangs at this point indefinitely. Any help would be appreciated. Thanks!

pc395 • Apr 15 '21 17:04

I was able to get past the original issue by unzipping the file, but now I get another error:

[13:48:58] Preparing reads and extracting read identifiers.
****** WARNING ****** Basecalls exsit in specified slot for some reads. Set --overwrite option to overwrite these basecalls.
100%|██████████████████████████████████████████████████████████████████████████████████| 262/262 [00:00<00:00, 1284.45it/s]
[13:48:58] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [00:16, ?it/s]
[13:49:15] Added sequences to a total of 0 reads.

Any ideas on what went wrong this time? Thanks!

pc395 • Apr 15 '21 17:04

It looks like one of the fastq files might be binary. Could you check that the input fastq files are valid text files?
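
The byte 0x8b at position 1 reported in the traceback is consistent with the gzip magic number (0x1f 0x8b), which fits the later report that unzipping the file got past this error. As a rough sketch of how to check an input file before passing it to Tombo (the helper names is_gzipped and open_fastq_text are made up for illustration and are not part of Tombo):

```python
import gzip

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as fp:
        return fp.read(2) == GZIP_MAGIC


def open_fastq_text(path):
    """Open a FASTQ file as text, decompressing on the fly if it is gzipped.

    Hypothetical helper for inspecting files by hand; Tombo itself is
    assumed here to expect plain-text FASTQ input.
    """
    if is_gzipped(path):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")
```

If is_gzipped returns True for a file, decompressing it first (for example with gunzip) should avoid the UnicodeDecodeError shown above.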

marcus1487 • Apr 15 '21 17:04

Yup! I just double-checked and the fastq file looks OK. I realized that my fast5s were not in single-read format, so I split them into single reads and tried again. This is what I got:

% tombo preprocess annotate_raw_with_fastqs --fast5-basedir singlereads --fastq-filenames file.fastq
[14:22:11] Preparing reads and extracting read identifiers.
100%|████████████████████████████████| 1044803/1044803 [31:48<00:00, 547.43it/s]
[14:54:10] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [00:18, ?it/s]
[14:54:29] Added sequences to a total of 0 reads.

pc395 • Apr 15 '21 19:04

Update: I tried re-basecalling the fast5s and running those, and I get the same results:

% tombo preprocess annotate_raw_with_fastqs \
    --fast5-basedir singlereads \
    --fastq-filenames barcode.fastq \
    --overwrite \
    --basecall-group Basecall_1D_000 \
    --basecall-subgroup BaseCalled_template \
    --processes 5
[18:35:08] Preparing reads and extracting read identifiers.
100%|████████████████████████████████| 1044803/1044803 [19:20<00:00, 900.23it/s]
[18:54:38] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [00:00, ?it/s]
[18:54:39] Added sequences to a total of 0 reads.

pc395 • Apr 15 '21 23:04

I get the same error as you. I have tried many of the methods suggested in other issues, but the result is still the same.

woshiyangsi • Apr 23 '21 01:04

tombo preprocess annotate_raw_with_fastqs --fast5-basedir /home/YANGSIYU/data/project/m6A/2021-4-10-data/test2/tombo/A/fast5_pass_single/0 --fastq-filenames pass.fastq --processes 15 --overwrite
[09:44:49] Preparing reads and extracting read identifiers.
100%|███████████████████████████████████████████████████████████████████████████████████████| 4000/4000 [00:02<00:00, 1804.62it/s]
[09:44:51] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [04:50, ?it/s]
[09:49:42] Added sequences to a total of 0 reads.

woshiyangsi • Apr 23 '21 01:04

pass.fastq:

@3a4c4f4c-94e1-4a81-b73d-cd680a5007c5 runid=7ef2370e22dedd664bbd1f582b56b8df7d39db4d read=13 ch=811 start_time=2020-11-10T09:52:53
AGCAAGAGAAGCCAAGCACUCGUGGUGGACUCAAGCACCCAGGGUGACCCCUGCUUCGUACCAGCUAGGAAAAUGACCGCCUCUCGAAGGAGAAUGUUUGAAAAUCCCCCCAACCCCAUCAACAACCCAC
+
,+%&%))'&))'2/+$(%''-436)(++1%-4=.%%,%%5667:234;6:5-,+)'&54-/.-43;?:5.;)/-266(&&&(%(-+4.476046-+(&)'//)/&%%'331*&%%%&%%'.-$&
@681a32d7-be17-4de9-9358-ef3d5d9b2614 runid=7ef2370e22dedd664bbd1f582b56b8df7d39db4d read=12 ch=693 start_time=2020-11-10T09:52:53
ACUGUCUAAAUUUUUUUUUGUCUUAUAGAAAAUUUUUCUAUACUGAUUGGUUUCAUAGAUGAUAUGGUUUUAUACACAGACUAAACAAUACAGCACUUUGCCAAAAAUAAGUGUAGCAUUGCUUAAACAU
+
%%47/57+(:'843&1)48.%-'-355::;7'4/=9@;7:>?@;972430%)'0469871%&%78:;+0-.-5:6973;679:9-8C@?<32?8A=>7?C<67267;57<@:=23973C;<806-
@6c27dd77-c750-4d0b-8ea4-5426eb000a2b runid=7ef2370e22dedd664bbd1f582b56b8df7d39db4d read=20 ch=526 start_time=2020-11-10T09:52:54
CCGUGCCUUUCCAGCUGCUGGUUGAAGCUCUCUGUCAUUUGGCAAUGAUCGGGUCAAAAACAAAAACAAAGCAAAACAGAAUAAAACUUACACAGAAAAAAU
+
(%&'(&(&%%//.+&&%+*'&**&&/0-'+&2,036/25-,*34456)'.:98,598:>246897-&&+,)/1,)10/87,75,,'%(&&&+13221,//)
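
In these records the read identifier is the first whitespace-delimited token after the "@" on each header line. Since the warning is about identifiers that cannot be matched, one way to narrow things down might be to list the FASTQ IDs that are missing from a known set of FAST5 read IDs. The following is only a sketch under some assumptions: records are exactly four lines each, and fast5_ids is a hypothetical collection of IDs gathered separately (for example from a sequencing summary file). It is not how Tombo does the matching internally.

```python
def fastq_read_ids(path):
    """Yield the read ID of each record in a plain-text, 4-line-per-record FASTQ."""
    with open(path, "rt", encoding="utf-8") as fp:
        for line_no, line in enumerate(fp):
            # Header lines are every fourth line, starting with the first.
            if line_no % 4 != 0 or not line.strip():
                continue
            header = line.strip()
            fields = header[1:].split()
            if not header.startswith("@") or not fields:
                raise ValueError(
                    "line %d does not look like a FASTQ header: %r"
                    % (line_no + 1, header)
                )
            # The ID is the first token; the rest is key=value metadata.
            yield fields[0]


def missing_ids(fastq_path, fast5_ids):
    """Return FASTQ read IDs (in file order) that are absent from `fast5_ids`.

    `fast5_ids` is a hypothetical iterable of read IDs collected elsewhere.
    """
    known = set(fast5_ids)
    return [rid for rid in fastq_read_ids(fastq_path) if rid not in known]
```

If every FASTQ ID comes back as missing, the two inputs probably come from different runs or the IDs are stored in a different form; if none do, the mismatch is more likely on the FAST5 side.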

woshiyangsi • Apr 23 '21 01:04