tombo
annotate_raw_with_fastqs issues
I was trying to use the tombo preprocess annotate_raw_with_fastqs option to append the sequencing reads. My command is:
% tombo preprocess annotate_raw_with_fastqs --fast5-basedir <path_to_fast5s.fast5> --fastq-filenames <filenames.fastq>
This is the output:
[13:36:03] Preparing reads and extracting read identifiers.
****** WARNING ****** Basecalls exsit in specified slot for some reads. Set --overwrite option to overwrite these basecalls.
100%|███████████████████████████████████████████████████████████████████████████████████| 262/262 [00:00<00:00, 518.82it/s]
[13:36:03] Annotating FAST5s with sequence from FASTQs.
Process Process-4:
Traceback (most recent call last):
  File "/Users/poonamchitale/opt/miniconda3/envs/tombo_test/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/Users/poonamchitale/opt/miniconda3/envs/tombo_test/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/poonamchitale/opt/miniconda3/envs/tombo_test/lib/python3.6/site-packages/tombo/_preprocess.py", line 148, in _feed_seq_records_worker
    fastq_rec = list(islice(fastq_fp, 4))
  File "/Users/poonamchitale/opt/miniconda3/envs/tombo_test/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
The program doesn't exit after this error; it just hangs at this point indefinitely. Any help would be appreciated. Thanks!
I was able to get past the original issue by unzipping the file, but now I get another error:
[13:48:58] Preparing reads and extracting read identifiers.
****** WARNING ****** Basecalls exsit in specified slot for some reads. Set --overwrite option to overwrite these basecalls.
100%|██████████████████████████████████████████████████████████████████████████████████| 262/262 [00:00<00:00, 1284.45it/s]
[13:48:58] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [00:16, ?it/s]
[13:49:15] Added sequences to a total of 0 reads.
Any ideas on what went wrong this time? Thanks!
It looks like one of the fastq files might be binary. Could you check that the input fastq files are valid text files?
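One way to run that check: the byte 0x8b at position 1 reported in the traceback matches the second byte of the gzip magic number (0x1f 0x8b), which fits with unzipping having resolved the first error. The sketch below is not part of tombo; the helper names `is_gzipped` and `open_fastq` are made up for illustration. It only looks at the first two bytes of the file, so it detects gzip compression and nothing else.

```python
import gzip

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file starts with the two-byte gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def open_fastq(path):
    """Open a FASTQ file as UTF-8 text, decompressing on the fly if gzipped.

    Hypothetical helper for inspecting a file by hand; tombo itself
    still needs an uncompressed FASTQ on disk.
    """
    if is_gzipped(path):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

For example, `is_gzipped("file.fastq")` returning True would suggest decompressing the file (for instance with `gunzip`) before passing it to `--fastq-filenames`.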
Yup! Just double checked and the fastq file looks ok. I realized that I didn't have the fast5s as single reads so I separated them and tried again and this is what I got:
% tombo preprocess annotate_raw_with_fastqs --fast5-basedir singlereads --fastq-filenames file.fastq
[14:22:11] Preparing reads and extracting read identifiers.
100%|████████████████████████████████| 1044803/1044803 [31:48<00:00, 547.43it/s]
[14:54:10] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [00:18, ?it/s]
[14:54:29] Added sequences to a total of 0 reads.
Update: I tried re-basecalling the fast5s and running those and I get the same results:
% tombo preprocess annotate_raw_with_fastqs \
  --fast5-basedir singlereads \
  --fastq-filenames barcode.fastq \
  --overwrite \
  --basecall-group Basecall_1D_000 \
  --basecall-subgroup BaseCalled_template \
  --processes 5
[18:35:08] Preparing reads and extracting read identifiers.
100%|████████████████████████████████| 1044803/1044803 [19:20<00:00, 900.23it/s]
[18:54:38] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [00:00, ?it/s]
[18:54:39] Added sequences to a total of 0 reads.
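Since the warning says the FASTQ read identifiers were not found, it may help to count how many FASTQ IDs actually appear among the known reads. The sketch below is a rough diagnostic, not tombo's own matching logic. It assumes strict 4-line FASTQ records and a tab-separated sequencing summary file with a `read_id` column (the file path and function names are hypothetical). Reading IDs directly from the FAST5 files would need an HDF5 library and is not shown.

```python
import csv


def fastq_read_ids(fastq_path):
    """Yield the read ID of each record, assuming strict 4-line FASTQ records."""
    with open(fastq_path, "r", encoding="utf-8") as fh:
        for line_no, line in enumerate(fh):
            # Header lines are every fourth line and start with '@'.
            if line_no % 4 == 0 and line.startswith("@"):
                tokens = line[1:].split()
                if tokens:
                    # The ID is the first whitespace-delimited token.
                    yield tokens[0]


def summary_read_ids(summary_path, column="read_id"):
    """Collect read IDs from a tab-separated summary file (column name assumed)."""
    with open(summary_path, "r", newline="", encoding="utf-8") as fh:
        return {row[column] for row in csv.DictReader(fh, delimiter="\t")}


def id_overlap(fastq_path, summary_path):
    """Return (number of FASTQ IDs found in the summary, number not found)."""
    fastq_ids = set(fastq_read_ids(fastq_path))
    known_ids = summary_read_ids(summary_path)
    return len(fastq_ids & known_ids), len(fastq_ids - known_ids)
```

If the first number is 0, the FASTQ and the FAST5 directory most likely come from different runs or subsets of reads; if it is non-zero, the ID mismatch is probably not the whole story.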
I get the same error as you. I have tried many of the fixes suggested in other issues, but the result is still the same.
tombo preprocess annotate_raw_with_fastqs --fast5-basedir /home/YANGSIYU/data/project/m6A/2021-4-10-data/test2/tombo/A/fast5_pass_single/0 --fastq-filenames pass.fastq --processes 15 --overwrite
[09:44:49] Preparing reads and extracting read identifiers.
100%|███████████████████████████████████████████████████████████████████████████████████████| 4000/4000 [00:02<00:00, 1804.62it/s]
[09:44:51] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [04:50, ?it/s]
[09:49:42] Added sequences to a total of 0 reads.
pass.fastq:
@3a4c4f4c-94e1-4a81-b73d-cd680a5007c5 runid=7ef2370e22dedd664bbd1f582b56b8df7d39db4d read=13 ch=811 start_time=2020-11-10T09:52:53
AGCAAGAGAAGCCAAGCACUCGUGGUGGACUCAAGCACCCAGGGUGACCCCUGCUUCGUACCAGCUAGGAAAAUGACCGCCUCUCGAAGGAGAAUGUUUGAAAAUCCCCCCAACCCCAUCAACAACCCAC
+
,+%&%))'&))'2/+$(%''-436)(++1%-4=.%%,%%5667:234;6:5-,+)'&54-/.-43;?:5.;)/-266(&&&(%(-+4.476046-+(&)'//)/&%%'331*&%%%&%%'.-$&
@681a32d7-be17-4de9-9358-ef3d5d9b2614 runid=7ef2370e22dedd664bbd1f582b56b8df7d39db4d read=12 ch=693 start_time=2020-11-10T09:52:53
ACUGUCUAAAUUUUUUUUUGUCUUAUAGAAAAUUUUUCUAUACUGAUUGGUUUCAUAGAUGAUAUGGUUUUAUACACAGACUAAACAAUACAGCACUUUGCCAAAAAUAAGUGUAGCAUUGCUUAAACAU
+
%%47/57+(:'843&1)48.%-'-355::;7'4/=9@;7:>?@;972430%)'0469871%&%78:;+0-.-5:6973;679:9-8C@?<32?8A=>7?C<67267;57<@:=23973C;<806-
@6c27dd77-c750-4d0b-8ea4-5426eb000a2b runid=7ef2370e22dedd664bbd1f582b56b8df7d39db4d read=20 ch=526 start_time=2020-11-10T09:52:54
CCGUGCCUUUCCAGCUGCUGGUUGAAGCUCUCUGUCAUUUGGCAAUGAUCGGGUCAAAAACAAAAACAAAGCAAAACAGAAUAAAACUUACACAGAAAAAAU
+
(%&'(&(&%%//.+&&%+*'&**&&/0-'+&2,036/25-,*34456)'.:98,598:>246897-&&+,)/1,)10/87,75,,'%(&&&+13221,//)