NOVOPlasty

Incorrect File Format

Open jokelley opened this issue 5 years ago • 20 comments

I am receiving the error: THE INPUT READS HAVE AN INCORRECT FILE FORMAT! PLEASE SEND ME THE ID STRUCTURE!

Here is the beginning of the forward and reverse fastq file:

FORWARD:
@E00368R:309:HLKLYCCXY:6:1101:22130:1678 1:N:0:GTGAAA
NATGCTCATTTTTAAGTTCATACTTGTGTTTTGGGTTTCAACTGGAACATGTTTACATGCTTTCATTTTCAAAAAAACCCGCAAATGCTCCGTTTTAGCGCCTGCCTCTTTATGCCCTCCTGATTTATTCTGATCTATGACTAGAACTAC
+
#AAFFJJJJJJJJJJJJJF-FJFJJJJJJJJJFFJJJJJFFFJFJFJJJJJJJJJJJJFFJJJJJJJJJFAFJJJJJJJ<A<AJJJJJJ<JJJJJJJFFJF<JFFJJJFJJFJ-F<<<<-A7AFJJAFJJJJAF7A-J<A7<7FA-----
@E00368R:309:HLKLYCCXY:6:1101:22232:1678 1:N:0:GTGAAA
NTGTTGTGAGTAAATGAGACGGTTTATTTAACAGTTTAAACTCTTTTGTGTTGTTATGTAAGAAAATATGTATAGTTCAGAAACCTTTATTGTTCCATGTCAAGTATAAGAGAGTAAAATGATTTTGTTTTGGCGCCTCAACATTTCAGC
+

REVERSE:
@E00368R:309:HLKLYCCXY:6:1101:22130:1678 2:N:0:GTGAAA
AGAAACACTGTATAGAAACTAAAAGAATTCAACCTGTGTACTTTTAGGTCATTATTCTGAATTACAGGAGGCCGAATTTCACCACATTGCAAAATACATAATTTCTCACACAAGTAGGCTGTGCAGTTGGTCATATCCTCATTTTGGATC
+
AAAFFJJJJJJFJJFJJJJJJJJFJJFJJJJJJJJJJJJ<FJJJJJJJJJJJJFJJJJFJJJJFJAJJFFFJFFFFJJJJFAA<<7AFFJJF<AJFJJAAJJJJJFF<FFFFFJFFFFFA<FFJJFAFFA<FFF7----77-<AF-<-AF
@E00368R:309:HLKLYCCXY:6:1101:22232:1678 2:N:0:GTGAAA
GATCAAGAGCAGTGGATGTGAGCTGCTCGCCATTCAGGACATTTAGAGCGAATATTGTGGCAGTAAATCGTAAATCCCGCATGATTATCAGAGACCCCAGTCACCCCGACCACAGACGGTTTCGGCTGCTGCCGTGTGGCAAGCGGTATC
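For readers comparing their own read IDs against the ones quoted above: those headers follow the common Illumina (Casava 1.8+) layout, `@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`. The sketch below splits such a header into its fields and compares the part shared by both mates. It is only an illustration, not NOVOPlasty's own format check; the names `HEADER_RE`, `parse_illumina_header` and `pair_key` are hypothetical.

```python
import re

# Casava 1.8+ style header:
# @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
HEADER_RE = re.compile(
    r"^@(?P<instrument>[^:\s]+):(?P<run>\d+):(?P<flowcell>[^:\s]+):"
    r"(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"\s+(?P<read>[12]):(?P<filtered>[YN]):(?P<control>\d+):(?P<index>\S+)$"
)


def parse_illumina_header(line):
    """Return the header fields as a dict, or None if the line does not match."""
    match = HEADER_RE.match(line.strip())
    return match.groupdict() if match else None


def pair_key(line):
    """Part of the header shared by both mates: the text before the first whitespace."""
    return line.strip().split()[0]


# The first forward and reverse headers quoted in this issue.
forward = "@E00368R:309:HLKLYCCXY:6:1101:22130:1678 1:N:0:GTGAAA"
reverse = "@E00368R:309:HLKLYCCXY:6:1101:22130:1678 2:N:0:GTGAAA"

fields = parse_illumina_header(forward)
print(fields["read"], fields["index"])
print(pair_key(forward) == pair_key(reverse))
```

Both headers parse and share the same pair key, which is consistent with the maintainer's later remark that these read IDs work for him.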

jokelley avatar Dec 04 '19 00:12 jokelley

Hi,

I will check it out. Are you using the latest version (3.7.2)?

ndierckx avatar Dec 04 '19 11:12 ndierckx

Yes, we are using the latest version and it is not working.

jokelley avatar Dec 04 '19 16:12 jokelley

I will send a new version by tomorrow

ndierckx avatar Dec 05 '19 14:12 ndierckx

Hi, can you try with this version? If it works, I'll upload it.

NOVOPlasty3.7.3.zip https://github.com/ndierckx/NOVOPlasty/files/3931759/NOVOPlasty3.7.3.zip

ndierckx avatar Dec 06 '19 10:12 ndierckx

Still does not work.

jokelley avatar Dec 06 '19 17:12 jokelley

Hi,

That's weird, it works for me with those read IDs. Can you send me the log from the latest version I sent?

ndierckx avatar Dec 09 '19 10:12 ndierckx

I have attached the log.


NOVOPlasty: The Organelle Assembler
Version 3.7.3
Author: Nicolas Dierckxsens, (c) 2015-2019

Input parameters from the configuration file:
*** Verify if everything is correct ***

Project:

Project name = Mpammelas
Type = mito
Genome range = 12000-22000
K-mer = 33
Max memory = 50
Extended log = 0
Save assembled reads = no
Seed Input = /data/kelley/ugrads2/keeganp/Circularized_assembly_1_Zamericanus.fasta
Reference sequence =
Variance detection =
Chloroplast sequence =

Dataset 1:

Read Length = 151
Insert size = 300
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = /data/kelley/projects/eelpout/Original_fastq/Mpammelas/EP031_R1.fastq.gz
Reverse reads = /data/kelley/projects/eelpout/Original_fastq/Mpammelas/EP031_R2.fastq.gz

Heteroplasmy:

Heteroplasmy =
HP exclude list =
PCR-free = no

Optional:

Insert size auto = yes
Insert range = 1.9
Insert range strict = 1.3
Use Quality Scores =

THE INPUT READS HAVE AN INCORRECT FILE FORMAT! PLEASE SEND ME THE ID STRUCTURE!

jokelley avatar Dec 09 '19 23:12 jokelley

I don't see what the problem is, because the config in the log seems fine and it works for me. Is it possible to send a small fraction of those files? And are you sure EP031_R1.fastq.gz has the forward IDs?

ndierckx avatar Dec 10 '19 10:12 ndierckx

Interestingly, your script works on those reads when they are unzipped, but not on the gz file. Is there a way to get it to work on the gz file?

jokelley avatar Dec 10 '19 11:12 jokelley

Gzip should work; maybe something went wrong with the compression of the file. You could try to unzip the file and then gzip it again. Or maybe you have a very old version of Perl (you can check with perl -v).
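One way to rule out a damaged archive before recompressing is to decompress it once end to end and look at the first header. The sketch below does this with Python's standard `gzip` module. It is a hedged illustration only; `gzip_is_intact` and `first_fastq_header` are hypothetical helper names and have nothing to do with how NOVOPlasty itself opens its input.

```python
import gzip
import zlib


def gzip_is_intact(path):
    """Decompress the whole file once; a truncated or corrupt stream raises an error."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(1 << 20):  # read in 1 MiB chunks, discard the data
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False


def first_fastq_header(path):
    """Return the first line of a FASTQ file, using gzip when the file starts
    with the gzip magic bytes rather than trusting the file extension."""
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(path, "rt") as handle:
        return handle.readline().rstrip("\n")
```

If `gzip_is_intact` returns `True` and `first_fastq_header` shows a normal `@...` header, the compression itself is unlikely to be the cause.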

ndierckx avatar Dec 10 '19 12:12 ndierckx

Can you try with the gz files? It does not work for me. I tried with perl/5.28.0 and just a subset of the reads gzipped. It does not work. I have attached the files here.

jokelley avatar Dec 10 '19 12:12 jokelley

I can't see the files. I think you should send them to my private email or attach them in the GitHub thread itself.

ndierckx avatar Dec 10 '19 12:12 ndierckx

Hi, it works fine for me, so I am not sure what the problem is. I had one user with a similar problem who found the cause himself. This is what he sent me:

"I was using a bash script to generate the config files and there was a trailing space in the file path. This caused NOVOplasty to not recognise the gz and subsequently failed the fast header checks."

But I doubt this is the same problem. On which platform are you running it? That small dataset should at least work on your laptop.

ndierckx avatar Dec 10 '19 12:12 ndierckx

It was the trailing space!
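Since a trailing space in a path is invisible in most editors and in the printed log, a small check on the config file can catch it. The sketch below flags `key = value` lines whose value ends in a space or tab and offers a cleaned copy of the text. It assumes the config uses the `key = value` layout shown in the log in this thread; the function names and the example paths are hypothetical.

```python
def find_trailing_whitespace(config_text):
    """Return (line_number, key, raw_value) for every 'key = value' line
    whose non-empty value ends in whitespace."""
    problems = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value.strip() and value != value.rstrip():
            problems.append((number, key.strip(), value))
    return problems


def strip_trailing_whitespace(config_text):
    """Return the config text with trailing whitespace removed from every line."""
    return "\n".join(line.rstrip() for line in config_text.splitlines()) + "\n"


# Hypothetical two-line excerpt; the first path ends with a single space.
example = (
    "Forward reads = /path/to/reads_R1.fastq.gz \n"
    "Reverse reads = /path/to/reads_R2.fastq.gz\n"
)

for number, key, value in find_trailing_whitespace(example):
    print(f"line {number}: value of '{key}' ends with whitespace: {value!r}")
```

Printing the value with `repr` makes the stray space visible. When the config is generated by a script, as in the quoted report, applying the same stripping step to the generated text avoids the problem at the source.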

jokelley avatar Dec 10 '19 21:12 jokelley

OK, great! Maybe I will have a look at whether I can prevent it from causing a problem.

ndierckx avatar Dec 10 '19 23:12 ndierckx

Hi @ndierckx, I came across the same issue! Could you have a look at my read format? I've tried various edits but still get the same "Incorrect File Format" complaint. I'm using the latest version, 4.3.1.

@DP8400013689TRL1C001R0030003823:227_1372_1050
TGAATATTTATTATATATGTAAAATTGTGTATATAATTTAATTGTTTTGGCAGATTAGTGCATTAAATTTAGAATTTAAAATTATATATATGAATTAACA
+
E5@EEEBFEGFFHFGHFFBAEDE><F6DEDEGGFGFFDGEFEFF3ADBGFEDD9ED8=EGF@FFEEFFFGEE;EGCGFC<EEGFFGEFGDEGEDEFFGFE
@DP8400013689TRL1C001R0030010239:319_713_1329
TTGCGACCTCGATGTTGAATTAAGATAAAAATTAGGTGTAGAAGTTTAATATTTAAGTCTGTTCGACTTTTAAAATCTTACATGATTTGAGTTTAGATCG
+
DDG?F>?EEFFDEGEEHBEDEFBCEEFEDF7EDFFDBEDD=@EF;DGFBEFEEFE6GDDFIE;FECFFFFDCDEFDFFEH8FFFFF=HG=FECEEIDEEF
@DP8400013689TRL1C001R0030017450:543_1176_1524
AGATAGAAACCGATCTGGCTCACGCCGATCTAAACTCAAATCATGTAAGATTTTAAAAGTCGAACAGACTTAAATATTAAACTTCTACACCTAATTTTTA
+
FEFGGECFFFHGACIEHFGFGFGFGFECGCGFECFGFHFCGGFFGDFGGDFFGGGFFEGEFG=EGGGEFGFHGEGGGGGFDGHGGHGHFHFHGGGGHEGE
@DP8400013689TRL1C001R0030049319:504_579_572
ATAGAAACCGATCTGGCTCACGCCGATCTAAACTCAAATCATGTAAGATTTTAAAAGTCGAACAGACTTAAATATTAAACTTCTACACCTAATTTTTATC
+
@DEE@B:DE:>EDDEEEDDC@FDF?ADCF?DDDDDEE7D.FDEDD:EFFEEAEBEEADCD;ADDFCDFEFDBEEEDDEEBFFDFEEFDCFFDBEEDFFDB

Thanks!

dinhe878 avatar Aug 04 '21 08:08 dinhe878

@dinhe878 Are those Illumina reads? How can you link the paired reads when only one of the 3 pairs has matching IDs? And can you send me the log too?

ndierckx avatar Aug 10 '21 15:08 ndierckx

@ndierckx thanks for your reply. I ended up trying all the Illumina reads without subsetting the data for the mitochondrial genome. Although the assembly was pretty fragmented, it ran OK without any obvious issue. So my guess is that the "Incorrect File Format" complaint was caused by my subsetting the read data with samtools, which led to header issues.
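Subsetting can leave the forward and reverse files out of sync, so it may help to confirm that the two files still pair up record by record before running in PE mode. The sketch below compares read IDs position by position. It assumes plain four-line FASTQ records (no wrapped sequence lines) and treats the text before the first whitespace, minus an optional `/1` or `/2` suffix, as the ID; `read_ids` and `first_mismatch` are hypothetical names, and this is not the check NOVOPlasty performs internally.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the ID of each record in a four-line-per-record FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 == 0:  # the header is the first line of each record
                tokens = line[1:].split()
                read_id = tokens[0] if tokens else ""
                if read_id.endswith(("/1", "/2")):
                    read_id = read_id[:-2]  # drop the mate suffix
                yield read_id


def first_mismatch(forward_path, reverse_path):
    """Return (record_number, forward_id, reverse_id) for the first pair that is
    out of sync, or None when both files pair up record by record."""
    pairs = zip_longest(read_ids(forward_path), read_ids(reverse_path))
    for number, (fwd, rev) in enumerate(pairs, start=1):
        if fwd != rev:
            return number, fwd, rev
    return None
```

A `None` result means every forward read has its mate at the same position. A reported mismatch, or a `None` ID on one side when one file is shorter, points to reads lost or reordered during subsetting.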

dinhe878 avatar Aug 16 '21 09:08 dinhe878

Those reads could still be run as SE reads, but not as PE.

ndierckx avatar Aug 16 '21 12:08 ndierckx