NOVOPlasty
Incorrect File Format
I am receiving the error: THE INPUT READS HAVE AN INCORRECT FILE FORMAT! PLEASE SEND ME THE ID STRUCTURE!
Here is the beginning of the forward and reverse fastq file:
FORWARD:
@E00368R:309:HLKLYCCXY:6:1101:22130:1678 1:N:0:GTGAAA
NATGCTCATTTTTAAGTTCATACTTGTGTTTTGGGTTTCAACTGGAACATGTTTACATGCTTTCATTTTCAAAAAAACCCGCAAATGCTCCGTTTTAGCGCCTGCCTCTTTATGCCCTCCTGATTTATTCTGATCTATGACTAGAACTAC
+
#AAFFJJJJJJJJJJJJJF-FJFJJJJJJJJJFFJJJJJFFFJFJFJJJJJJJJJJJJFFJJJJJJJJJFAFJJJJJJJ<A<AJJJJJJ<JJJJJJJFFJF<JFFJJJFJJFJ-F<<<<-A7AFJJAFJJJJAF7A-J<A7<7FA-----
@E00368R:309:HLKLYCCXY:6:1101:22232:1678 1:N:0:GTGAAA
NTGTTGTGAGTAAATGAGACGGTTTATTTAACAGTTTAAACTCTTTTGTGTTGTTATGTAAGAAAATATGTATAGTTCAGAAACCTTTATTGTTCCATGTCAAGTATAAGAGAGTAAAATGATTTTGTTTTGGCGCCTCAACATTTCAGC
+
REVERSE:
@E00368R:309:HLKLYCCXY:6:1101:22130:1678 2:N:0:GTGAAA
AGAAACACTGTATAGAAACTAAAAGAATTCAACCTGTGTACTTTTAGGTCATTATTCTGAATTACAGGAGGCCGAATTTCACCACATTGCAAAATACATAATTTCTCACACAAGTAGGCTGTGCAGTTGGTCATATCCTCATTTTGGATC
+
AAAFFJJJJJJFJJFJJJJJJJJFJJFJJJJJJJJJJJJ<FJJJJJJJJJJJJFJJJJFJJJJFJAJJFFFJFFFFJJJJFAA<<7AFFJJF<AJFJJAAJJJJJFF<FFFFFJFFFFFA<FFJJFAFFA<FFF7----77-<AF-<-AF
@E00368R:309:HLKLYCCXY:6:1101:22232:1678 2:N:0:GTGAAA
GATCAAGAGCAGTGGATGTGAGCTGCTCGCCATTCAGGACATTTAGAGCGAATATTGTGGCAGTAAATCGTAAATCCCGCATGATTATCAGAGACCCCAGTCACCCCGACCACAGACGGTTTCGGCTGCTGCCGTGTGGCAAGCGGTATC
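These headers follow the Illumina (Casava 1.8 and later) layout: a read ID of seven colon-separated fields (instrument, run, flowcell, lane, tile, x, y), a space, then read number (1 or 2), filter flag, control number and index. As a minimal sketch, assuming exactly that layout, the Python below parses such a header and shows that the two mates share one ID and differ only in the read number. It is not NOVOPlasty's own check (that code is in Perl and not shown in this thread); `IlluminaHeader` and `parse_illumina_header` are hypothetical names.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    """Fields of an Illumina (Casava 1.8+) FASTQ header line."""
    read_id: str       # instrument:run:flowcell:lane:tile:x:y, identical for both mates
    read_number: int   # 1 = forward (R1), 2 = reverse (R2)
    is_filtered: bool  # True when the flag is 'Y' (read did not pass the filter)
    control: int       # control number, 0 for normal reads
    index: str         # sample index / barcode


def parse_illumina_header(line: str) -> IlluminaHeader:
    """Parse '@<id> <read>:<filtered>:<control>:<index>'; raise ValueError otherwise."""
    if not line.startswith("@"):
        raise ValueError(f"not a FASTQ header: {line!r}")
    try:
        read_id, comment = line[1:].rstrip("\n").split(" ", 1)
        read_number, filtered, control, index = comment.split(":", 3)
    except ValueError:
        raise ValueError(f"unexpected header layout: {line!r}") from None
    if len(read_id.split(":")) != 7:
        raise ValueError(f"expected 7 colon-separated ID fields: {read_id!r}")
    return IlluminaHeader(read_id, int(read_number), filtered == "Y", int(control), index)


fwd = parse_illumina_header("@E00368R:309:HLKLYCCXY:6:1101:22130:1678 1:N:0:GTGAAA")
rev = parse_illumina_header("@E00368R:309:HLKLYCCXY:6:1101:22130:1678 2:N:0:GTGAAA")
print(fwd.read_id == rev.read_id, fwd.read_number, rev.read_number)  # True 1 2
```

A header without the space-separated comment (for example the `@DP8400013689...:227_1372_1050` style) makes this sketch raise `ValueError`.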
Hi,
I will check it out. Are you using the latest version (3.7.2)?
Yes, we are using the latest version and it is not working.
I will send a new version by tomorrow.
Still does not work.
On Fri, Dec 6, 2019 at 2:42 AM Nicolas Dierckxsens [email protected] wrote:
Hi, can you try this version? If it works, I'll upload it.
NOVOPlasty3.7.3.zip https://github.com/ndierckx/NOVOPlasty/files/3931759/NOVOPlasty3.7.3.zip
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ndierckx/NOVOPlasty/issues/117?email_source=notifications&email_token=AAQA5SXIH6QBXGVETKMCXHTQXIUCVA5CNFSM4JVAZRO2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGDW7FQ#issuecomment-562524054, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQA5SVJHZ4533E3WT6RCI3QXIUCVANCNFSM4JVAZROQ .
Hi,
That's weird, it works for me on those read IDs. Can you send me the log of the latest version I sent?
I have attached the log.
NOVOPlasty: The Organelle Assembler
Version 3.7.3
Author: Nicolas Dierckxsens, (c) 2015-2019
Input parameters from the configuration file: *** Verify if everything is correct ***
Project:
Project name = Mpammelas
Type = mito
Genome range = 12000-22000
K-mer = 33
Max memory = 50
Extended log = 0
Save assembled reads = no
Seed Input = /data/kelley/ugrads2/keeganp/Circularized_assembly_1_Zamericanus.fasta
Reference sequence =
Variance detection =
Chloroplast sequence =
Dataset 1:
Read Length = 151
Insert size = 300
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = /data/kelley/projects/eelpout/Original_fastq/Mpammelas/EP031_R1.fastq.gz
Reverse reads = /data/kelley/projects/eelpout/Original_fastq/Mpammelas/EP031_R2.fastq.gz
Heteroplasmy:
Heteroplasmy =
HP exclude list =
PCR-free = no
Optional:
Insert size auto = yes
Insert range = 1.9
Insert range strict = 1.3
Use Quality Scores =
THE INPUT READS HAVE AN INCORRECT FILE FORMAT! PLEASE SEND ME THE ID STRUCTURE!
I don't see what the problem is, because the config in the log seems fine and it works for me. Could you send a small fraction of those files? And are you sure EP031_R1.fastq.gz contains the forward IDs?
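Whether EP031_R1.fastq.gz really holds the forward reads can be checked by comparing the first records of both files. A minimal sketch, assuming Casava 1.8-style headers (identical mate IDs, a comment starting with `1:` or `2:`) and strict 4-line records; `open_fastq`, `fastq_headers` and `check_pairs` are hypothetical helpers. It decides whether a file is gzip from its first two bytes (`1f 8b`) rather than from the `.gz` suffix.

```python
import gzip
from itertools import islice
from typing import IO, Iterator, List

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_fastq(path: str) -> IO[str]:
    """Open a FASTQ file as text, decompressing it if its content is gzip."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == GZIP_MAGIC
    return gzip.open(path, "rt") if is_gzip else open(path, "r")


def fastq_headers(handle: IO[str]) -> Iterator[str]:
    """Yield the header line of each record, assuming strict 4-line records."""
    for i, line in enumerate(handle):
        if i % 4 == 0:
            yield line.rstrip("\n")


def check_pairs(r1_path: str, r2_path: str, n: int = 1000) -> List[str]:
    """Compare the first n headers of R1 and R2 and return any problems found."""
    problems: List[str] = []
    with open_fastq(r1_path) as f1, open_fastq(r2_path) as f2:
        pairs = zip(islice(fastq_headers(f1), n), islice(fastq_headers(f2), n))
        for h1, h2 in pairs:
            id1, _, comment1 = h1.partition(" ")
            id2, _, comment2 = h2.partition(" ")
            if id1 != id2:
                problems.append(f"ID mismatch: {id1} vs {id2}")
            if not (comment1.startswith("1:") and comment2.startswith("2:")):
                problems.append(f"read numbers are not 1/2: {h1!r} / {h2!r}")
    return problems
```

For example, `check_pairs("EP031_R1.fastq.gz", "EP031_R2.fastq.gz")` should return an empty list if the files are correctly paired and in the expected order.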
Interestingly, your script works on those reads when they are unzipped, but not on the gz file. Is there a way to get it to work on the gz file?
gzip should work; maybe something went wrong with the compression of the file. You could try to unzip it and then gzip it again. Or maybe you have a very old version of Perl (you can check with perl -v).
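One way to test the "something went wrong with the compression" hypothesis independently of NOVOPlasty is to decompress the whole file and check that it ends cleanly (from the shell, `gzip -t` does the same). A minimal sketch using only the Python standard library; `gzip_is_intact` is a hypothetical name.

```python
import gzip
import zlib


def gzip_is_intact(path: str, chunk_size: int = 1 << 20) -> bool:
    """Decompress the whole file; return False if it is not a complete, valid gzip stream."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):  # stream in chunks so large FASTQ files fit in memory
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError: not gzip at all or CRC/length mismatch; EOFError: truncated stream;
        # zlib.error: corrupt deflate data
        return False
```

A `False` result points to a damaged or non-gzip file; a `True` result means the compression itself is fine and the problem lies elsewhere.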
Can you try with the gz files? It does not work for me. I tried with perl/5.28.0 and a gzipped subset of the reads, and it still does not work. I have attached the files here.
I can't see the files; I think you should send them to my private email or through the GitHub chat itself.
Hi, it works fine for me, so I am not sure what the problem is. One user had a similar problem and found the cause himself; this is what he sent me:
"I was using a bash script to generate the config files and there was a trailing space in the file path. This caused NOVOplasty to not recognise the gz and subsequently failed the fast header checks."
But I doubt this is the same problem. Which platform are you running it on? That small dataset should at least work on your laptop.
It was the trailing space!
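A trailing space is invisible in the log but changes the string the program sees. The sketch below assumes a simple `Key = value` line format like the one echoed in the log (NOVOPlasty's real parser is not shown in this thread). It illustrates how a naive suffix test for `.gz` fails on a path ending in `.fastq.gz `, and how trimming the value, or flagging lines that end in whitespace, avoids that. `looks_gzipped`, `config_value` and `lines_with_trailing_whitespace` are hypothetical helpers.

```python
from typing import List


def looks_gzipped(path: str) -> bool:
    """Naive check that decides compression from the file name alone."""
    return path.endswith(".gz")


def config_value(line: str) -> str:
    """Return the trimmed value of a 'Key = value' line (hypothetical format)."""
    _, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"no '=' in config line: {line!r}")
    return value.strip()  # removes the newline and any leading/trailing spaces or tabs


def lines_with_trailing_whitespace(config_text: str) -> List[int]:
    """Return 1-based numbers of lines that end in a space or tab."""
    return [
        n
        for n, line in enumerate(config_text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


raw = "Forward reads = /data/EP031_R1.fastq.gz \n"  # note the space before the newline
untrimmed = raw.partition("=")[2].lstrip().rstrip("\n")
print(looks_gzipped(untrimmed))             # False: the name ends in "gz ", not ".gz"
print(looks_gzipped(config_value(raw)))     # True once the value is trimmed
print(lines_with_trailing_whitespace(raw))  # [1]
```

Running `lines_with_trailing_whitespace` over a config generated by a script is a quick way to catch this before starting an assembly.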
OK, great! I'll have a look at whether I can prevent it from causing a problem.
Hi @ndierckx, I came across the same issue! Could you have a look at my read format? I've tried various edits but still get the same "Incorrect File Format" complaint. I'm using the latest version, 4.3.1.
@DP8400013689TRL1C001R0030003823:227_1372_1050
TGAATATTTATTATATATGTAAAATTGTGTATATAATTTAATTGTTTTGGCAGATTAGTGCATTAAATTTAGAATTTAAAATTATATATATGAATTAACA
+
E5@EEEBFEGFFHFGHFFBAEDE><F6DEDEGGFGFFDGEFEFF3ADBGFEDD9ED8=EGF@FFEEFFFGEE;EGCGFC<EEGFFGEFGDEGEDEFFGFE
@DP8400013689TRL1C001R0030010239:319_713_1329
TTGCGACCTCGATGTTGAATTAAGATAAAAATTAGGTGTAGAAGTTTAATATTTAAGTCTGTTCGACTTTTAAAATCTTACATGATTTGAGTTTAGATCG
+
DDG?F>?EEFFDEGEEHBEDEFBCEEFEDF7EDFFDBEDD=@EF;DGFBEFEEFE6GDDFIE;FECFFFFDCDEFDFFEH8FFFFF=HG=FECEEIDEEF
@DP8400013689TRL1C001R0030017450:543_1176_1524
AGATAGAAACCGATCTGGCTCACGCCGATCTAAACTCAAATCATGTAAGATTTTAAAAGTCGAACAGACTTAAATATTAAACTTCTACACCTAATTTTTA
+
FEFGGECFFFHGACIEHFGFGFGFGFECGCGFECFGFHFCGGFFGDFGGDFFGGGFFEGEFG=EGGGEFGFHGEGGGGGFDGHGGHGHFHFHGGGGHEGE
@DP8400013689TRL1C001R0030049319:504_579_572
ATAGAAACCGATCTGGCTCACGCCGATCTAAACTCAAATCATGTAAGATTTTAAAAGTCGAACAGACTTAAATATTAAACTTCTACACCTAATTTTTATC
+
@DEE@B:DE:>EDDEEEDDC@FDF?ADCF?DDDDDEE7D.FDEDD:EFFEEAEBEEADCD;ADDFCDFEFDBEEEDDEEBFFDFEEFDCFFDBEEDFFDB
Thanks!
@dinhe878 Are those Illumina reads? How can you link the paired reads when only one of the 3 pairs has matching IDs? And can you send me the log too?
@ndierckx Thanks for your reply. I ended up running all the Illumina reads for the mitochondrial genome without subsetting the data. Although the assembly was pretty fragmented, it ran without obvious issues. So my guess is that the "Incorrect File Format" complaint was caused by subsetting the read data with samtools, which led to header issues.
Those reads could still be run as SE reads, but not as PE.
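When reads are subset separately from each file (for example after filtering with samtools), some reads can lose their mate, so the two files no longer line up. A minimal sketch, assuming mates share an ID once any comment after the first space and a trailing `/1` or `/2` are removed, that sorts read IDs into those still usable as pairs and those usable only as single-end reads. It keeps all IDs in memory and does not write records back out; `mate_key` and `split_paired_and_single` are hypothetical helpers, not part of NOVOPlasty or samtools.

```python
from typing import List, Tuple


def mate_key(header: str) -> str:
    """Reduce a FASTQ header to the part both mates are assumed to share."""
    read_id = header[1:] if header.startswith("@") else header
    read_id = read_id.split(" ", 1)[0]  # drop the comment, e.g. '1:N:0:GTGAAA'
    if read_id.endswith(("/1", "/2")):  # older Illumina-style mate suffix
        read_id = read_id[:-2]
    return read_id


def split_paired_and_single(
    r1_headers: List[str], r2_headers: List[str]
) -> Tuple[List[str], List[str]]:
    """Return (IDs present in both files, IDs present in only one file), sorted."""
    keys1 = {mate_key(h) for h in r1_headers}
    keys2 = {mate_key(h) for h in r2_headers}
    paired = sorted(keys1 & keys2)  # mate found in the other file: usable as PE
    single = sorted(keys1 ^ keys2)  # mate missing: usable only as SE
    return paired, single
```

The paired IDs could then be written to matching R1/R2 files in the same order and the rest to a single-end file.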