EXCEPTION: error opening file: tmp/s2228.fq ()

Open Shellfishgene opened this issue 5 years ago • 8 comments

I tried to run ORNA on a large dataset (paired end, each fastq file ~45 GB), but it fails with the above error. The DSK step seems to work fine, then ORNA runs for a while and produces various fq files in the tmp directory; s2228.fq is the one with the largest number. It is present and contains normal fastq data as far as I can see. A test with very small fastq files runs fine. I get the same error on two different machines, so I don't think it's a file system issue. I guess it happens at the start of the second iteration.

> /opt/ORNA/ORNA -pair1 all.R1.fastq.gz -pair2 all.R2.fastq.gz -sorting 1 -nb-cores 12 -output normalized -type fastq -kmer 25

Given Parameters
----------------
Base:   1.7
kmer size:      25
Mode: ORNA-Q
Number of cores:        12
----------------
[DSK: nb solid kmers found : 507317221   ]  100  %   elapsed:  97 min 17 sec   remaining:   0 min 0  sec   cpu: 247.4 %   mem: [1737, 4411, 4415] MB
[Building BooPHF]  100  %   elapsed:   2 min 8  sec   remaining:   0 min 0  sec
[MPHF: populate                          ]  100  %   elapsed:   3 min 19 sec   remaining:   0 min 0  sec   cpu: 100.0 %   mem: [1285, 1285, 4415] MB
[Bloom: read solid kmers                 ]  100  %   elapsed:   0 min 44 sec   remaining:   0 min 0  sec   cpu: 577.4 %   mem: [2145, 2145, 4415] MB
[Debloom: build extension                ]  100  %   elapsed:   1 min 46 sec   remaining:   0 min 0  sec   cpu: 896.9 %   mem: [2493, 2493, 4415] MB
[Debloom: finalization                   ]  100  %   elapsed:   0 min 56 sec   remaining:   0 min 0  sec   cpu: 154.9 %   mem: [2372, 2372, 4415] MB
[Debloom: cascading                      ]  100  %   elapsed:   0 min 32 sec   remaining:   0 min 0  sec   cpu: 878.4 %   mem: [2373, 2373, 4415] MB
[Graph: nb branching found : 46454780    ]  100  %   elapsed:   1 min 46 sec   remaining:   0 min 0  sec   cpu: 2112.5 %   mem: [3904, 3904, 4415] MB
Populating node abundances
Running ORNA-Q in paired end mode
1841
EXCEPTION: error opening file: tmp/s2228.fq ()
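
The check that the reported temporary file "contains normal fastq data" can be made a little more systematic. The following is a minimal sketch, assuming plain four-line FASTQ records (no wrapped sequence lines); the function name `check_fastq` is hypothetical and not part of ORNA. It only tests record structure, so a clean result does not rule out other causes of the error.

```python
import io


def check_fastq(handle, max_records=None):
    """Scan four-line FASTQ records from a text handle.

    Returns (n_valid_records, problem), where problem is None if every
    record read had a '@' header, a '+' separator, and sequence and
    quality strings of equal length. Assumes unwrapped FASTQ.
    """
    n = 0
    while True:
        header = handle.readline()
        if not header:
            return n, None  # clean end of file
        if header.strip() == "":
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            return n, f"record {n + 1}: header does not start with '@'"
        if not plus.startswith("+"):
            return n, f"record {n + 1}: separator line does not start with '+'"
        if len(seq) != len(qual):
            return n, f"record {n + 1}: sequence and quality lengths differ"
        n += 1
        if max_records is not None and n >= max_records:
            return n, None
```

For example, `with open("tmp/s2228.fq") as fh: print(check_fastq(fh))` would report the number of well-formed records and the first structural problem found, if any.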

Shellfishgene avatar Aug 20 '19 15:08 Shellfishgene

Hi, can you point me to the data? I will try it out and figure out where the problem lies. I apologise for the inconvenience.

ddurai avatar Aug 21 '19 08:08 ddurai

Hello, Did you find the cause of this problem? Half of my samples are running into the same error.

Thank you!

clockdive avatar Sep 24 '20 07:09 clockdive

No, I don't think I looked at it further, sorry.

Shellfishgene avatar Sep 24 '20 09:09 Shellfishgene

Hey, has this issue been resolved? I had this error occur in 1 of 30 samples for no discernible reason. Here is the command I used:

~/bin/ORNA/build/bin/ORNA -pair1 ${fwd} -pair2 ${rev} -nb-cores 8 -type fastq -sorting 1 -output orna/${sag_id}

Here are the last few lines from the command line:

[Graph: nb branching found : 5235816     ]  100  %   elapsed:   0 min 7  sec   remaining:   0 min 0  sec   cpu: 1654.9 %   mem: [ 795,  795, 4874] MB 
Populating node abundances
Running ORNA-Q in paired end mode
1048
EXCEPTION: error opening file: orna/tmp/s2046.fq ()

My inputs are a few Gbs, but I would be happy to try to share them with you if you need. Thank you!

RyloByte avatar Jan 19 '21 22:01 RyloByte

Hi, we are still not sure what the problem is. Is there anything particular about the one sample where you get the error, McGlock?

Kind regards, Marcel

SchulzLab avatar Jan 25 '21 08:01 SchulzLab

Hey Marcel, thanks for the reply! It's very strange, because I'm essentially making subsets of E. coli genomes and then running them through a workflow that includes ORNA. The subset genomes are from the same source, yet some pass and some fail, with no real identifiable difference between the ones that fail and the ones that don't. I could try to build a minimal example to provide to you if you'd like.

RyloByte avatar Jan 25 '21 22:01 RyloByte

Hi McGlock,

I am not able to recreate the error. If you don't mind, can you provide me with a link to the data?

Kind regards, Dilip A Durai

ddurai avatar Jan 26 '21 20:01 ddurai

Hey Dilip A Durai, here is a link to the minimal input I could get to fail: link. Here is an excerpt of the output:

Populating node abundances
Running ORNA-Q in paired end mode
1093
EXCEPTION: error opening file: SI_QC_SAGs/tmp/s1456.fq ()

Also here is the command I used:

~/bin/ORNA/build/bin/ORNA -pair1 test.R1.fastq -pair2 test.R2.fastq -nb-cores 8 -type fastq -sorting 1 -output test_output

One more edit to this comment: I ran the same command without -sorting 1 and it succeeded. Removing that flag seems to make the error disappear, so my guess is that the issue is within that particular feature. For now I'll just use the original ORNA without sorting.
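
For a batch of samples, the workaround above could be scripted roughly as follows. This is only a sketch: it uses the flags that appear in the commands in this thread, the helper names (`orna_command`, `hit_open_error`, `run_with_fallback`) are hypothetical, and since it is not known here whether ORNA exits with a non-zero status after this exception, the sketch checks both the exit status and the output text. Note that a run without `-sorting 1` uses the original ORNA mode rather than ORNA-Q, so the fallback output is not equivalent to a successful sorted run.

```python
import subprocess

# Text of the error reported in this thread.
ERROR_MARKER = "EXCEPTION: error opening file"


def orna_command(orna_bin, pair1, pair2, output, cores=8, sorting=True):
    """Build the ORNA paired-end command line used in this thread."""
    cmd = [orna_bin, "-pair1", pair1, "-pair2", pair2,
           "-nb-cores", str(cores), "-type", "fastq", "-output", output]
    if sorting:
        cmd += ["-sorting", "1"]
    return cmd


def hit_open_error(text):
    """Return True if the output contains the reported exception."""
    return ERROR_MARKER in text


def run_with_fallback(orna_bin, pair1, pair2, output, cores=8,
                      runner=subprocess.run):
    """Try with -sorting 1 first, then retry once without it.

    Returns (sorting_used, result); sorting_used is None if both
    attempts failed.
    """
    result = None
    for sorting in (True, False):
        cmd = orna_command(orna_bin, pair1, pair2, output, cores, sorting)
        result = runner(cmd, capture_output=True, text=True)
        combined = (result.stdout or "") + (result.stderr or "")
        if result.returncode == 0 and not hit_open_error(combined):
            return sorting, result
    return None, result
```

The first returned value records which mode produced the output, so samples that fell back to the unsorted mode can be tracked separately.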

Please let me know if I can assist in any other way. Thank you!

RyloByte avatar Jan 29 '21 03:01 RyloByte