
v1.4.0: no tmp .bout files

Open Proginski opened this issue 9 months ago • 3 comments

Dear genEra developers,

Describe the bug
The A. thaliana CDS set I am using does not get dated. I previously succeeded in running genEra v1.4.0 on a subset of H. sapiens CDS. With the enclosed FASTA, however, it fails even when I provide 500 GB of RAM for 262 GB of results. Note that the same analysis (same command) ran perfectly fine with v1.2.0, apart from taking longer, of course. I would have bet the problem was caused by the "|" character in the middle of the CDS names, but those names worked with the previous version.
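One way to test the "|" hypothesis directly is to rerun v1.4.0 on a copy of the FASTA whose headers contain no "|". Below is a minimal sketch of such a header rewrite. It is not part of genEra; the function names are hypothetical, and it assumes replacing "|" with "_" keeps every ID unique.

```python
from typing import Iterable, Iterator


def sanitize_header(line: str, bad_chars: str = "|", replacement: str = "_") -> str:
    """Replace each character of bad_chars in a FASTA header line.

    Sequence lines (not starting with '>') are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    table = str.maketrans({c: replacement for c in bad_chars})
    return line.translate(table)


def sanitize_fasta(lines: Iterable[str], bad_chars: str = "|") -> Iterator[str]:
    """Yield the FASTA lines with sanitized headers, preserving order."""
    for line in lines:
        yield sanitize_header(line, bad_chars)


def sanitize_file(src: str, dst: str, bad_chars: str = "|") -> None:
    """Write a copy of src to dst in which only the header lines are rewritten."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(sanitize_fasta(fin, bad_chars))
```

For example, `sanitize_file("CDS/cds_from_genomic.faa", "CDS/cds_no_pipe.faa")` writes the copy, which is then passed to `-q` in place of the original file. The gene IDs in the resulting table will then read `lcl_NC_...` instead of `lcl|NC_...`.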

To Reproduce
Steps to reproduce the behaviour:

genEra \
-t 3702 \
-q CDS/cds_from_genomic.faa \
-b /diamonddb/NR_DB/nr \
-n 75 \
-r ncbi_lineages_2023-07-12.csv

Expected behaviour
Genes should be assigned ages, but none are. Every gene is reported as absent from the search results:

#gene	phylostratum	rank	taxonomic_representativeness
lcl|NC_000932.1_cds_NP_051037.1_48181	Absent from the DIAMOND/MMseqs2 results	NA	NA
lcl|NC_000932.1_cds_NP_051038.1_48226	Absent from the DIAMOND/MMseqs2 results	NA	NA
lcl|NC_000932.1_cds_NP_051039.1_48182	Absent from the DIAMOND/MMseqs2 results	NA	NA
lcl|NC_000932.1_cds_NP_051040.2_48183	Absent from the DIAMOND/MMseqs2 results	NA	NA
lcl|NC_000932.1_cds_NP_051041.1_48184	Absent from the DIAMOND/MMseqs2 results	NA	NA
lcl|NC_000932.1_cds_NP_051042.1_48185	Absent from the DIAMOND/MMseqs2 results	NA	NA
lcl|NC_000932.1_cds_NP_051043.1_48186	Absent from the DIAMOND/MMseqs2 results	NA	NA
lcl|NC_000932.1_cds_NP_051044.1_48187	Absent from the DIAMOND/MMseqs2 results	NA	NA
lcl|NC_000932.1_cds_NP_051045.1_48188	Absent from the DIAMOND/MMseqs2 results	NA	NA
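To report how many genes are affected, rather than only a sample, the result table can be summarized. The sketch below assumes the table is tab-separated, with the header line starting with "#" and the phylostratum in the second column, as in the rows shown. The helper name is hypothetical.

```python
from typing import Iterable, Tuple

# Text that genEra writes in the phylostratum column for genes without hits.
ABSENT = "Absent from the DIAMOND/MMseqs2 results"


def count_absent(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (genes absent from the search results, total genes).

    Assumes tab-separated rows; skips blank lines and '#' header lines.
    """
    absent = total = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        total += 1
        if len(fields) > 1 and fields[1] == ABSENT:
            absent += 1
    return absent, total
```

Pass it an open handle to the genEra result table (the file name is not given in this report). A result of `(n, n)` would confirm that no gene at all was dated.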

Screenshots or code
Here are the last lines of the err file (16 MB of similar "No such file or directory" lines):

awk: cannot open /store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003070.9_cds_NP_001321941.1_644.bout (No such file or directory)
rm: cannot remove '/store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003070.9_cds_NP_177334.1_10947.bout': No such file or directory
awk: cannot open /store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003070.9_cds_NP_565027.1_10948.bout (No such file or directory)
rm: cannot remove '/store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003071.7_cds_NP_001323584.1_19320.bout': No such file or directory
.................................................. 1M
.................................................. 2M
.................................................. 3M
.................................................. 4M
...
[mclIO] writing </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.mci>
.......................................
[mclIO] wrote native interchange 48227x48227 matrix with 4144755 entries to stream </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.mci>
[mclIO] wrote 48227 tab entries to stream </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.tab>
[mcxload] tab has 48227 entries
[mclIO] reading </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.mcl>
.......................................
[mclIO] read native interchange 48227x8569 matrix with 48227 entries
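The awk and rm messages suggest that one `tmp_<gene>.bout` file is expected per query in the `tmp_3702_11608/` directory. The sketch below lists which query IDs have no such file. It assumes this naming pattern, as inferred from the log, and that the gene ID is the first word of each FASTA header. Both function names are hypothetical.

```python
from typing import Iterable, List


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return the first word of each FASTA header, in file order."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:  # skip empty headers such as a bare '>'
                ids.append(parts[0])
    return ids


def missing_bout(ids: Iterable[str], present_files: Iterable[str]) -> List[str]:
    """Return the IDs whose assumed 'tmp_<id>.bout' file is not among present_files."""
    present = set(present_files)
    return [gene_id for gene_id in ids if f"tmp_{gene_id}.bout" not in present]
```

Example call, while the temporary directory still exists (genEra appears to remove these files during cleanup): `missing_bout(fasta_ids(open("CDS/cds_from_genomic.faa")), os.listdir(tmp_dir))`, with `import os` and `tmp_dir` set to the run's tmp directory.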

Session info:

Paul

Proginski, Sep 26 '23 15:09