spaln
segmentation fault
Hi, With the following command I get 'segmentation fault (core dumped)' when multi-threading:
spaln -Q4 -O1,2 -t10 -M5 -T -dPH_PH_racon_pilon_pseudohap_masked uniprot_arth_blattodea.faa > spalnout
No such error occurs with -t1.
I am aligning protein sequences against a genome assembly, both of which I prepared beforehand with:
spaln -W -KP PH_PH_racon_pilon_pseudohap_masked.gf
spaln -W -KA uniprot_arth_blattodea.faa
Any ideas?
Thanks, Mark
I wrote that too soon. When running on one core, it also eventually throws a segmentation fault, although in each case it produces sensible output before it crashes.
I have changed my command slightly to: ../spaln -Q7 -O1,2 -t20 -M1 -dPH_PH_racon_pilon_pseudohap_masked uniprot_arth_blattodea.faa > spalnout
but now, after it runs successfully for some time, I am getting the following error:
double free or corruption (!prev)
Aborted (core dumped)
Any tips? Is my command incorrect or am I running out of memory? It is a large genome (3G) and large proteome (2G; ~430k sequences).
Thank you for your help, Mark
Dear Mark,
Spaln probably failed because it encountered an unexpected situation. In general, draft genomic sequences can contain many unusual features that Spaln does not yet handle. Without seeing the actual data, it is hard to figure out the real cause of the failure. If the genomic sequence (PH_PH_racon_pilon_pseudohap_masked.gf) and aa sequence (uniprot_arth_blattodea.faa) are publicly available, please let me know their addresses.
It would be very helpful if you could identify the query sequence that caused the segmentation fault. You may be able to narrow down the candidates by running spaln with a limited range of queries:
% spaln -Q7 -d genome -T xxx 'query.faa (from to)'
where from and to (from <= to) are two positive integers that specify the range of queries.
I have a few comments on your commands.
- The aa sequence file does not need to be formatted in your case.
- An unmasked or soft-masked genomic sequence is preferred to a hard-masked one.
- Please set the -T xxx option, where xxx stands for the species in ~/table/gnm2tab most closely related to your genome.
Osamu,
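The narrowing-down step suggested above can be automated with a simple bisection over the query range. The following is only a sketch, not part of Spaln: the function names are hypothetical, the command line is copied from the example in this thread, and it assumes that a crash is reproducible and is triggered by a single query regardless of which other queries are in the range.

```python
import subprocess
from typing import Callable, List, Optional


def spaln_range_command(genome_db: str, species: str, query_file: str,
                        start: int, end: int) -> List[str]:
    """Build the argument list for one spaln run over queries start..end.

    The flags mirror the command quoted in this thread; adjust them to
    match your own run.
    """
    if not (1 <= start <= end):
        raise ValueError("need 1 <= start <= end")
    # 'query.faa (from to)' is passed as one single argument.
    return ["spaln", "-Q7", "-d", genome_db, "-T", species,
            f"{query_file} ({start} {end})"]


def crashed(cmd: List[str]) -> bool:
    """Run cmd and report whether it was killed by a signal.

    On POSIX, subprocess reports death by a signal (such as SIGSEGV or
    SIGABRT) as a negative return code.
    """
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode < 0


def find_failing_query(fails: Callable[[int, int], bool],
                       start: int, end: int) -> Optional[int]:
    """Bisect start..end and return one query number whose run fails.

    fails(a, b) must return True when a run over queries a..b crashes.
    Returns None if the full range does not fail.
    """
    if not fails(start, end):
        return None
    while start < end:
        mid = (start + end) // 2
        if fails(start, mid):
            end = mid          # the culprit is in the first half
        else:
            start = mid + 1    # assume it is in the second half
    return start
```

A possible call, with placeholder names for the genome database and species, would be `find_failing_query(lambda a, b: crashed(spaln_range_command("genome", "xxx", "query.faa", a, b)), 1, 430000)`. If the crash depends on memory state built up over many queries rather than on one sequence, bisection of this kind may not converge on a meaningful answer.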
Dear Osamu,
Thank you for your tips and comments. I have been running several jobs in parallel on 1000 different segments of the protein file, which works in 9 out of 10 cases. I'm then rerunning on segments of the failed files. I will let you know once I have found the problematic sequences.
Thanks, Mark
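For readers who want to try the same segment-by-segment approach, here is a minimal sketch of splitting a protein FASTA file into fixed-size pieces. It is not Mark's actual script, which was not posted; the function names and the output file naming scheme are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List


def read_fasta_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield each FASTA record (header plus sequence lines) as one string."""
    record: List[str] = []
    for line in lines:
        if line.startswith(">") and record:
            # A new header closes the previous record.
            yield "".join(record)
            record = []
        if line.strip():
            record.append(line if line.endswith("\n") else line + "\n")
    if record:
        yield "".join(record)


def split_records(records: Iterable[str],
                  per_chunk: int) -> Iterator[List[str]]:
    """Group records into lists of at most per_chunk records each."""
    chunk: List[str] = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) == per_chunk:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def write_chunks(fasta_path: str, out_prefix: str,
                 per_chunk: int) -> List[str]:
    """Write numbered chunk files and return their paths.

    The naming scheme (prefix.0001.faa, ...) is a hypothetical choice.
    """
    paths: List[str] = []
    with open(fasta_path) as handle:
        chunks = split_records(read_fasta_records(handle), per_chunk)
        for i, chunk in enumerate(chunks, 1):
            path = f"{out_prefix}.{i:04d}.faa"
            with open(path, "w") as out:
                out.writelines(chunk)
            paths.append(path)
    return paths
```

With roughly 430k sequences, `write_chunks("uniprot_arth_blattodea.faa", "segment", 430)` would give about 1000 files. Each file can then be aligned in its own spaln job, and any job that crashes can be split again.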
so?
Dear Mark,
A considerable number of problems have been fixed during the last few years. Please try the latest version, 2.4.13f, to see whether your problem still remains.
Osamu,
hello, Osamu @ogotoh ! I have the same problem with version 2.4.13f right now!
Segmentation fault (core dumped)
or
Aborted (core dumped)
with different CDS (genomic) data samples. With one of the two protein samples everything is fine: that sample is processed without errors.
parameters = -pw -t2 -Q7 -O12 -Tbombmori -yX -yZ1 -yB1 -M4 -S3 -LS
I tried different numbers of threads (from 1 to 30)...
Thanks, Sasha
PS: sometimes the log shows
malloc(): memory corruption
Okay, I found that I had to use the parameter -yX1 for DNA sequences.
Dear Sasha,
I think -yX1 is the default setting. I recommend not using the -yZ1 and -yB1 options, which are not well tuned for most species. I also recommend not using the -M1 option. Simply omit this option if you want to obtain at most a single output for each query.
By the way, if you identify the problematic sequence pairs (genomic and query), please let me know the sequences.
Osamu,