segmentation fault

Open MCH74 opened this issue 4 years ago • 10 comments

Hi, I get 'segmentation fault (core dumped)' when running the following command with multi-threading:

spaln -Q4 -O1,2 -t10 -M5 -T -dPH_PH_racon_pilon_pseudohap_masked uniprot_arth_blattodea.faa > spalnout

No such error occurs with -t1.

I am aligning protein sequences against a genome assembly, both of which I prepared beforehand with:

spaln -W -KP PH_PH_racon_pilon_pseudohap_masked.gf
spaln -W -KA uniprot_arth_blattodea.faa

Any ideas?

Thanks, Mark

MCH74 avatar Feb 25 '20 09:02 MCH74

I wrote that too soon. It also eventually throws a segmentation fault when running on one core, although in each case it produces sensible output before it crashes.

MCH74 avatar Feb 25 '20 12:02 MCH74

I have changed my command slightly to:

../spaln -Q7 -O1,2 -t20 -M1 -dPH_PH_racon_pilon_pseudohap_masked uniprot_arth_blattodea.faa > spalnout

but now, after it runs successfully for some time, I am getting the following error:

double free or corruption (!prev)
Aborted (core dumped)

Any tips? Is my command incorrect or am I running out of memory? It is a large genome (3G) and large proteome (2G; ~430k sequences).

Thank you for your help, Mark

MCH74 avatar Feb 28 '20 08:02 MCH74

Dear Mark,

Spaln probably failed because it encountered an unexpected situation. In general, draft genomic sequences can contain many unusual features that Spaln does not yet handle. Without seeing the actual example, it is hard to figure out the real cause of the failure. If the genomic sequence (PH_PH_racon_pilon_pseudohap_masked.gf) and aa sequence (uniprot_arth_blattodea.faa) are publicly available, please let me know their addresses.

It would be very helpful if you could identify the query sequence that caused the segmentation fault. You may be able to narrow down the candidates by running spaln with a limited range of queries:

% spaln -Q7 -d genome -T xxx 'query.faa (from to)'

where from and to (from <= to) are two positive integers that specify the range of queries.
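
The narrowing-down step can be automated with a bisection over the query range. The following is a minimal sketch and not part of Spaln: `spaln_crashes` and `find_failing_query` are hypothetical helper names, and the genome, species, and query file names are placeholders. It assumes that `spaln` is on the PATH and that a sub-range containing the offending query reproduces the crash on its own.

```python
import subprocess

def spaln_crashes(first, last, genome="genome", species="xxx", query="query.faa"):
    """Run spaln on queries first..last; return True if it was killed by a signal.

    Uses the 'query.faa (from to)' range syntax quoted above. The genome,
    species, and query names are placeholders.
    """
    cmd = ["spaln", "-Q7", "-d", genome, "-T", species, f"{query} ({first} {last})"]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    # On POSIX, a negative return code means the process died from a signal
    # (for example -11 for SIGSEGV, -6 for SIGABRT).
    return result.returncode < 0

def find_failing_query(first, last, crashes=spaln_crashes):
    """Bisect [first, last] down to one query index that triggers a crash.

    Returns None if the whole range runs cleanly. Assumes the crash is
    reproducible on any sub-range that contains the offending query.
    """
    if not crashes(first, last):
        return None
    while first < last:
        mid = (first + last) // 2
        if crashes(first, mid):
            last = mid          # offending query is in the lower half
        else:
            first = mid + 1     # otherwise it must be in the upper half
    return first
```

Each bisection step is a full spaln run, so on a large genome it may be worth starting from a range that is already known to fail rather than from the whole query file.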

I have a few comments on your commands.

  1. The aa sequence does not need to be formatted in your case.
  2. Unmasked or soft-masked genomic sequence is preferred to hard-masked sequence.
  3. Please set the -T xxx option, where xxx stands for the species in ~/table/gnm2tab most closely related to your genome.

Osamu,

ogotoh avatar Mar 04 '20 05:03 ogotoh

Dear Osamu,

Thank you for your tips and comments. I have been running several jobs in parallel on 1000 different segments of the protein file, which works in 9 out of 10 cases. I'm then rerunning on segments of the failed files. I will let you know once I have found the problematic sequences.
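
Splitting the protein file into segments can be done by computing query ranges rather than by cutting the file itself. This is only a rough sketch: `count_fasta_records` and `segment_ranges` are hypothetical helper names, and it assumes the range syntax counts queries from 1 in file order, which should be checked against the Spaln documentation.

```python
def count_fasta_records(path):
    """Count sequences in a FASTA file by counting header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))

def segment_ranges(total, n_segments):
    """Split 1..total into at most n_segments contiguous (from, to) ranges."""
    if total <= 0 or n_segments <= 0:
        return []
    n_segments = min(n_segments, total)
    base, extra = divmod(total, n_segments)
    ranges, start = [], 1
    for i in range(n_segments):
        # The first `extra` segments get one additional query each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges
```

Each (from, to) pair can then be passed to a separate job as 'query.faa (from to)', and only the ranges whose jobs crash need to be split again.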

Thanks, Mark

MCH74 avatar Mar 04 '20 08:03 MCH74

so?

sashulkaSh avatar Feb 01 '23 06:02 sashulkaSh

Dear Mark,

A considerable number of problems have been fixed during the last few years. Please try the latest version, 2.4.13f, to see whether your problem still remains.

Osamu,

ogotoh avatar Feb 01 '23 09:02 ogotoh

Hello Osamu @ogotoh! I have the same problem with version 2.4.13f right now: 'Segmentation fault (core dumped)' or 'Aborted (core dumped)' with different CDS (genomic) data samples. With one of my two protein samples everything is fine; that sample is processed without errors.

parameters = -pw -t2 -Q7 -O12 -Tbombmori -yX -yZ1 -yB1 -M4 -S3 -LS

I tried different numbers of threads (from 1 to 30)...

Thanks, Sasha

sashulkaSh avatar Feb 01 '23 09:02 sashulkaSh

PS: sometimes the log shows malloc(): memory corruption

sashulkaSh avatar Feb 01 '23 11:02 sashulkaSh

Okay, I found that I had to use the parameter -yX1 for DNA sequences.

sashulkaSh avatar Feb 01 '23 12:02 sashulkaSh

Dear Sasha,

I think -yX1 is the default setting. I recommend not using the -yZ1 and -yB1 options, which are not well tuned for most species. I also recommend not using the -M1 option; simply omit it if you want to obtain at most a single output for each query.

By the way, if you identify the problematic sequence pairs (genomic and query), please let me know the sequences.
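
Once a failing query index is known, the corresponding record can be pulled out of the FASTA file for sharing. A small sketch, assuming queries are numbered from 1 in file order; `fasta_record` is a hypothetical helper, not a Spaln tool.

```python
def fasta_record(lines, index):
    """Return the index-th (1-based) FASTA record as a string, or None if absent."""
    if index < 1:
        return None
    current = 0
    kept = []
    for line in lines:
        if line.startswith(">"):
            current += 1
            if current > index:
                break  # past the wanted record, stop reading
        if current == index:
            kept.append(line.rstrip("\n"))
    return "\n".join(kept) if kept else None
```

For example, `fasta_record(open("query.faa"), 37)` would return the 37th sequence with its header, where the file name and index are placeholders.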

Osamu,

ogotoh avatar Feb 03 '23 09:02 ogotoh