spaln icon indicating copy to clipboard operation
spaln copied to clipboard

Segmentation fault in command line parsing

Open tillea opened this issue 3 years ago • 7 comments

Hi, I realised that spaln was moved to github and thus I wanted to release an updated Debian package. We have created a CI test for version 2.4.1, which basically calls:

perl makeidx.pl -v -inp dictdisc_g.gf.gz
spaln -Q7 -d dictdisc_g -T dictdisc dictdisc.faa.gz
spaln -Q7 -d dictdisc_g -yS -T dictdisc -O12 -g dictdisc.cf.gz
sortgrcd -O15 -F2 dictdisc.grd.gz

I realised that I need to apply this patch since it can not assumed that the current directory is inside PATH. (BTW, it would probably be better to not ship the same code twice.)

That way the test is progressing, however the line

spaln -Q7 -d dictdisc_g -yS -T dictdisc -O12 -g dictdisc.cf.gz

segfaults and the actual code line is marked in this debug patch. I guess there is somewhere an issue in the command line parsing code which just crashes here. You can see a full build log to see how spaln was build and here you find the call that leads to segfault.

Kind regards, Andreas.

tillea avatar Jan 23 '22 08:01 tillea

Dear Andreas,

Thank you very much for your reports. I have just uploaded a bug-fix version (ver.2.4.7) to the GitHub site.

Compared with ver.2.4.1, the latest version shows not dramatically but significantly better performance in speed and mappability.

Use of “makeidx.pl -inp input” is somewhat obsolete. Use of “spaln -W -K[A|D|P] input” would be more convenient in most cases.

The meaning of -yS option has slightly changed from the original one; species-specific parameter set will be used whenever -T option is specified. So, you usually do not need to set this option (that is why I could not notice the bug).

Osamu,

ogotoh avatar Jan 28 '22 08:01 ogotoh

Dear Osamu,

Thank you very much for your reports. I have just uploaded a bug-fix version (ver.2.4.7) to the GitHub site.

Ahhh, I just realise now that version 2.4.11 is older than 2.4.2 etc. Our automatic download tool has fetched this one as last release. Now I have tweaked it to consider this 2.4.1.1 (I recommend separating even micro and nano version by a '.' for clarification.)

Compared with ver.2.4.1, the latest version shows not dramatically but significantly better performance in speed and mappability.

Sounds good!

Use of “makeidx.pl -inp input” is somewhat obsolete. Use of “spaln -W -K[A|D|P] input” would be more convenient in most cases.

I do not insist in using makeidx.pl - I simply continued to use the test that was once invented for a certain version. In other words: I'd be super happy if you suggest a better way to test than what we are using in our CI test. Feel free to suggest a total rewrite of what you can read starting at line 18 in this file.

The meaning of -yS option has slightly changed from the original one; species-specific parameter set will be used whenever -T option is specified. So, you usually do not need to set this option (that is why I could not notice the bug).

Thanks a lot for the quick fix.

Unfortunately the test we used to work keeps on failing Any hint how to rewrite that test sensibly would be welcome. Kind regards, Andreas.

tillea avatar Jan 28 '22 10:01 tillea

Dear Andreas,

Change the last line of CItest script from sortgrcd -O15 -F2 dictdisc.grd.gz to sortgrcd -O15 -F2 dictdisc.cf.grd.gz

and then test again.

Osamu,

ogotoh avatar Jan 29 '22 09:01 ogotoh

Dear Osamu

Am Sat, Jan 29, 2022 at 01:35:34AM -0800 schrieb ogotoh:

sortgrcd -O15 -F2 dictdisc.cf.grd.gz

and then test again.

This works and I uploaded the package. However, if you want to suggest a more appropriate test I'd happily use what you suggest.

Kind regards, Andreas.

tillea avatar Jan 30 '22 17:01 tillea

Dear Andreas,

Thank you again for your continued efforts.

I admit that the error message is not much informative. I will change it in the next release.

As mentioned earlier, I prefer to use “spaln -W” rather than to use “makeidx.pl”, because the latter no longer depends on “make” so that it runs in any directory without any care about the location of “Makefile”. One minor disadvantage is that we must run spaln twice if we want to format the genomic sequence for both DNA and protein queries. Thus, the recommended alternative to Line 18 would be:

spaln -W -KD [-t4] dictdisc_g.gf.gz spaln -W -KP [-t4] dictdisc_g.gf.gz

The optional -t4 option would approximately halve the calculation time, if your system supports up to four-fold multi-thread operation. (On my PC with 8 cores, 4 was the best; more than 5 did not further accelerate calculation).

Likewise, -tN option would be useful in the map-and-search mode. In this mode, the acceleration rate is nearly linier with respect to N.

Osamu,

ogotoh avatar Jan 31 '22 03:01 ogotoh

Dear Osamu,

thanks for the hint. I commited the following change

https://salsa.debian.org/med-team/spaln/-/commit/99d300781ad156db581ff7fb4e3a0acd2bd07b52

Kind regards, Andreas.

tillea avatar Jan 31 '22 13:01 tillea

Dear Andreas,

Thanks. I have an additional comment. The -t${NCPUS} option would be more beneficial in the map/align mode rather than in the format mode.

spaln -Q7 -d dictdisc_g -T dictdisc -t${NCPUS} dictdisc.faa.gz spaln -Q7 -d dictdisc_g -T dictdisc -t${NCPUS} -O12 -g dictdisc.cf.gz

Osamu,

ogotoh avatar Feb 01 '22 02:02 ogotoh