strobealign icon indicating copy to clipboard operation
strobealign copied to clipboard

Strobealign as reusable library?

Open ksahlin opened this issue 1 year ago • 2 comments

Adding the request "Please make it a reusable library" from here. Not sure what it entails or mean.

ksahlin avatar Aug 17 '22 08:08 ksahlin

The idea is likely that other people would not only be able to use StrobeAlign as a command-line tool, but that they can also write their own (C++) programs that use the StrobeAlign algorithms. This would be similar to how we rely on external libraries such as args, doctest, kseq++, StripedSmithWaterman etc.

This requires a couple of changes and may sound like extra work, but the great thing is that nearly all of these changes are needed anyway to make the code more readable, more testable and generally more maintainable.

For example, I would like be able to write a unit test like this for aligning a paired-end read against a reference:

    auto refs = References::from_fasta("tests/phix.fasta");
    auto index = StrobemerIndex(refs);
    auto r1 = KSeq("read", "ACGT", "AAAA");
    auto r2 = KSeq("read", "CCAA", "BBBB");

    StrobemerMapper mapper(index);
    auto mapped = mapper.map_paired(r1, r2);
    auto aligned = mapped.align();
    CHECK(aligned[0].ref_position == 270);
    CHECK(aligned[1].is_unmapped);

Many of the changes I’ve made recently are intended to make the above work (and we’re getting there).

So while my motivation for the moment is to enable testing, the required refactoring forces us to come up with a good API that can then also be used by others. Not sure if we’ll go that far, but StrobeAlign could essentially become the frontend to a libstrobealign. The frontend would only parse the command-line and perhaps set up threads or so, and then it delegates index creation and mapping to the backend (API). This would mirror the relationship between samtools and htslib.

Here are a couple of things that would need to be done. (We can turn them into issues when the time comes.)

  • [ ] Never call exit() from API functions, use exceptions instead (as a consumer of the API, you don’t want your own program to exit just because there was a problem in an API call. Perhaps you want to print a nice error message before exiting).
  • [ ] Do not log anything to stdout or stderr from within API functions. (Communicate the information some other way and let the main program decide whether it wants to display that info.)
  • [ ] Hide implementation details from the public headers. (For example, if a typedef is only needed in the cpp file, remove it from the hpp.)
  • [ ] Add documentation.
  • [ ] Ensure const correctness (means essentially: add const to reference parameters wherever possible). This makes it clearer which parameters are modified and which aren’t when calling a function.
  • [ ] Use consistent, meaningful names
  • [ ] Put all of StrobeAlign into a separate namespace (but not the code in main.cpp).
  • [ ] Ensure the CMakeLists.txt files are so that the project can be included via ExternalProject_Add from a different project.

These are just some points that come to mind. All of them (except possibly for the very last one), are things I’d recommend doing anyway, so even if we just aim for StrobeAlign to become a reusable library (but never get there), it would be beneficial because we would get better code in the end.

marcelm avatar Sep 20 '22 13:09 marcelm

Thanks for the detailed answer! Yes, I think this sounds great and worth spending some time on since it is beneficial for the code base regardless.

ksahlin avatar Sep 20 '22 14:09 ksahlin