acts icon indicating copy to clipboard operation
acts copied to clipboard

Should seed finding tools have a higher-level interface?

Open stephenswat opened this issue 4 years ago • 3 comments

The seed finding interface (see Seedfinder.hpp) is currently defined as a single method, which is given as:

template <template <typename...> typename container_t, typename sp_range_t>
void createSeedsForGroup(
    State& state,
    std::back_insert_iterator<container_t<Seed<external_spacepoint_t>>> outIt,
    sp_range_t bottomSPs, sp_range_t middleSPs, sp_range_t topSPs) const;

To me, this seems like a sub-optimal way to interface with the seed finding code. I say that because this interface seems to follow the binning model, where space points are divided into bottom, middle, and top candidates. As far as I can tell the reason for this is so you can call this method on adjacent bins for performance gains.

However, I feel like this interface makes it harder to implement seeding algorithms that do not use binning. For binning-less algorithms, this sort of forces you to put your data in a mold that doesn't really fit it. On the other hand I do understand why this design is the way it is: it allows us to separate the concerns of the binning code and the seed finding code, and I think that separation is useful to preserve as well.

My goal in creating this issue to start a little bit of a discussion on how the seeding interface should look. On the one hand, it would be cool (I think) to have different seeding implementations unified under a single interface so they can be easily used and tested. On the other hand, like I mentioned, there is value in the separation that exists right now.

So I think it might be a good idea to discuss this a little bit. Here are a few possible ideas on how we can move forward here:

  1. We could choose not to unify seeding interfaces at all. This would be the easiest solution in the short term, but it would make it much harder in the long term to switch out seeding algorithms, and it might also make it harder in the future to develop new seeding algorithms.
  2. We could use the current interface for any future seeding algorithms. This would work, because you can just dump all of your space points in one of the three space point ranges and ignore the rest. However, that is obviously not a "clean" way to use the interface. However, it would minimize the effort required in the current code.
  3. We could add a higher level interface for the seed finder. As far as I am concerned, the minimal sensible API looks something like std::vector<Seed> findSeeds(const std::vector<SpacePoint>) const, possibly with some more abstract templated types. However, the problem with this would be that we could no longer rely on external space point binning. The binning would somehow have to be moved inside of the seeding algorithm. I'm not sure that's something we want to do.

I'm not really sure what would be best here, and perhaps I'm overthinking this. Please feel free to share your opinions on the matter, I would be interested. :smile:

stephenswat avatar Sep 13 '21 20:09 stephenswat

Hi Stephen, this makes total sense to me, let me just add:

The interface was originally "feed in one (middle) space point (and bin finders), get all seeds for this (middle) space point back", but it turned out to be very slow, probably because I passed stuff by value instead of by reference and other such embarrassing performance no-nos. The reason for both the "middle bin" and "middle spacepoint" interfaces was that you can run them in parallel (which was probably the reason I had passed bins by value, as otherwise the bin iterator would be internal state shared between parallel instances).

It would be nice to preserve the ability to run this in parallel, if only to compare it to the other parallel implementations.

But having a different configuration struct which contains the bin finders would at least remove those from the interface.

robertlangenberg avatar Sep 21 '21 16:09 robertlangenberg

This issue/PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 22 '21 07:10 stale[bot]

This issue/PR has been automatically marked as stale because it has not had recent activity. The stale label will be removed if any interaction occurs.

stale[bot] avatar Nov 25 '21 19:11 stale[bot]