Make sequence more abstract, so it can be anything, not just array of chars.
Multiple people were asking about support for multibyte characters (Unicode). One way to provide that, and even more, is to make a sequence not an array of chars but an array of objects whose only requirement is that they have an equality operator defined over them.
What would the impact on speed be in this case? I think it would not be big, since the elements are only used to calculate Peq anyway, and after that only Peq is used.
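A minimal sketch of why the cost should stay small, assuming the usual bit-vector setup: elements are compared only while they are mapped to alphabet indices, and from then on only integers and Peq are touched. The names `toAlphabetIdx` and `buildPeq` are hypothetical and are not edlib functions; the query is limited to 64 elements here to keep the example to a single machine word.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not edlib API): maps each element of `seq` to the index
// of the first equal element in `alphabet`, appending elements not seen before.
// The only thing required of Element is operator==.
template <class Element>
std::vector<std::size_t> toAlphabetIdx(const std::vector<Element>& seq,
                                       std::vector<Element>& alphabet) {
    std::vector<std::size_t> idxs;
    idxs.reserve(seq.size());
    for (const Element& e : seq) {
        std::size_t i = 0;
        // Linear scan: fine for small alphabets, slow for very large ones.
        while (i < alphabet.size() && !(alphabet[i] == e)) ++i;
        if (i == alphabet.size()) alphabet.push_back(e);
        idxs.push_back(i);
    }
    return idxs;
}

// Hypothetical helper (not edlib API): bit j of peq[c] is set iff the query
// element at position j has alphabet index c. Only the first 64 positions
// of the query are encoded in this simplified version.
std::vector<std::uint64_t> buildPeq(const std::vector<std::size_t>& queryIdx,
                                    std::size_t alphabetSize) {
    std::vector<std::uint64_t> peq(alphabetSize, 0);
    for (std::size_t j = 0; j < queryIdx.size() && j < 64; ++j) {
        peq[queryIdx[j]] |= std::uint64_t{1} << j;
    }
    return peq;
}
```

With this split, a sequence of `char32_t` code points (or any other comparable type) goes through the same path as a sequence of chars; the only extra work is the element comparison during the mapping step.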
Would it make it harder to use edlib for the usual cases? Would it become too general and hard to use for strings? How could we make sure it is still easy to use while offering flexibility?
Finally, this might be easier to implement if I first decide to go with just a C++ interface, so I should think about that first.
So far there have been 3 issues asking for multibyte support, so I assigned the important label to this feature, as it seems to matter to users.
With @jbaiter's addition to the Python version of Edlib this issue is less pressing, but it should still be the next one to do.
This is also linked to https://github.com/Martinsos/edlib/issues/141 (Unicode support in Python edlib).
@masri2019 has been working on this for some time now with a little bit of my guidance, so I will document here what has been done and what is yet to be done to call this feature complete!
- [x] Replacing edlib.h and edlib.cpp with edlib.hpp and edlib.tpp. Additional equalities do not work yet, tests and aligner are not updated, and the C interface does not exist anymore. Python binding is also not updated. DONE WITH #148 .
- [x] Updating tests and aligner. DONE WITH #150 .
- [x] Update additional equalities to work again. DONE WITH #154 . CPP codebase is now fully working. Performance seems to keep up.
- [ ] Update documentation for C/CPP (README.md).
- [ ] Update python binding so it works with new CPP implementation, and also update its documentation. Check https://www.benjack.io/2018/02/02/python-cpp-revisited.html, might be helpful.
- [ ] Consider if we should add C wrappers or not. If yes, we add cedlib.h and cedlib.cpp files, which will be small and short. Possibly we add a test or two, since the wrapper merely uses the CPP interface and most of the stuff is checked at compile time. We don't do performance testing, since it is not needed. We also add docs for it. If we don't do this now, we can always do it later. Check #80 also, I was pondering more about this there.
- [ ] Do final polishing, check that CI is passing, possibly run some final performance checks, and release new version (both cpp/c and python), with version bumped to 2.0.0 due to the new interface.
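To illustrate how small a C wrapper like the cedlib one mentioned above could be, here is a rough sketch. It assumes a templated C++ entry point; `genericEditDistance` is a hypothetical stand-in (a plain dynamic-programming edit distance, not edlib's algorithm or API), and `cedlibEditDistance` is a hypothetical wrapper name.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the templated C++ interface (not edlib's API).
// Plain DP edit distance over any Element type that supports operator==.
template <class Element>
int genericEditDistance(const Element* a, int aLen, const Element* b, int bLen) {
    std::vector<int> row(bLen + 1);
    for (int j = 0; j <= bLen; ++j) row[j] = j;  // distance from empty prefix of a
    for (int i = 1; i <= aLen; ++i) {
        int diag = row[0];  // value of D[i-1][j-1]
        row[0] = i;
        for (int j = 1; j <= bLen; ++j) {
            int up = row[j];  // value of D[i-1][j]
            int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            row[j] = std::min({up + 1, row[j - 1] + 1, diag + cost});
            diag = up;
        }
    }
    return row[bLen];
}

// Hypothetical C wrapper: fixes Element to char and exposes a C-callable symbol.
// In a real cedlib.h this declaration would sit inside an extern "C" block.
extern "C" int cedlibEditDistance(const char* query, int queryLen,
                                  const char* target, int targetLen) {
    return genericEditDistance<char>(query, queryLen, target, targetLen);
}
```

Since the wrapper only instantiates the template for one element type and forwards the arguments, one or two smoke tests are probably enough for it, as the checklist item suggests.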
We are using a "big" feature branch, gen-seqs, where we are collecting these changes, and we will merge it back into master once it is done.
Additional ideas/considerations:
- [ ] Document requirements on templates (Element has to have == operator)?
- [ ] Make sure CMake is in good shape (best to do this when rebasing on master?).
- [ ] In some internal functions, consider using name Element instead of AlphabetIdx, if they don't need to know it is AlphabetIdx.
- [ ] Consider making API more C++ish (vectors, strings, ...). We could overload edlibAlign method to take different types of parameters (cpp string, vector, ...). We could also make returned types and other structures that we use nicer. This all should probably be tackled as a separate issue due to the amount of changes.
- [ ] Ensure there are no 'using namespace' in header files.
- [ ] Try including edlib.hpp twice, from two different files, and see if we get a double definitions error! Make sure we have an automatic test for this and that it is run by CI. @masri2019 already implemented the test (https://github.com/Martinsos/edlib/tree/gen-seqs/test/testMultiDefinition), however I am not sure how to best make it part of CI, that is what needs further consideration.
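On the item about documenting template requirements: besides stating it in the docs, the `==` requirement could also be checked at compile time so that users get a readable error. A minimal sketch follows; `HasEquality`, `requireComparable`, `Token` and `Opaque` are hypothetical names used only for illustration and are not part of edlib.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical trait (not part of edlib): true iff `a == b` compiles for const T&.
template <class T, class = void>
struct HasEquality : std::false_type {};

template <class T>
struct HasEquality<T, decltype(void(std::declval<const T&>() ==
                                    std::declval<const T&>()))>
    : std::true_type {};

// Hypothetical guard that a templated entry point could call first, so that a
// missing operator== is reported with a clear message.
template <class Element>
void requireComparable() {
    static_assert(HasEquality<Element>::value,
                  "Element must define operator==");
}

// Example element types: one that satisfies the requirement, one that does not.
struct Token {
    int id;
    bool operator==(const Token& other) const { return id == other.id; }
};
struct Opaque {};
```

Calling `requireComparable<Token>()` compiles, while `requireComparable<Opaque>()` fails with the message above instead of an error deep inside the alignment code.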
Hey @masri2019, how are you doing? We made great progress with this one and then stopped -> are you still interested in possibly continuing with it, how are you with time?
Hi Martin!
Thanks for asking. Yes I'm definitely interested in finishing what we have started. I have been busy doing some other projects but I can plan to dedicate some time to edlib. Based on what you sent, the next step is updating the readme. I'll create a pull request for that.
-Mobin
@masri2019 that is awesome :)!! I will also do my best to help you, I believe the two of us can finish it together, and if needed I can involve myself more, I should also be able to carve out some time.
Yes, the next step is README based on the checklist I created above (which I am now really happy I made because I would have no idea where we stopped otherwise :D). And then python bindings. I am sure we can get both of those done.
Next will be the discussion about the C wrapper, that might be a bit harder, but ok, that is also doable. And then final polishing!
Altogether it sounds like we (you) did the hardest part already, so I am really looking forward to this. Although, you know how they say: the last 20% takes 80% of the time. But let's hope in this case the percentages will be gentle to us.
@masri2019 I am guessing it might be a bit hard getting back into it after so much time, so I would advise you do what you can and if you get stuck somewhere no worries, make a draft PR and I can jump in, we will figure it out together. I also forget a lot of things but I am sure we will remember it relatively quickly, since we were writing pretty nice code.