Improvements self-issue
Open
anuradhawick
opened this issue 4 years ago
1 comment
Computations
- Add checkpoint between counting 15mers and generating coverage histogram profiles.
- Combine the steps and add a check for already-computed 15mer counts: load them if present, otherwise compute and save. This avoids the saving+writing cost.
- Use C++ valarray for the assignment step (the compiler may then use SSE)
- Add the compiler optimization flag -O3 (to make the last assignment step faster)
- Add an evaluation step for the final assigned bins. There is a difference between the large-enough bins and the classifications.txt file. Fix it!
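
The "load or compute and save" checkpoint from the list above could look roughly like the sketch below. It is only an illustration under assumptions: the helper name `load_or_compute_counts`, the raw binary file layout, and the `std::uint32_t` count type are all hypothetical and not taken from the existing code. It needs C++17 for `<filesystem>`.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <functional>
#include <vector>

// Hypothetical helper: if `path` exists, read the cached counts from it;
// otherwise call `compute`, write the result to `path`, and return it.
// Assumed file layout: a flat array of uint32_t values in native byte order.
std::vector<std::uint32_t> load_or_compute_counts(
    const std::filesystem::path& path,
    const std::function<std::vector<std::uint32_t>()>& compute) {
    if (std::filesystem::exists(path)) {
        const auto bytes = std::filesystem::file_size(path);
        // A cached file must hold a whole number of counts.
        assert(bytes % sizeof(std::uint32_t) == 0);
        std::vector<std::uint32_t> counts(bytes / sizeof(std::uint32_t));
        std::ifstream in(path, std::ios::binary);
        in.read(reinterpret_cast<char*>(counts.data()),
                static_cast<std::streamsize>(bytes));
        return counts;
    }

    // No checkpoint yet: compute once and save for the next run.
    std::vector<std::uint32_t> counts = compute();
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(counts.data()),
              static_cast<std::streamsize>(counts.size() * sizeof(std::uint32_t)));
    return counts;
}
```

Calling the helper twice with the same path runs `compute` only on the first call; the second call reads the file instead. A real checkpoint would probably also want to detect a truncated or stale file, which this sketch does not do.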
Binner
- Organize different embeddings into different classes
- Provide composition-only and coverage-only options
- Adjust re-sampling to suit different embedding strategies.
- Maybe steal ideas from LRBinner as a pre-step for embedding. VAE+UMAP/SONG works extremely well!
- High-dimensional noise filtering using LRBinner algorithm (should help with noise in ONT reads)
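
One possible shape for "different embeddings in different classes" with composition-only and coverage-only options is a small strategy interface, sketched below. It is written in C++ only to stay consistent with the other snippets in this issue; the binner itself may well be in another language. All names (`ReadProfile`, `FeatureStrategy`, `make_strategy`, and the option strings) are hypothetical, and the sketch only selects input features. The actual embedding step (VAE, UMAP, SONG) is not shown.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical per-read input: a composition profile and a coverage profile.
struct ReadProfile {
    std::vector<double> composition;
    std::vector<double> coverage;
};

// Each strategy decides which features are handed to the embedding step.
class FeatureStrategy {
public:
    virtual ~FeatureStrategy() = default;
    virtual std::vector<double> features(const ReadProfile& read) const = 0;
};

class CompositionOnly : public FeatureStrategy {
public:
    std::vector<double> features(const ReadProfile& read) const override {
        return read.composition;
    }
};

class CoverageOnly : public FeatureStrategy {
public:
    std::vector<double> features(const ReadProfile& read) const override {
        return read.coverage;
    }
};

class Combined : public FeatureStrategy {
public:
    // Concatenates composition followed by coverage.
    std::vector<double> features(const ReadProfile& read) const override {
        std::vector<double> out(read.composition);
        out.insert(out.end(), read.coverage.begin(), read.coverage.end());
        assert(out.size() == read.composition.size() + read.coverage.size());
        return out;
    }
};

// Maps a (hypothetical) command-line option value to a strategy object.
std::unique_ptr<FeatureStrategy> make_strategy(const std::string& name) {
    if (name == "composition") return std::make_unique<CompositionOnly>();
    if (name == "coverage") return std::make_unique<CoverageOnly>();
    if (name == "combined") return std::make_unique<Combined>();
    throw std::invalid_argument("unknown strategy: " + name);
}
```

With this layout, re-sampling could be adjusted per strategy by adding another virtual method to the interface, so that each embedding class carries its own re-sampling rule.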
- [x] Add checkpoint between counting 15mers and generating coverage histogram profiles.
- [x] Use C++ valarray for the assignment step (the compiler may use SSE)
- [x] Add compiler optimization flags -O3 (make last assignment step faster)
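
For reference, a minimal sketch of what a `std::valarray`-based assignment step can look like. The nearest-mean rule with squared Euclidean distance is a stand-in chosen for illustration and is not necessarily the rule the tool uses; `assign_to_bin` and its parameters are hypothetical names. Whether the element-wise operations are actually vectorized (SSE or otherwise) depends on the compiler and flags such as -O3.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <valarray>
#include <vector>

// Returns the index of the bin whose mean profile is closest to `profile`
// (squared Euclidean distance). Assumes all profiles have the same length.
std::size_t assign_to_bin(const std::valarray<double>& profile,
                          const std::vector<std::valarray<double>>& bin_means) {
    assert(!bin_means.empty());
    std::size_t best = 0;
    double best_dist = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < bin_means.size(); ++i) {
        assert(bin_means[i].size() == profile.size());
        // Element-wise subtraction and multiplication on whole arrays;
        // these loops are what the compiler may be able to vectorize.
        const std::valarray<double> diff = profile - bin_means[i];
        const std::valarray<double> sq = diff * diff;
        const double dist = sq.sum();
        if (dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return best;
}
```

For example, with bin means `{0, 0}` and `{10, 10}`, a profile `{1, 1}` goes to the first bin and `{9, 8}` to the second. Ties resolve to the lower bin index because the comparison is strict.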