ANN-SoLo
Spectral library searching using approximate nearest neighbor techniques.
Converting blib to splib requires SpectraST. It would be useful to support the blib format (BiblioSpec) directly.
- [ ] Enforce a consistent code style using [black](https://github.com/psf/black).
- [ ] Provide a git pre-commit hook for automatic code formatting.
- [ ] Include contributing guidelines.
- [...
Ideally we want to fully install ANN-SoLo using a single `pip install` command, rather than having to [explicitly manage multiple dependencies](https://github.com/bittremieux/ANN-SoLo/wiki/Install).
- [ ] Use cross-platform [Faiss wheels](https://github.com/kyamagu/faiss-wheels) as dependency....
Profile runtime to identify bottlenecks and improve speed by optimally parallelizing as many parts of the code as possible.
- [ ] Parallelize multiple peak file reading #26.
- [...
Running ANN-SoLo can lead to excessive memory requirements:
- [ ] The [candidate mask](https://github.com/bittremieux/ANN-SoLo/blob/master/src/ann_solo/spectral_library.py#L405) takes up `O(num_candidates x num_library_spectra)` memory. For a default batch size of 16,384 and a spectral...
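To make the scaling concrete, here is a rough back-of-the-envelope sketch of the footprint of a dense mask. The batch size of 16,384 is the default quoted above. The library size of one million spectra, the one-byte-per-entry layout, and the helper names are assumptions for illustration only, not values taken from ANN-SoLo.

```python
def dense_mask_bytes(num_candidates: int, num_library_spectra: int,
                     bytes_per_entry: int = 1) -> int:
    """Memory needed for a dense (num_candidates x num_library_spectra) mask.

    Assumes one byte per entry, as in a typical boolean array (assumption).
    """
    return num_candidates * num_library_spectra * bytes_per_entry


def format_gib(num_bytes: int) -> str:
    """Format a byte count in GiB with one decimal place."""
    return f"{num_bytes / 2**30:.1f} GiB"


batch_size = 16_384        # default batch size quoted above
library_size = 1_000_000   # hypothetical library size, for illustration
mask_bytes = dense_mask_bytes(batch_size, library_size)
print(format_gib(mask_bytes))  # prints "15.3 GiB"
```

Because the footprint grows linearly in both dimensions, halving the batch size halves the mask, which is one reason a smaller batch or a sparse representation may be worth exploring.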
Optimally use HPC resources if available. See for example [kNN searching using cuML](https://medium.com/rapids-ai/scaling-knn-to-new-heights-using-rapids-cuml-and-dask-63410983acfe).
Instrument software sometimes misassigns a different isotopic peak as the monoisotopic peak. This can be especially problematic for open searching, as the mass shifts and their interpretation will be incorrect. Implement...
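The effect on open-search mass shifts can be sketched as follows. This is a minimal illustration, not ANN-SoLo's implementation: the function name, the `max_isotope_error` parameter, and the example values are hypothetical, and the isotopic spacing is approximated by the 13C/12C mass difference.

```python
# Approximate mass difference between 13C and 12C in Da, used here as the
# spacing between neighbouring isotopic peaks (assumption).
ISOTOPE_SPACING = 1.003355


def mass_shift_candidates(query_mz, library_mz, charge, max_isotope_error=2):
    """List (k, mass_shift) pairs for each assumed isotope offset k.

    For offset k, the query precursor is assumed to be the k-th isotopic
    peak, so k isotope spacings are removed before computing the precursor
    mass difference (in Da) relative to the library spectrum.
    """
    shifts = []
    for k in range(max_isotope_error + 1):
        corrected_mz = query_mz - k * ISOTOPE_SPACING / charge
        shifts.append((k, (corrected_mz - library_mz) * charge))
    return shifts


# Hypothetical example: an oxidation (+15.994915 Da) whose precursor was
# reported one isotopic peak too high, at charge 2.
query_mz = 500.0 + (15.994915 + ISOTOPE_SPACING) / 2
for k, shift in mass_shift_candidates(query_mz, 500.0, charge=2):
    print(k, round(shift, 4))
```

Without correction (k = 0) the shift reads as roughly 17 Da, which does not correspond to the modification, whereas k = 1 recovers the expected value.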
Rather than having to explicitly set the precursor m/z tolerance and fragment m/z tolerance in the configuration, determine optimal values automatically using (an efficient reimplementation of) [Param-Medic](http://pubs.acs.org/doi/abs/10.1021/acs.jproteome.7b00028).
Currently, spectra are converted to vectors by binning their peaks, after which they are transformed to lower-dimensional vectors. We should explore whether appending complementary peaks or [neutral losses](https://pubs.acs.org/doi/10.1021/jasms.2c00153) to the...
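One way to picture the idea is the sketch below, which bins fragment peaks into a sparse vector and then appends a second block of bins for neutral losses. It is only an illustration under stated assumptions: the function names, the 0 to 1500 m/z range, the 1.0 m/z bin size, and the concatenated layout are hypothetical; the neutral loss is simplified to precursor m/z minus fragment m/z (ignoring charge); and the subsequent dimensionality reduction is not shown.

```python
import math


def bin_peaks(mzs, intensities, min_mz=0.0, max_mz=1500.0, bin_size=1.0):
    """Bin peaks into a sparse vector {bin_index: summed intensity}."""
    vector = {}
    for mz, intensity in zip(mzs, intensities):
        if min_mz <= mz < max_mz:  # peaks outside the range are dropped
            idx = int((mz - min_mz) / bin_size)
            vector[idx] = vector.get(idx, 0.0) + intensity
    return vector


def bin_with_neutral_losses(mzs, intensities, precursor_mz,
                            min_mz=0.0, max_mz=1500.0, bin_size=1.0):
    """Return (sparse_vector, dimension) with neutral-loss bins appended.

    Fragment bins occupy indices [0, num_bins); neutral-loss bins occupy
    [num_bins, 2 * num_bins).
    """
    num_bins = math.ceil((max_mz - min_mz) / bin_size)
    vector = bin_peaks(mzs, intensities, min_mz, max_mz, bin_size)
    # Simplified neutral loss: precursor m/z minus fragment m/z (assumption).
    losses = [precursor_mz - mz for mz in mzs]
    loss_vector = bin_peaks(losses, intensities, min_mz, max_mz, bin_size)
    for idx, value in loss_vector.items():
        vector[num_bins + idx] = value
    return vector, 2 * num_bins


vector, dim = bin_with_neutral_losses(
    [100.2, 100.7, 300.5], [1.0, 2.0, 3.0], precursor_mz=400.0)
print(dim, sorted(vector.items()))
```

Appending the losses doubles the vector dimension in this layout, so any benefit in candidate selection would need to be weighed against the extra cost of the dimensionality reduction and indexing steps.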
To more efficiently process large datasets, enable searching multiple files simultaneously using a glob file pattern, rather than having to search each peak file individually. When searching multiple files, provide...
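Expanding a glob pattern into a list of peak files could look roughly like this. The sketch uses only the Python standard library; `expand_peak_files`, `search_all`, and the `search_fn` callback are hypothetical names and do not reflect ANN-SoLo's actual command-line interface.

```python
import glob
import os


def expand_peak_files(pattern):
    """Expand a glob pattern into a sorted list of existing files."""
    return sorted(path for path in glob.glob(pattern, recursive=True)
                  if os.path.isfile(path))


def search_all(pattern, search_fn):
    """Apply search_fn to every file matching the pattern.

    search_fn is a placeholder for whatever routine searches a single
    peak file; results are keyed by file path.
    """
    return {path: search_fn(path) for path in expand_peak_files(pattern)}


if __name__ == "__main__":
    # Hypothetical usage: report the size of every MGF file under data/.
    for path, size in search_all("data/**/*.mgf", os.path.getsize).items():
        print(path, size)
```

Sorting the matches keeps the processing order deterministic across runs, which makes per-file output easier to compare.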