ANN-SoLo
Add ELIB/DLIB spectral library support
This pull request adds a module, src/ann_solo/sqlite_parsers.py, for parsing ELIB and DLIB spectral libraries. These are SQLite3 formats from EncyclopeDIA and are defined here. The PR also changes the logging level to INFO.
This module should be easy to expand in the future to also parse BLIB libraries from Bibliospec (as requested in #2).
I'm still working on benchmarking, but it seems good so far.
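For readers unfamiliar with these formats, here is a rough sketch of what reading spectra from such a SQLite file can look like. It is not the code in src/ann_solo/sqlite_parsers.py. The table name (entries), the column names, and the encoding (zlib-compressed, big-endian doubles for m/z and floats for intensity) are assumptions on my part about the ELIB/DLIB layout, and the helper names decode_array and read_entries are hypothetical.

```python
import sqlite3
import struct
import zlib


def decode_array(blob, fmt_char):
    """Decompress a zlib blob and unpack it as big-endian numbers.

    fmt_char is a struct format character: "d" (double) or "f" (float).
    """
    raw = zlib.decompress(blob)
    count = len(raw) // struct.calcsize(">" + fmt_char)
    return list(struct.unpack(f">{count}{fmt_char}", raw))


def read_entries(conn):
    """Yield (peptide, charge, precursor_mz, mz, intensity) per spectrum.

    Assumed schema: a table `entries` with the columns named below.
    """
    query = ("SELECT PeptideModSeq, PrecursorCharge, PrecursorMz, "
             "MassArray, IntensityArray FROM entries")
    for pep, charge, prec_mz, mz_blob, int_blob in conn.execute(query):
        yield (pep, charge, prec_mz,
               decode_array(mz_blob, "d"),   # assumed: m/z as doubles
               decode_array(int_blob, "f"))  # assumed: intensity as floats
```

Usage would be along the lines of `read_entries(sqlite3.connect("library.dlib"))`, where the file name is only a placeholder.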
One thing I envisioned being really useful with this PR is the ability to use Prosit libraries with ANN-SoLo. However, there are a couple of hiccups in doing so:
1. The web interface currently requires a CSV file specifying the peptides for which it should generate spectra.
2. There is currently no way to annotate peptides as decoys in Prosit. Thus, the dlib file that it returns must be annotated after generation.
Would it be out of scope for ANN-SoLo to also contain a few utility functions to prepare a FASTA file for Prosit? For (1), I would propose adding a function to generate this CSV file from a FASTA file, similar to the functionality already provided by EncyclopeDIA. To solve (2), I think there are a couple of options:
- Add a function that modifies the dlib file to properly indicate decoy peptide spectra.
- Add an optional decoy_spectral_library_filename that specifies decoy peptide spectra, implying that spectral_library_filename only defines targets.
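As a rough illustration of the first option, the sketch below flags a set of peptide sequences as decoys in an already-generated library. The table and column names (peptidetoprotein, PeptideSeq, isDecoy) are assumptions about the dlib schema rather than something taken from this PR, and mark_decoys is a hypothetical helper.

```python
import sqlite3


def mark_decoys(conn, decoy_peptides):
    """Flag the given peptide sequences as decoys.

    Assumed schema: table `peptidetoprotein` with columns
    `PeptideSeq` and `isDecoy`. Returns the number of rows updated.
    """
    before = conn.total_changes
    conn.executemany(
        "UPDATE peptidetoprotein SET isDecoy = 1 WHERE PeptideSeq = ?",
        [(pep,) for pep in decoy_peptides])
    conn.commit()
    return conn.total_changes - before
```

The second option would avoid touching the file at all, at the cost of one more configuration parameter.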
What are your thoughts? Alternatively, generating the CSV and annotating the dlib could be handled by another package.
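To make the FASTA-to-CSV idea concrete, here is a minimal sketch. It assumes the Prosit input CSV uses the columns modified_sequence, collision_energy, and precursor_charge, uses a simplified tryptic digest (cleave after K or R unless followed by P, no missed cleavages), and picks arbitrary defaults for peptide length, collision energy, and charge states. All function names are hypothetical, and none of this is EncyclopeDIA's implementation.

```python
import csv
import io
import re


def read_fasta(handle):
    """Yield protein sequences from a FASTA file handle."""
    seq = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if seq:
                yield "".join(seq)
            seq = []
        elif line:
            seq.append(line)
    if seq:
        yield "".join(seq)


def tryptic_peptides(protein, min_len=7, max_len=30):
    """Cleave after K or R (unless followed by P), no missed cleavages."""
    for pep in re.split(r"(?<=[KR])(?!P)", protein):
        if min_len <= len(pep) <= max_len:
            yield pep


def write_prosit_csv(fasta, out, collision_energy=30, charges=(2, 3)):
    """Write one CSV row per unique peptide and charge state."""
    peptides = set()
    for protein in read_fasta(fasta):
        peptides.update(tryptic_peptides(protein))
    writer = csv.writer(out, lineterminator="\n")
    # Assumed column names for the Prosit input file.
    writer.writerow(["modified_sequence", "collision_energy",
                     "precursor_charge"])
    for pep in sorted(peptides):
        for charge in charges:
            writer.writerow([pep, collision_energy, charge])
```

Both arguments are file-like objects, so the function works with open files as well as io.StringIO buffers when testing.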
Yes, I totally agree. Prosit compatibility has been on my wish list / TODO list for quite some time.
My preference would be an end-to-end solution. Rather than having manual steps in between, such as getting a CSV to submit to the Prosit web interface and then converting its output again, it would be nicer if ANN-SoLo had the option to generate a spectral library (and its index) directly from a FASTA file using a built-in Prosit.
Prosit is open source, so it should be possible, although it might further complicate the installation instructions, which are already a bit advanced.
That is a good goal, but yikes, that does complicate installation! Do you know whether they have a programmatic API for their web server? If they do, that might be an alternative way to go.
Either way, I'll probably make a small separate package to handle these things for now.