MetaboliteSpectralMatcher fixes from playing with Nucleosides
Description
MetaboliteSpectralMatching has some issues parsing non-massbank libraries. Add some debugging code and update our MGF parser so that we can read data from GNPS. Also fix Deb requirements
Checklist
- [ ] Make sure that you are listed in the AUTHORS file
- [ ] Add relevant changes and new features to the CHANGELOG file
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] New and existing unit tests pass locally with my changes
- [ ] Updated or added python bindings for changed or new classes (Tick if no updates were necessary.)
How can I get additional information on failed tests during CI
Click to expand
If your PR is failing you can check out- The details of the action statuses at the end of the PR or the "Checks" tab.
- http://cdash.seqan.de/index.php?project=OpenMS and look for your PR. Use the "Show filters" capability on the top right to search for your PR number. If you click in the column that lists the failed tests you will get detailed error messages.
Advanced commands (admins / reviewer only)
Click to expand
/reformat(experimental) applies the clang-format style changes as additional commit. Note: your branch must have a different name (e.g., yourrepo:feature/XYZ) than the receiving branch (e.g., OpenMS:develop). Otherwise, reformat fails to push.- setting the label "NoJenkins" will skip tests for this PR on jenkins (saves resources e.g., on edits that do not affect tests)
- commenting with
rebuild jenkinswill retrigger Jenkins-based CI builds
:warning: Note: Once you opened a PR try to minimize the number of pushes to it as every push will trigger CI (automated builds and test) and is rather heavy on our infrastructure (e.g., if several pushes per day are performed).
Summary by CodeRabbit
-
New Features
- Support reading COMPOUND_NAME from MGF and mapping it to the metabolite name.
- Improved identifier selection and display: prioritize GNPS Spectrum ID, then MassBank Accession, then metabolite name, with sensible fallbacks.
- More informative debug output showing candidate metadata and derived metabolite names during matching.
-
Refactor
- Replaced console prints with structured debug logging and updated result reporting to use derived names/IDs.
-
Tests
- Added tests validating COMPOUND_NAME-to-metabolite-name mapping from MGF.
Walkthrough
Adds parsing of COMPOUND_NAME in MascotGenericFile to set MSM_METABOLITE_NAME; updates MetaboliteSpectralMatching to adjust identifier preference and logging/metadata handling; extends unit tests to verify COMPOUND_NAME→MSM_METABOLITE_NAME mapping, including duplicate START_SECTION entries.
Changes
| Cohort / File(s) | Summary |
|---|---|
MGF parser metadata updatesrc/openms/include/OpenMS/FORMAT/MascotGenericFile.h |
Parse COMPOUND_NAME=... lines in getNextSpectrum_ and set MSM_METABOLITE_NAME meta value; no other parsing logic altered. |
Spectral matching metadata and logging adjustmentssrc/openms/source/ANALYSIS/ID/MetaboliteSpectralMatching.cpp |
Replace cout with OPENMS_LOG_DEBUG; prefer identifiers in order: GNPS_Spectrum_ID, Massbank_Accession_ID, MSM_METABOLITE_NAME, nativeID; derive CommonName from computed metabolite_name; add detailed per-candidate logging; core matching unchanged. |
Unit tests for COMPOUND_NAME mappingsrc/tests/class_tests/openms/source/MascotGenericFile_test.cpp |
Add test to confirm COMPOUND_NAME maps to MSM_METABOLITE_NAME; verify spectrum/precursor/peaks; includes duplicate START_SECTION for the same scenario. |
Sequence Diagram(s)
sequenceDiagram
participant Reader as MascotGenericFile
participant MGF as MGF Line
participant Spec as MSSpectrum
loop Parse peak list block
MGF->>Reader: Line
alt Line starts with "COMPOUND_NAME="
Reader->>Spec: setMetaValue(MSM_METABOLITE_NAME, value)
else other keys
Reader->>Spec: existing handling
end
end
sequenceDiagram
participant MSM as MetaboliteSpectralMatching.run
participant Cand as Candidate Spectrum
participant Log as OPENMS_LOG_DEBUG
loop For each candidate
MSM->>Cand: Read metadata
MSM->>MSM: Derive metabolite_name (MSM_METABOLITE_NAME | GNPS_Spectrum_ID)
MSM->>MSM: Choose primary ID (GNPS | Massbank | MSM_NAME | nativeID)
MSM->>Log: Debug precursor m/z, IDs, keys
MSM->>MSM: Score/match (unchanged)
alt Match found
MSM->>Log: Report massbank_id and metabolite_name with score
end
end
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~15–20 minutes
Possibly related PRs
- OpenMS/OpenMS#7914: Also extends MascotGenericFile::getNextSpectrum_ to parse additional metadata fields; closely related to adding COMPOUND_NAME handling.
Suggested reviewers
- timosachsenberg
- cbielow
Poem
In the meadow of m/z and name,
I twitch my ears at COMPOUND fame.
A Caffeine whisper, neatly caught—
Logs hop by with IDs sought.
GNPS or MassBank, trails align,
Little rabbit signs: “Metadata’s fine.”
Hippity-hop, commit divine.
[!TIP]
🔌 Remote MCP (Model Context Protocol) integration is now available!
Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.
✨ Finishing Touches
- [ ] 📝 Generate Docstrings
🧪 Generate unit tests
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
- [ ] Commit unit tests in branch
nucleosides
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
🪧 Tips
Chat
There are 3 ways to chat with CodeRabbit:
- Review comments: Directly reply to a review comment made by CodeRabbit. Example:
I pushed a fix in commit <commit_id>, please review it.Open a follow-up GitHub issue for this discussion.
- Files and specific lines of code (under the "Files changed" tab): Tag
@coderabbitaiin a new review comment at the desired location with your query. - PR comments: Tag
@coderabbitaiin a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:@coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.@coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
Support
Need help? Create a ticket on our support page for assistance with any issues or questions.
CodeRabbit Commands (Invoked using PR/Issue comments)
Type @coderabbitai help to get the list of available commands.
Other keywords and placeholders
- Add
@coderabbitai ignoreanywhere in the PR description to prevent this PR from being reviewed. - Add
@coderabbitai summaryto generate the high-level summary at a specific location in the PR description. - Add
@coderabbitaianywhere in the PR title to generate the title automatically.
CodeRabbit Configuration File (.coderabbit.yaml)
- You can programmatically configure CodeRabbit by adding a
.coderabbit.yamlfile to the root of your repository. - Please see the configuration documentation for more information.
- If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation:
# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
Status, Documentation and Community
- Visit our Status Page to check the current availability of CodeRabbit.
- Visit our Documentation for detailed information on how to use CodeRabbit.
- Join our Discord Community to get help, request features, and share feedback.
- Follow us on X/Twitter for updates and announcements.
needs a GNPS MGF file as test. otherwise looks good to me