OpenMS icon indicating copy to clipboard operation
OpenMS copied to clipboard

MetaboliteSpectralMatcher fixes from playing with Nucleosides

Open poshul opened this issue 5 months ago • 1 comments

Description

MetaboliteSpectralMatching has some issues parsing non-massbank libraries. Add some debugging code and update our MGF parser so that we can read data from GNPS. Also fix Deb requirements

Checklist

  • [ ] Make sure that you are listed in the AUTHORS file
  • [ ] Add relevant changes and new features to the CHANGELOG file
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] New and existing unit tests pass locally with my changes
  • [ ] Updated or added python bindings for changed or new classes (Tick if no updates were necessary.)

How can I get additional information on failed tests during CI

Click to expand If your PR is failing you can check out
  • The details of the action statuses at the end of the PR or the "Checks" tab.
  • http://cdash.seqan.de/index.php?project=OpenMS and look for your PR. Use the "Show filters" capability on the top right to search for your PR number. If you click in the column that lists the failed tests you will get detailed error messages.

Advanced commands (admins / reviewer only)

Click to expand
  • /reformat (experimental) applies the clang-format style changes as additional commit. Note: your branch must have a different name (e.g., yourrepo:feature/XYZ) than the receiving branch (e.g., OpenMS:develop). Otherwise, reformat fails to push.
  • setting the label "NoJenkins" will skip tests for this PR on jenkins (saves resources e.g., on edits that do not affect tests)
  • commenting with rebuild jenkins will retrigger Jenkins-based CI builds

:warning: Note: Once you opened a PR try to minimize the number of pushes to it as every push will trigger CI (automated builds and test) and is rather heavy on our infrastructure (e.g., if several pushes per day are performed).

Summary by CodeRabbit

  • New Features

    • Support reading COMPOUND_NAME from MGF and mapping it to the metabolite name.
    • Improved identifier selection and display: prioritize GNPS Spectrum ID, then MassBank Accession, then metabolite name, with sensible fallbacks.
    • More informative debug output showing candidate metadata and derived metabolite names during matching.
  • Refactor

    • Replaced console prints with structured debug logging and updated result reporting to use derived names/IDs.
  • Tests

    • Added tests validating COMPOUND_NAME-to-metabolite-name mapping from MGF.

poshul avatar Aug 11 '25 09:08 poshul

Walkthrough

Adds parsing of COMPOUND_NAME in MascotGenericFile to set MSM_METABOLITE_NAME; updates MetaboliteSpectralMatching to adjust identifier preference and logging/metadata handling; extends unit tests to verify COMPOUND_NAME→MSM_METABOLITE_NAME mapping, including duplicate START_SECTION entries.

Changes

Cohort / File(s) Summary
MGF parser metadata update
src/openms/include/OpenMS/FORMAT/MascotGenericFile.h
Parse COMPOUND_NAME=... lines in getNextSpectrum_ and set MSM_METABOLITE_NAME meta value; no other parsing logic altered.
Spectral matching metadata and logging adjustments
src/openms/source/ANALYSIS/ID/MetaboliteSpectralMatching.cpp
Replace cout with OPENMS_LOG_DEBUG; prefer identifiers in order: GNPS_Spectrum_ID, Massbank_Accession_ID, MSM_METABOLITE_NAME, nativeID; derive CommonName from computed metabolite_name; add detailed per-candidate logging; core matching unchanged.
Unit tests for COMPOUND_NAME mapping
src/tests/class_tests/openms/source/MascotGenericFile_test.cpp
Add test to confirm COMPOUND_NAME maps to MSM_METABOLITE_NAME; verify spectrum/precursor/peaks; includes duplicate START_SECTION for the same scenario.

Sequence Diagram(s)

sequenceDiagram
  participant Reader as MascotGenericFile
  participant MGF as MGF Line
  participant Spec as MSSpectrum

  loop Parse peak list block
    MGF->>Reader: Line
    alt Line starts with "COMPOUND_NAME="
      Reader->>Spec: setMetaValue(MSM_METABOLITE_NAME, value)
    else other keys
      Reader->>Spec: existing handling
    end
  end
sequenceDiagram
  participant MSM as MetaboliteSpectralMatching.run
  participant Cand as Candidate Spectrum
  participant Log as OPENMS_LOG_DEBUG

  loop For each candidate
    MSM->>Cand: Read metadata
    MSM->>MSM: Derive metabolite_name (MSM_METABOLITE_NAME | GNPS_Spectrum_ID)
    MSM->>MSM: Choose primary ID (GNPS | Massbank | MSM_NAME | nativeID)
    MSM->>Log: Debug precursor m/z, IDs, keys
    MSM->>MSM: Score/match (unchanged)
    alt Match found
      MSM->>Log: Report massbank_id and metabolite_name with score
    end
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15–20 minutes

Possibly related PRs

  • OpenMS/OpenMS#7914: Also extends MascotGenericFile::getNextSpectrum_ to parse additional metadata fields; closely related to adding COMPOUND_NAME handling.

Suggested reviewers

  • timosachsenberg
  • cbielow

Poem

In the meadow of m/z and name,
I twitch my ears at COMPOUND fame.
A Caffeine whisper, neatly caught—
Logs hop by with IDs sought.
GNPS or MassBank, trails align,
Little rabbit signs: “Metadata’s fine.”
Hippity-hop, commit divine.

[!TIP]

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • [ ] 📝 Generate Docstrings
🧪 Generate unit tests
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment
  • [ ] Commit unit tests in branch nucleosides

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

coderabbitai[bot] avatar Aug 11 '25 09:08 coderabbitai[bot]

needs a GNPS MGF file as test. otherwise looks good to me

timosachsenberg avatar Dec 22 '25 18:12 timosachsenberg