
Enhancements in MSExperiment and MSSpectrum classes

Open · opened by jpfeuffer · 15 comments

User description

Description

Adds faster, simpler and customizable extract and aggregate functions to MSExperiment.
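The extract-and-aggregate idea can be illustrated with a minimal, self-contained sketch. It does not use the OpenMS API. `Peak`, `Spectrum`, and `extractAndAggregate` are hypothetical stand-ins. The caller-supplied aggregation functor is an assumption about how a "customizable" aggregation could look. The sketch works as follows: for every spectrum inside an RT window, the peaks inside an m/z window are located with a binary search and reduced to a single value.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for OpenMS peak/spectrum classes (not the OpenMS API).
struct Peak { double mz; double intensity; };
struct Spectrum {
  double rt;                // retention time in seconds
  std::vector<Peak> peaks;  // assumed sorted by m/z
};

// For every spectrum with rt in [rt_lo, rt_hi], reduces the intensities of the
// peaks with m/z in [mz_lo, mz_hi] to one value using the caller's aggregator.
template <typename Aggregator>
std::vector<double> extractAndAggregate(const std::vector<Spectrum>& spectra,
                                        double rt_lo, double rt_hi,
                                        double mz_lo, double mz_hi,
                                        Aggregator agg, double init)
{
  assert(rt_lo <= rt_hi && mz_lo <= mz_hi);
  std::vector<double> result;
  for (const Spectrum& s : spectra)
  {
    if (s.rt < rt_lo || s.rt > rt_hi) continue;
    // Binary search for the first peak with mz >= mz_lo (peaks are sorted by m/z).
    auto it = std::lower_bound(s.peaks.begin(), s.peaks.end(), mz_lo,
                               [](const Peak& p, double mz) { return p.mz < mz; });
    double acc = init;
    for (; it != s.peaks.end() && it->mz <= mz_hi; ++it)
    {
      acc = agg(acc, it->intensity);
    }
    result.push_back(acc);
  }
  return result;
}
```

With a summing lambda and `init = 0.0`, this yields an extracted ion chromatogram. A max lambda yields a base-peak-style trace instead, without changing the extraction loop.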

TODOs:

  • Decide on the final API. Probably templatize the inputs/outputs of the aggregation functions, so that a Chromatogram object can be extracted instead of just std::vectors or doubles.
  • More docs
  • Add support for grouped extractions, with a third aggregation function that defines how to combine the extractions within a group (basically enabling the full OpenSwath "feature finding" pipeline). This would benefit from extra scheduling of group-aggregation threads, so that aggregation of a group can start as soon as that group has been fully extracted.
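The grouped-extraction TODO can be sketched as three stages: extract, aggregate per spectrum, and combine within a group. The sketch below is a minimal, sequential illustration under stated assumptions. `MzWindow`, `extractTrace`, and `extractGroup` are hypothetical names, not the planned OpenMS API. The "third aggregation function" is modeled as an element-wise binary combine over the traces of one group, for example the transitions of one peptide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins (not the OpenMS API).
struct Peak { double mz; double intensity; };
struct Spectrum { double rt; std::vector<Peak> peaks; };
struct MzWindow { double lo; double hi; };

// Stages 1+2: extract the peaks in one m/z window and sum them per spectrum,
// giving one trace (one value per spectrum).
std::vector<double> extractTrace(const std::vector<Spectrum>& spectra, MzWindow w)
{
  assert(w.lo <= w.hi);
  std::vector<double> trace;
  trace.reserve(spectra.size());
  for (const Spectrum& s : spectra)
  {
    double sum = 0.0;
    for (const Peak& p : s.peaks)
      if (p.mz >= w.lo && p.mz <= w.hi) sum += p.intensity;
    trace.push_back(sum);
  }
  return trace;
}

// Stage 3 (hypothetical group aggregation): combine all traces of one group
// element-wise with a caller-supplied binary function.
template <typename Combine>
std::vector<double> extractGroup(const std::vector<Spectrum>& spectra,
                                 const std::vector<MzWindow>& group,
                                 Combine combine)
{
  std::vector<double> combined(spectra.size(), 0.0);
  bool first = true;
  for (const MzWindow& w : group)
  {
    std::vector<double> trace = extractTrace(spectra, w);
    for (std::size_t i = 0; i < trace.size(); ++i)
      combined[i] = first ? trace[i] : combine(combined[i], trace[i]);
    first = false;
  }
  return combined;
}
```

In a parallel version, the combine step of a group could be scheduled as soon as all of that group's traces exist, as the TODO suggests. This sketch simply runs the stages one after another.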

Checklist

  • [ ] Make sure that you are listed in the AUTHORS file
  • [ ] Add relevant changes and new features to the CHANGELOG file
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] New and existing unit tests pass locally with my changes
  • [ ] Updated or added python bindings for changed or new classes (Tick if no updates were necessary.)

How can I get additional information on failed tests during CI?

If your PR is failing, you can check out:
  • The details of the action statuses at the end of the PR or the "Checks" tab.
  • http://cdash.openms.de/index.php?project=OpenMS and look for your PR. Use the "Show filters" capability at the top right to search for your PR number. If you click on the column that lists the failed tests, you will get detailed error messages.

Advanced commands (admins / reviewers only)

  • /reformat (experimental) applies the clang-format style changes as an additional commit. Note: your branch must have a different name (e.g., yourrepo:feature/XYZ) than the receiving branch (e.g., OpenMS:develop). Otherwise, reformat fails to push.
  • setting the label "NoJenkins" will skip tests for this PR on jenkins (saves resources e.g., on edits that do not affect tests)
  • commenting with rebuild jenkins will retrigger Jenkins-based CI builds

⚠️ Note: Once you have opened a PR, try to minimize the number of pushes to it, as every push triggers CI (automated builds and tests) and is rather heavy on our infrastructure (e.g., if several pushes per day are performed).


Type

Enhancement


Description

  • Added new methods for aggregating and retrieving data in the MSExperiment class: aggregate functions with various parameters and overloads, plus getSpectraIdxRangeByRetentionTime, getSpectraIdcsByRetentionTime, getFirstProductSpectrum, and getRangesIdcs_. These provide more flexible and efficient ways to aggregate and retrieve data from MSExperiment objects.
  • Introduced OpenMP parallelization in some of the new methods in the MSExperiment class to improve performance.
  • Added a new test case for the aggregate method in the MSExperiment class.
  • Added the maybeGetIMData method to the MSSpectrum class, which returns ion mobility data if available.
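Looking up spectra by retention time (as getSpectraIdxRangeByRetentionTime does) can be done with a binary search, provided the spectra are sorted by RT. Independent windows can then be aggregated in parallel with OpenMP. The sketch below is a minimal illustration under those assumptions. `idxRangeByRT` and `sumPerWindow` are hypothetical names and do not reproduce the OpenMS signatures. The per-spectrum `values` (for example, a total ion current) are also an assumption. Each loop iteration writes only its own output slot, so the parallel loop has no data race. Without `-fopenmp`, the pragma is ignored and the loop runs serially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not the OpenMS signature): returns the half-open index
// range [first, last) of spectra whose RT lies in [rt_lo, rt_hi].
// Assumes `rts` is sorted in ascending order.
std::pair<std::size_t, std::size_t>
idxRangeByRT(const std::vector<double>& rts, double rt_lo, double rt_hi)
{
  assert(rt_lo <= rt_hi);
  auto first = std::lower_bound(rts.begin(), rts.end(), rt_lo);  // first rt >= rt_lo
  auto last = std::upper_bound(first, rts.end(), rt_hi);         // first rt > rt_hi
  return {static_cast<std::size_t>(first - rts.begin()),
          static_cast<std::size_t>(last - rts.begin())};
}

// Sums a per-spectrum value over many RT windows, one output slot per window.
std::vector<double> sumPerWindow(const std::vector<double>& rts,
                                 const std::vector<double>& values,
                                 const std::vector<std::pair<double, double>>& windows)
{
  assert(rts.size() == values.size());
  std::vector<double> out(windows.size(), 0.0);
#pragma omp parallel for
  for (long w = 0; w < static_cast<long>(windows.size()); ++w)
  {
    const auto range = idxRangeByRT(rts, windows[w].first, windows[w].second);
    double sum = 0.0;
    for (std::size_t i = range.first; i < range.second; ++i) sum += values[i];
    out[w] = sum;
  }
  return out;
}
```

Returning an index range rather than copies of the spectra keeps the lookup O(log n) and lets the aggregation step decide what to read.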

Changes walkthrough

Relevant files

Enhancement
  • MSExperiment.cpp (src/openms/source/KERNEL/MSExperiment.cpp, +377/-9): Added new methods for aggregating data in the MSExperiment class, including aggregate functions with various parameters and overloads, getSpectraIdxRangeByRetentionTime, getSpectraIdcsByRetentionTime, getFirstProductSpectrum, and getRangesIdcs_. OpenMP parallelization was also introduced in some of the new methods.
  • MSSpectrum.cpp (src/openms/source/KERNEL/MSSpectrum.cpp, +13/-0): Added the maybeGetIMData method to the MSSpectrum class, which returns ion mobility data if available.

Tests
  • MSExperiment_test.cpp (src/tests/class_tests/openms/source/MSExperiment_test.cpp, +78/-0): Added a new test case for the aggregate method in the MSExperiment class.
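The "maybe" in maybeGetIMData suggests an accessor that reports missing ion mobility data instead of throwing. The actual return type in this PR is not shown here. The sketch below is therefore only an illustration of the pattern using `std::optional`. `NamedFloatArray`, `SpectrumWithArrays`, `maybeGetIMArrayIndex`, `maybeGetIMValue`, and the array name "Ion Mobility" are all hypothetical and not the OpenMS MSSpectrum API.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a spectrum with named float meta-data arrays
// (not the OpenMS MSSpectrum API).
struct NamedFloatArray { std::string name; std::vector<float> values; };
struct SpectrumWithArrays { std::vector<NamedFloatArray> float_arrays; };

// Returns the index of the ion mobility array if present, or std::nullopt
// (instead of throwing) when the spectrum carries no IM data.
// The array name "Ion Mobility" is an assumption for this sketch.
std::optional<std::size_t> maybeGetIMArrayIndex(const SpectrumWithArrays& s)
{
  for (std::size_t i = 0; i < s.float_arrays.size(); ++i)
  {
    if (s.float_arrays[i].name == "Ion Mobility") return i;
  }
  return std::nullopt;
}

// Convenience: IM value of peak `peak_idx`, or std::nullopt if there is no IM array.
std::optional<float> maybeGetIMValue(const SpectrumWithArrays& s, std::size_t peak_idx)
{
  const auto idx = maybeGetIMArrayIndex(s);
  if (!idx) return std::nullopt;
  const std::vector<float>& im = s.float_arrays[*idx].values;
  assert(peak_idx < im.size());  // one IM value per peak is assumed
  return im[peak_idx];
}
```

Callers can then branch on `has_value()` rather than wrapping the call in a try/catch, which is convenient in tight per-spectrum loops.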


jpfeuffer, Jan 14 '24 18:01