dpdata icon indicating copy to clipboard operation
dpdata copied to clipboard

feat: support zero-count elements in type_map for sort_atom_names

Open SchrodingersCattt opened this issue 2 months ago β€’ 3 comments

  • Refactor sort_atom_names to correctly handle type_map with zero-count elements while preserving existing alphabetical sorting behavior.
  • Only validate atom types that actually appear (count > 0) against the provided type_map, allowing new elements in type_map to be added with zero count.
  • Improve robustness and clarity of atom type remapping logic, restoring original commenting style for maintainability.

Summary by CodeRabbit

  • Bug Fixes
    • Improved validation and clearer errors when active atom types are missing; ensured atom names, counts, and type indices are consistently reordered to match a provided mapping or an alphabetical fallback.
  • Tests
    • Added unit tests covering mapping-based sorting, zero-count handling, missing-active-type errors, and alphabetical sorting to verify names, counts, and type indices.
  • Documentation
    • Clarified docstrings/comments describing mapping behavior and invariants.

SchrodingersCattt avatar Oct 28 '25 09:10 SchrodingersCattt

πŸ“ Walkthrough

Walkthrough

The sort_atom_names function in dpdata/utils.py now enforces that all active atom names exist in a provided type_map, builds a new ordering strictly from type_map (preserving active counts, assigning 0 to new types), remaps atom_types accordingly, and adjusts alphabetical sorting to use the inverse permutation for atom_types. New unit tests were added.

Changes

Cohort / File(s) Summary
sort_atom_names implementation
dpdata/utils.py
Validate that all active atom names (positive atom_numbs) are present in a provided type_map and raise ValueError if missing; construct new ordering from type_map (preserve active counts, set 0 for new types); compute mapping from old type indices to new and remap atom_types in-place; use inverse permutation for atom_types in alphabetical path; minor docstring/comment updates.
unit tests for type_map behavior
tests/test_type_map_utils.py
New test module exercising: sorting with type_map; type_map entries with zero-count elements; error when active types are missing from type_map (asserts message contains missing labels); alphabetical sorting without type_map; removal/reindexing when type_map excludes zero-count elements. Tests verify atom_names, atom_numbs, and atom_types content and shapes.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant sort_atom_names
    participant Validator
    participant Reorderer
    participant Remapper

    Caller->>sort_atom_names: call(atom_names, atom_numbs, atom_types, type_map?)
    alt type_map provided
        sort_atom_names->>Validator: ensure active names βŠ‚ type_map
        Validator-->>sort_atom_names: ok / raise ValueError
        alt ok
            sort_atom_names->>Reorderer: build ordering from type_map (preserve counts, new->0)
            Reorderer-->>sort_atom_names: new_order, old_to_new_map
            sort_atom_names->>Remapper: remap atom_types using old_to_new_map
            Remapper-->>sort_atom_names: atom_types updated (in-place)
        else raise
            Validator-->>Caller: ValueError (missing active types)
        end
    else no type_map
        sort_atom_names->>Reorderer: compute alphabetical permutation
        Reorderer-->>sort_atom_names: perm, inverse_perm
        sort_atom_names->>Remapper: remap atom_types using inverse_perm
        Remapper-->>sort_atom_names: atom_types updated
    end
    sort_atom_names-->>Caller: atom_names, atom_numbs, atom_types updated in-place

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Inspect type_map validation logic and exact ValueError message content used by tests.
  • Verify correctness of old->new index mapping and that atom_types remapping preserves dtype, shape, and semantics.
  • Check handling of zero-count entries (preserve vs remove per type_map) and edge cases in tests.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
βœ… Passed checks (2 passed)
Check name Status Explanation
Description Check βœ… Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check βœ… Passed The pull request title "feat: support zero-count elements in type_map for sort_atom_names" directly describes the primary objective of the changeset. The modifications to dpdata/utils.py refactor the sort_atom_names function specifically to handle type_map entries with zero-count elements, and the new test module comprehensively validates this behavior. The title is clear, specific, and uses proper conventional commit conventions, allowing developers scanning the history to immediately understand the feature being introduced.
✨ Finishing touches
  • [ ] πŸ“ Generate docstrings
πŸ§ͺ Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

πŸ“œ Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

πŸ“₯ Commits

Reviewing files that changed from the base of the PR and between cfde1f49e5bbb82dd0fe2d7c03e5472e0d52c6ad and 3ec40471b8dfaa2c857d9912e1f1f5a716757bd1.

πŸ“’ Files selected for processing (1)
  • tests/test_type_map_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_type_map_utils.py

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❀️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Oct 28 '25 09:10 coderabbitai[bot]

CodSpeed Performance Report

Merging #912 will not alter performance

Comparing SchrodingersCattt:enhance/type-map-handling (3ec4047) with devel (7af8d74)

:warning: Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

βœ… 2 untouched
⏩ 2 skipped[^skipped]

[^skipped]: 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

codspeed-hq[bot] avatar Oct 30 '25 11:10 codspeed-hq[bot]

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 86.10%. Comparing base (7af8d74) to head (3ec4047).

Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #912      +/-   ##
==========================================
+ Coverage   86.06%   86.10%   +0.03%     
==========================================
  Files          83       83              
  Lines        7886     7908      +22     
==========================================
+ Hits         6787     6809      +22     
  Misses       1099     1099              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Oct 30 '25 11:10 codecov[bot]