dpdata feat: support zero-count elements in type_map for sort_atom

Refactor sort_atom_names to correctly handle type_map with zero-count elements while preserving existing alphabetical sorting behavior.
Only validate atom types that actually appear (count > 0) against the provided type_map, allowing new elements in type_map to be added with zero count.
Improve robustness and clarity of atom type remapping logic, restoring original commenting style for maintainability.

Summary by CodeRabbit

Bug Fixes
- Improved validation and clearer errors when active atom types are missing; ensured atom names, counts, and type indices are consistently reordered to match a provided mapping or an alphabetical fallback.
Tests
- Added unit tests covering mapping-based sorting, zero-count handling, missing-active-type errors, and alphabetical sorting to verify names, counts, and type indices.
Documentation
- Clarified docstrings/comments describing mapping behavior and invariants.

Oct 28 '25 09:10 SchrodingersCattt

📝 Walkthrough

Walkthrough

The sort_atom_names function in dpdata/utils.py now enforces that all active atom names exist in a provided type_map, builds a new ordering strictly from type_map (preserving active counts, assigning 0 to new types), remaps atom_types accordingly, and adjusts alphabetical sorting to use the inverse permutation for atom_types. New unit tests were added.

Changes

Cohort / File(s)	Summary
sort_atom_names implementation `dpdata/utils.py`	Validate that all active atom names (positive `atom_numbs`) are present in a provided `type_map` and raise `ValueError` if missing; construct new ordering from `type_map` (preserve active counts, set 0 for new types); compute mapping from old type indices to new and remap `atom_types` in-place; use inverse permutation for `atom_types` in alphabetical path; minor docstring/comment updates.
unit tests for type_map behavior `tests/test_type_map_utils.py`	New test module exercising: sorting with `type_map`; `type_map` entries with zero-count elements; error when active types are missing from `type_map` (asserts message contains missing labels); alphabetical sorting without `type_map`; removal/reindexing when `type_map` excludes zero-count elements. Tests verify `atom_names`, `atom_numbs`, and `atom_types` content and shapes.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant sort_atom_names
    participant Validator
    participant Reorderer
    participant Remapper

    Caller->>sort_atom_names: call(atom_names, atom_numbs, atom_types, type_map?)
    alt type_map provided
        sort_atom_names->>Validator: ensure active names ⊂ type_map
        Validator-->>sort_atom_names: ok / raise ValueError
        alt ok
            sort_atom_names->>Reorderer: build ordering from type_map (preserve counts, new->0)
            Reorderer-->>sort_atom_names: new_order, old_to_new_map
            sort_atom_names->>Remapper: remap atom_types using old_to_new_map
            Remapper-->>sort_atom_names: atom_types updated (in-place)
        else raise
            Validator-->>Caller: ValueError (missing active types)
        end
    else no type_map
        sort_atom_names->>Reorderer: compute alphabetical permutation
        Reorderer-->>sort_atom_names: perm, inverse_perm
        sort_atom_names->>Remapper: remap atom_types using inverse_perm
        Remapper-->>sort_atom_names: atom_types updated
    end
    sort_atom_names-->>Caller: atom_names, atom_numbs, atom_types updated in-place

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Inspect type_map validation logic and exact ValueError message content used by tests.
Verify correctness of old->new index mapping and that atom_types remapping preserves dtype, shape, and semantics.
Check handling of zero-count entries (preserve vs remove per type_map) and edge cases in tests.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The pull request title "feat: support zero-count elements in type_map for sort_atom_names" directly describes the primary objective of the changeset. The modifications to `dpdata/utils.py` refactor the `sort_atom_names` function specifically to handle type_map entries with zero-count elements, and the new test module comprehensively validates this behavior. The title is clear, specific, and uses proper conventional commit conventions, allowing developers scanning the history to immediately understand the feature being introduced.

✨ Finishing touches

[ ] 📝 Generate docstrings

🧪 Generate unit tests (beta)

[ ] Create PR with unit tests
[ ] Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cfde1f49e5bbb82dd0fe2d7c03e5472e0d52c6ad and 3ec40471b8dfaa2c857d9912e1f1f5a716757bd1.

📒 Files selected for processing (1)

tests/test_type_map_utils.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/test_type_map_utils.py

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Oct 28 '25 09:10 coderabbitai[bot]

CodSpeed Performance Report

Merging #912 will not alter performance

_{Comparing SchrodingersCattt:enhance/type-map-handling (3ec4047) with devel (7af8d74)}

:warning: Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 2 untouched
⏩ 2 skipped[^skipped]

[^skipped]: 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Oct 30 '25 11:10 codspeed-hq[bot]

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 86.10%. Comparing base (7af8d74) to head (3ec4047).

Additional details and impacted files

@@            Coverage Diff             @@
##            devel     #912      +/-   ##
==========================================
+ Coverage   86.06%   86.10%   +0.03%     
==========================================
  Files          83       83              
  Lines        7886     7908      +22     
==========================================
+ Hits         6787     6809      +22     
  Misses       1099     1099

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:

:snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Oct 30 '25 11:10 codecov[bot]

feat: support zero-count elements in type_map for sort_atom_names

Summary by CodeRabbit

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks and finishing touches

CodSpeed Performance Report

Merging #912 will not alter performance

Summary

Codecov Report