pairtools
Modified CLI flags (-c1, -c2, ...) to accept both integers (column in…
Closes #264
This PR enhances the flexibility of column specification in pairtools by allowing CLI flags (-c1, -c2, etc.) to accept both column indices (ints) and column names (strs). Additionally, it refactors sorting and deduplication logic to use canonicalized column references, making pair_type (pt) optional for sorting.
🛠️ Changes Introduced
- Modified CLI Flags (`-c1`, `-c2`, etc.)
  - Now accepts both integers and strings.
  - Default values remain integers for backward compatibility.
- Implemented Conversion Functions in `headerops`
  - Converts between column indices and names.
  - Introduced a canonicalization function for consistent column handling.
- Refactored `sort` and `dedup` Commands
  - Updated sorting and deduplication logic to work with both int- and str-based column references.
  - Ensured internal operations remain stable across different input types.
- Made `pair_type` (`pt`) Optional for Sorting
  - Sorting no longer strictly depends on `pair_type`.
  - Verified correctness in cases where `pair_type` is missing.
- Updated Tests
  - Added test cases for mixed column specification (ints & strs).
  - Verified correctness of sorting and deduplication without `pair_type`.
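For reviewers, the canonicalization idea described above can be sketched roughly as follows. This is a minimal illustration, not the code added to `headerops`: the names `canonicalize_column` and `column_name` are hypothetical, and the column layout in the usage note is only an assumed example of a `.pairs` header.

```python
from typing import Sequence, Union

ColumnSpec = Union[int, str]


def canonicalize_column(spec: ColumnSpec, column_names: Sequence[str]) -> int:
    """Map a column index or a column name to a zero-based index.

    Hypothetical helper for illustration; not the actual headerops function.
    """
    # CLI values arrive as strings, so a digit-only string is read as an index.
    if isinstance(spec, str) and spec.strip().isdigit():
        spec = int(spec)
    if isinstance(spec, int):
        if not 0 <= spec < len(column_names):
            raise ValueError(f"column index {spec} is out of range")
        return spec
    # Anything else is treated as a column name and looked up in the header.
    try:
        return list(column_names).index(spec)
    except ValueError:
        raise ValueError(f"column {spec!r} not found in header") from None


def column_name(index: int, column_names: Sequence[str]) -> str:
    """Inverse direction: map a zero-based index back to its column name."""
    return column_names[index]
```

With an assumed header of `readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type`, the values `3`, `"3"`, and `"chrom2"` all resolve to index 3, so downstream code only ever sees integers.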
✅ How This Improves pairtools
- More user-friendly CLI: Users can now specify columns either by index or name, making scripts more readable.
- Backward compatibility maintained: Default values are still integers, ensuring existing workflows are not broken.
- Greater flexibility in data handling: Sorting and deduplication work smoothly even if `pair_type` is missing.
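To make the optional `pair_type` behavior concrete, here is a small in-memory sketch of a sort key that only uses the pair type when a column for it is given. It assumes columns have already been resolved to integer indices; `make_sort_key` is a hypothetical name, and the sketch is not how the `sort` command itself is implemented.

```python
from typing import Callable, Optional, Sequence, Tuple


def make_sort_key(
    c1: int, c2: int, p1: int, p2: int, pt: Optional[int] = None
) -> Callable[[Sequence[str]], Tuple]:
    """Build a key function over split .pairs rows (hypothetical helper).

    Rows are ordered by chrom1, chrom2, pos1, pos2, and by pair type only
    when a pair-type column index is supplied.
    """

    def key(fields: Sequence[str]) -> Tuple:
        base = (fields[c1], fields[c2], int(fields[p1]), int(fields[p2]))
        if pt is None:
            # No pair_type column: sort on coordinates alone.
            return base
        return base + (fields[pt],)

    return key


# Rows without a pair_type column still sort correctly.
rows = [
    ["r1", "chr2", "5", "chr2", "9"],
    ["r2", "chr1", "20", "chr1", "30"],
    ["r3", "chr1", "3", "chr1", "8"],
]
sorted_rows = sorted(rows, key=make_sort_key(c1=1, c2=3, p1=2, p2=4))
```

Passing `pt=None` drops the final tie-breaker, so files that lack the column do not need a placeholder value.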
📌 Checklist Before Merge
- [ ] Code changes are complete and reviewed.
- [ ] New features are covered with unit tests.
- [ ] Backward compatibility is maintained.
- [ ] Documentation updates (if necessary).