Bug: Inconsistent behavior for missing values in coordinate_compression.py
Repository commit
a0b0f414ae134aa1772d33bb930e5a960f9979e8
Python version (python --version)
Python 3.10.7
Dependencies version (pip freeze)
Not applicable - this is a logical bug, not dependency-related
Expected behavior
Consistent behavior across all methods when handling missing values:
- Either ALL methods should raise exceptions for invalid inputs
- Or ALL methods should return consistent sentinel values
- The behavior should be clearly documented
- No silent failures mixed with exceptions
Actual behavior
Inconsistent behavior:
compress(80)returns-1(silent failure)coordinate_map[80]raisesKeyError(exception)decompress(5)returns-1(silent failure)
This mixed behavior can cause hard-to-debug issues.
Hello @tommasobaiocchi , Can I work on this issue ?
please assign me this issue
Hi @DevPatel-11 and @AyushYadav256! Thanks for your interest in fixing this issue!
I agree that @DevPatel-11 should have priority since they commented first.
@DevPatel-11 - please coordinate with the maintainers on the preferred approach (exceptions vs sentinel values) before implementing.
@AyushYadav256 - there might be other similar issues in the repository that could use your help!
Looking forward to the fix!
Hi @tommasobaiocchi ,
The compress(), coordinate_map[] and decompress() functions you are talking about are from the data_compression\coordinate_compression.py, right ?
Yes @DevPatel-11, exactly!
Hello @tommasobaiocchi ,
I read the code. I think that the solution to this is instead of using coordinate_map[80] we can simply create a get_compressed(val) function.
def get_compressed(self, original: int) -> int:
return self.coordinate_map.get(original, -1)
Otherwise, we directly use coordinate_map.get(80, -1).
@DevPatel-11 Thanks for the concrete proposal!
You're right that using .get(80, -1) would make the behavior consistent, but I'm concerned about a few architectural issues:
Current Inconsistency:
compress(80)→-1(silent failure)coordinate_map[80]→KeyError(exception)decompress(5)→-1(silent failure)
Problems with -1 as Sentinel:
- Ambiguity:
-1could be a valid compressed value in some datasets - Error Masking: Silent failures make debugging harder
- API Design: Public methods vs internal dict access should be consistent
Better Approaches:
Option A:
def compress(self, original):
if original not in self.coordinate_map:
raise ValueError(f"Value {original} not in original array")
return self.coordinate_map[original]
Option B:
_MISSING = object()
def compress(self, original):
result = self.coordinate_map.get(original, _MISSING)
return result if result is not _MISSING else None
What do you think about these more principled approaches?
Option A: Explicit Exception Pros: 1.Clear contract: compressing a non-existent value always fails loudly. 2.No ambiguity: -1 or any other value can safely exist in the mapping. 3.Easy to debug: errors are immediate and informative.
Cons: 1.Slightly less forgiving—caller must handle exceptions if uncertain of the value. 2.Could interrupt batch operations unless wrapped in try/except
Option B: Sentinel / None Pros: 1.Safe: never raises an exception. 2.Flexible: caller can check for None and handle missing values gracefully. 3.Avoids magic numbers like -1.
Cons: 1.API may silently return None if the caller forgets to check. 2.Slightly more complex to implement if _MISSING sentinel is needed in multiple places.
My Recommendation 1.If API is meant for internal use or you expect callers to always know valid values → Option A.
2.If API is user-facing and missing values are expected/normal → Option B
@Amitverma0509 Excellent analysis.
Hey @tommasobaiocchi , should I mention cclaus to discuss the solution ?? I am new to the repository. I don't know any maintainers.
@DevPatel-11 Let's wait for maintainer feedback on the preferred approach. They usually monitor new issues and will provide guidance when available.
Hi, I’d like to work on this issue. Can I take it?