Python icon indicating copy to clipboard operation
Python copied to clipboard

Bug: Inconsistent behavior for missing values in coordinate_compression.py

Open tommasobaiocchi opened this issue 2 months ago • 12 comments

Repository commit

a0b0f414ae134aa1772d33bb930e5a960f9979e8

Python version (python --version)

Python 3.10.7

Dependencies version (pip freeze)

Not applicable - this is a logical bug, not dependency-related

Expected behavior

Consistent behavior across all methods when handling missing values:

  • Either ALL methods should raise exceptions for invalid inputs
  • Or ALL methods should return consistent sentinel values
  • The behavior should be clearly documented
  • No silent failures mixed with exceptions

Actual behavior

Inconsistent behavior:

  1. compress(80) returns -1 (silent failure)
  2. coordinate_map[80] raises KeyError (exception)
  3. decompress(5) returns -1 (silent failure)

This mixed behavior can cause hard-to-debug issues.

tommasobaiocchi avatar Oct 05 '25 12:10 tommasobaiocchi

Hello @tommasobaiocchi , Can I work on this issue ?

DevPatel-11 avatar Oct 05 '25 13:10 DevPatel-11

please assign me this issue

AyushYadav256 avatar Oct 05 '25 14:10 AyushYadav256

Hi @DevPatel-11 and @AyushYadav256! Thanks for your interest in fixing this issue!

I agree that @DevPatel-11 should have priority since they commented first.

@DevPatel-11 - please coordinate with the maintainers on the preferred approach (exceptions vs sentinel values) before implementing.

@AyushYadav256 - there might be other similar issues in the repository that could use your help!

Looking forward to the fix!

tommasobaiocchi avatar Oct 05 '25 14:10 tommasobaiocchi

Hi @tommasobaiocchi , The compress(), coordinate_map[] and decompress() functions you are talking about are from the data_compression\coordinate_compression.py, right ?

DevPatel-11 avatar Oct 05 '25 15:10 DevPatel-11

Yes @DevPatel-11, exactly!

tommasobaiocchi avatar Oct 05 '25 15:10 tommasobaiocchi

Hello @tommasobaiocchi , I read the code. I think that the solution to this is instead of using coordinate_map[80] we can simply create a get_compressed(val) function.

def get_compressed(self, original: int) -> int:
    return self.coordinate_map.get(original, -1)

Otherwise, we directly use coordinate_map.get(80, -1).

DevPatel-11 avatar Oct 05 '25 18:10 DevPatel-11

@DevPatel-11 Thanks for the concrete proposal!

You're right that using .get(80, -1) would make the behavior consistent, but I'm concerned about a few architectural issues:

Current Inconsistency:

  • compress(80)-1 (silent failure)
  • coordinate_map[80]KeyError (exception)
  • decompress(5)-1 (silent failure)

Problems with -1 as Sentinel:

  1. Ambiguity: -1 could be a valid compressed value in some datasets
  2. Error Masking: Silent failures make debugging harder
  3. API Design: Public methods vs internal dict access should be consistent

Better Approaches:

Option A:

def compress(self, original):
    if original not in self.coordinate_map:
        raise ValueError(f"Value {original} not in original array")
    return self.coordinate_map[original]

Option B:

_MISSING = object()
def compress(self, original):
    result = self.coordinate_map.get(original, _MISSING)
    return result if result is not _MISSING else None

What do you think about these more principled approaches?

tommasobaiocchi avatar Oct 05 '25 18:10 tommasobaiocchi

Option A: Explicit Exception Pros: 1.Clear contract: compressing a non-existent value always fails loudly. 2.No ambiguity: -1 or any other value can safely exist in the mapping. 3.Easy to debug: errors are immediate and informative.

Cons: 1.Slightly less forgiving—caller must handle exceptions if uncertain of the value. 2.Could interrupt batch operations unless wrapped in try/except

Option B: Sentinel / None Pros: 1.Safe: never raises an exception. 2.Flexible: caller can check for None and handle missing values gracefully. 3.Avoids magic numbers like -1.

Cons: 1.API may silently return None if the caller forgets to check. 2.Slightly more complex to implement if _MISSING sentinel is needed in multiple places.

My Recommendation 1.If API is meant for internal use or you expect callers to always know valid values → Option A.

2.If API is user-facing and missing values are expected/normal → Option B

Amitverma0509 avatar Oct 05 '25 20:10 Amitverma0509

@Amitverma0509 Excellent analysis.

tommasobaiocchi avatar Oct 06 '25 11:10 tommasobaiocchi

Hey @tommasobaiocchi , should I mention cclaus to discuss the solution ?? I am new to the repository. I don't know any maintainers.

DevPatel-11 avatar Oct 06 '25 13:10 DevPatel-11

@DevPatel-11 Let's wait for maintainer feedback on the preferred approach. They usually monitor new issues and will provide guidance when available.

tommasobaiocchi avatar Oct 06 '25 13:10 tommasobaiocchi

Hi, I’d like to work on this issue. Can I take it?

Aditya-Singh27 avatar Nov 05 '25 19:11 Aditya-Singh27