ureg.parse_units("km", case_sensitive=False) randomly returns kilomolar or kilometer
The function ureg.parse_units('km', case_sensitive=False) randomly returns either kilometer or kilomolar. I would suggest that the _yield_unit_triplets function always perform a case-sensitive search first, then perform a final case-insensitive search only if the case_sensitive parameter is False and the first search yielded no results. This way, a case-sensitive match would always be preferred over a case-insensitive one.
I have already written a new version of _yield_unit_triplets and have tested its compatibility for several months:
def _yield_unit_triplets(
    self, unit_name: str, case_sensitive: bool
) -> Generator[tuple[str, str, str], None, None]:
    """Helper of parse_unit_name."""
    stw = unit_name.startswith
    edw = unit_name.endswith
    yielded_any = False
    # First cycle: case-sensitive search. Second cycle: case-insensitive fallback.
    for case_sensitive_search_cycle in (True, False):
        for suffix, prefix in itertools.product(self._suffixes, self._prefixes):
            if stw(prefix) and edw(suffix):
                name = unit_name[len(prefix) :]
                if suffix:
                    name = name[: -len(suffix)]
                    if len(name) == 1:
                        continue
                if case_sensitive_search_cycle:
                    if name in self._units:
                        yielded_any = True
                        yield (
                            self._prefixes[prefix].name,
                            self._units[name].name,
                            self._suffixes[suffix],
                        )
                else:
                    for real_name in self._units_casei.get(name.lower(), ()):
                        yield (
                            self._prefixes[prefix].name,
                            self._units[real_name].name,
                            self._suffixes[suffix],
                        )
        # Skip the fallback if the caller wants case sensitivity
        # or the case-sensitive cycle already found a match.
        if case_sensitive or yielded_any:
            break
Since I have never contributed to this project before, I kindly ask a maintainer to review it and, if it looks good, add it to the code base.
I am running into a seemingly similar problem in case-insensitive mode with kilogauss versus kilogram. I would think kilogram ought to take precedence for lowercase kg.
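Both cases (km and kg) can be illustrated with a small standalone sketch of the two-pass lookup proposed above. This is not Pint's code: the tables `PREFIXES`, `UNITS` and `UNITS_CASEI` and the function `yield_candidates` are hypothetical stand-ins that only mimic the shape of the registry's internal lookups, and suffix handling is left out for brevity.

```python
from typing import Iterator

# Toy tables, NOT Pint's real data: just enough to reproduce the ambiguity.
PREFIXES = {"": "", "k": "kilo"}
UNITS = {"m": "meter", "M": "molar", "g": "gram", "G": "gauss", "Hz": "hertz"}

# Case-insensitive index: lowercased symbol -> every symbol that collapses to it.
UNITS_CASEI: dict[str, list[str]] = {}
for symbol in UNITS:
    UNITS_CASEI.setdefault(symbol.lower(), []).append(symbol)


def yield_candidates(unit_name: str, case_sensitive: bool) -> Iterator[tuple[str, str]]:
    """Yield (prefix name, unit name) pairs, trying exact-case matches first."""
    yielded_any = False
    for exact_pass in (True, False):
        for prefix in PREFIXES:
            if not unit_name.startswith(prefix):
                continue
            name = unit_name[len(prefix):]
            if exact_pass:
                if name in UNITS:
                    yielded_any = True
                    yield PREFIXES[prefix], UNITS[name]
            else:
                for real_name in UNITS_CASEI.get(name.lower(), ()):
                    yield PREFIXES[prefix], UNITS[real_name]
        # Run the case-insensitive pass only if it is allowed and still needed.
        if case_sensitive or yielded_any:
            break


print(list(yield_candidates("km", case_sensitive=False)))   # [('kilo', 'meter')]
print(list(yield_candidates("kg", case_sensitive=False)))   # [('kilo', 'gram')]
print(list(yield_candidates("khz", case_sensitive=False)))  # [('kilo', 'hertz')]
print(list(yield_candidates("khz", case_sensitive=True)))   # []
```

With this ordering, an exact-case match hides the case-insensitive alternatives, so "km" and "kg" resolve to a single candidate, while "khz" still benefits from the fallback because no exact match exists.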
A PR with tests is welcome. Let me know if you need a hand making one.
I was surprised that such an option exists. It only makes sense for specialized custom unit files but it is fundamentally incompatible with the default units file.
For a case-insensitive unit encoding option, one could look at UCUM.