nfcompose
Alternative to hashing uuid generation strategy
Currently we only support the rather wasteful pattern of generating canonical ids for DataPoints using this logic:
https://github.com/neuroforgede/nfcompose/blob/main/skipper/skipper/dataseries/storage/uuid.py
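```python
import hashlib
import uuid
from typing import Union


def _gen_uuid(data_series_id: Union[uuid.UUID, str], external_id: str) -> str:
    # Hash the external id so any string maps to a fixed-length 64-char hex digest.
    computed_id = hashlib.sha256(external_id.encode('UTF-8')).hexdigest()
    # Prefix with the DataSeries id so the resulting id is namespaced per DataSeries.
    return f'{str(data_series_id)}-{str(computed_id)}'
```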
This is rather wasteful: every id is the DataSeries id followed by a 64-character SHA-256 hex digest (101 characters in total for a canonical 36-character UUID), no matter how short the external id is. Changing this without coordination will cause issues, though: uniqueness inside a DataSeries depends on a consistent implementation of this logic. We could, however, add a configuration setting to the DataSeries that allows different uuid strategies to be used, e.g. less wasteful hash functions or just the identity function.
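A minimal sketch of what such a per-DataSeries setting could look like. The `UuidStrategy` enum, the `gen_uuid` function and its `strategy` parameter are hypothetical and do not exist in nfcompose; BLAKE2b with a 16-byte digest is just one possible stand-in for "a less wasteful hash function". The default keeps today's SHA-256 behaviour, so existing DataSeries would keep producing the same ids.

```python
import enum
import hashlib
import uuid
from typing import Union


class UuidStrategy(enum.Enum):
    """Hypothetical per-DataSeries setting (not an existing nfcompose option)."""
    SHA256 = "sha256"            # current behaviour: 64 hex chars
    BLAKE2B_128 = "blake2b_128"  # shorter digest: 32 hex chars
    IDENTITY = "identity"        # use the external id verbatim


def gen_uuid(
    data_series_id: Union[uuid.UUID, str],
    external_id: str,
    strategy: UuidStrategy = UuidStrategy.SHA256,
) -> str:
    """Build the canonical DataPoint id according to the chosen strategy."""
    raw = external_id.encode("UTF-8")
    if strategy is UuidStrategy.SHA256:
        computed_id = hashlib.sha256(raw).hexdigest()
    elif strategy is UuidStrategy.BLAKE2B_128:
        # blake2b supports a configurable digest size (in bytes).
        computed_id = hashlib.blake2b(raw, digest_size=16).hexdigest()
    elif strategy is UuidStrategy.IDENTITY:
        computed_id = external_id
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Same "<data_series_id>-<computed_id>" layout as the current implementation.
    return f"{data_series_id}-{computed_id}"


if __name__ == "__main__":
    ds = uuid.UUID("12345678-1234-5678-1234-567812345678")
    for s in UuidStrategy:
        print(s.value, len(gen_uuid(ds, "sensor-1", s)))
```

For the external id `sensor-1` this yields ids of 101, 69 and 45 characters respectively. Whatever strategy is chosen would have to be fixed once a DataSeries contains data, since switching it changes every generated id. A 128-bit digest also trades some collision margin for length, while the identity function avoids collisions entirely but makes id length depend on the external id.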
This is not an urgent issue, but it could be useful.