Soda Core fails in FIPS-enabled environments due to use of hashlib.blake2b
Summary
Soda Core fails to run in FIPS-enabled environments due to the use of hashlib.blake2b, which is not FIPS 140-2 compliant and is therefore disabled in these environments.
Error
When attempting to use Soda Core with FIPS mode enabled (e.g., on hardened Linux systems), the following error is thrown:
TypeError: 'digest_size' is an invalid keyword argument for this function
This happens because blake2b is unavailable in Python under FIPS mode.
Affected Code
The use of hashlib.blake2b appears in multiple places in the codebase for purposes like hashing identifiers or computing fingerprints.
Proposed Solution
Introduce a utility function such as fips_safe_hash() that:
- Uses hashlib.blake2b() when available.
- Falls back to hashlib.sha256() with truncation when in FIPS mode.
import hashlib

def fips_safe_hash(data: bytes, digest_size=32) -> bytes:
    try:
        return hashlib.blake2b(data, digest_size=digest_size).digest()
    except (TypeError, ValueError, AttributeError):
        return hashlib.sha256(data).digest()[:digest_size]
Then replace all hashlib.blake2b(...) calls with this wrapper to ensure compatibility in both FIPS and non-FIPS environments.
CLOUD-9197
https://github.com/sodadata/soda-core/pull/2357
Hi @sumit-gupta-sgt, thank you for the report and contribution! I'm a bit cautious about merging #2357 as it changes the identities depending on the availability of hashlib.blake2b. These identities are intended to be reproducible for a given set of inputs, regardless of the environment.
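To illustrate the concern, here is a minimal sketch (not code from soda-core) comparing the two branches of the proposed wrapper on the same input. The input bytes are a made-up placeholder, and the snippet needs a non-FIPS interpreter where hashlib.blake2b accepts digest_size.

```python
import hashlib

# Hypothetical input; real identities are built from check properties.
data = b"example-check-identity-input"

# Primary branch of the proposed wrapper: BLAKE2b with a 32-byte digest.
blake_digest = hashlib.blake2b(data, digest_size=32).digest()

# Fallback branch of the proposed wrapper: SHA-256 truncated to 32 bytes.
sha_digest = hashlib.sha256(data).digest()[:32]

# Both digests have the same length, but they come from different algorithms,
# so the bytes differ and the resulting identity depends on the environment.
same_length = len(blake_digest) == len(sha_digest)
same_bytes = blake_digest == sha_digest
print(same_length, same_bytes)
```

Note also that SHA-256 yields at most 32 bytes, while blake2b accepts a digest_size of up to 64, so the truncating fallback would silently return a shorter digest for any digest_size above 32.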
To move forward on this issue, could you share some more info about your use case? Are you using the Soda Cloud platform or only the soda-core package?
Hi @mivds,
Thank you for the quick response and for reviewing PR #2357.
You're absolutely right to be cautious about introducing environment-dependent behavior. In our case, we're working in a FIPS-enabled environment, which restricts certain hashing algorithms — notably, hashlib.blake2b is not available due to FIPS compliance.
We're currently using only the soda-core package, not the full Soda Cloud platform. Because soda-core fails at runtime in FIPS mode due to the use of blake2b, we're looking for a way to ensure compatibility without diverging from the intended reproducibility goals.
That said, I’d be happy to explore alternative ways to handle hashing that remain FIPS-compliant and deterministic across environments. Would you be open to a discussion or suggestion around this?
Thanks again, Sumit
Hi @sumit-gupta-sgt, thanks for the feedback!
These identities are really only needed when using the Soda Cloud platform. They are used to identify checks across executions. Since you are only using soda-core, we could return None for the identities when hashlib.blake2b is not available. This then gives a clear indication that there is no identity, which seems much safer than having environment-dependent identities.
Would that work for your use case? Or are you using these identities in downstream processing?
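A rough sketch of what this could look like, assuming the availability check is done once by probing hashlib.blake2b. The names blake2b_usable and compute_identity and the 4-byte digest size are hypothetical and chosen only for illustration; they are not taken from the soda-core codebase.

```python
import hashlib
from typing import Optional


def blake2b_usable() -> bool:
    """Probe once whether blake2b accepts digest_size in this interpreter."""
    try:
        hashlib.blake2b(b"", digest_size=4)
        return True
    except (TypeError, ValueError, AttributeError):
        # The report shows a TypeError under FIPS; the other two exception
        # types are caught defensively and are an assumption.
        return False


_BLAKE2B_USABLE = blake2b_usable()


def compute_identity(*parts: str) -> Optional[str]:
    """Return a hex identity, or None when blake2b cannot be used."""
    if not _BLAKE2B_USABLE:
        # No identity at all instead of one that varies by environment.
        return None
    hasher = hashlib.blake2b(digest_size=4)
    for part in parts:
        hasher.update(part.encode("utf-8"))
    return hasher.hexdigest()


identity = compute_identity("my_dataset", "row_count > 0")
print(identity)
```

With this shape, callers have to handle a None identity explicitly, but they never receive a value that would differ between a FIPS and a non-FIPS run.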
Hi @mivds,
Thanks for the clarification. I’m not using these identities for any downstream processing, so returning None when hashlib.blake2b is not available would work fine for my use case.
For context — in our environment, FIPS mode is enabled, which means hashlib.blake2b isn’t available at runtime. This was causing failures even though we’re only using soda-core (not Soda Cloud). Returning None in these cases would make the behavior deterministic and avoid environment-dependent differences, which I agree is safer and more predictable.
Hi @sumit-gupta-sgt, I've prepared a new PR in #2379 where identities are disabled if hashlib.blake2b is not available. Could you please check and confirm this fixes the issue in your environment?
I'm not sure how to enable FIPS mode (if that's even possible with standard CPython), so I can't check whether the expected ImportError exception path will be triggered.
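One way to exercise a fallback path without a real FIPS setup might be to temporarily replace hashlib.blake2b with a stand-in that fails. The sketch below uses unittest.mock for that. The helper blake2b_available and the stand-in are hypothetical, and the stand-in raises the TypeError quoted in the report; it does not establish which exception a real FIPS environment raises.

```python
import hashlib
from unittest import mock


def blake2b_available() -> bool:
    """Hypothetical helper: True if blake2b accepts digest_size here."""
    try:
        hashlib.blake2b(b"probe", digest_size=8)
        return True
    except (TypeError, ValueError, AttributeError):
        return False


def fips_like_blake2b(*args, **kwargs):
    """Stand-in that mimics the failure quoted in the issue report."""
    raise TypeError(
        "'digest_size' is an invalid keyword argument for this function"
    )


# Patch only for the duration of the block; hashlib is restored afterwards.
with mock.patch.object(hashlib, "blake2b", fips_like_blake2b):
    simulated = blake2b_available()

print(simulated)
```

This only simulates the symptom, so a confirmation from an actual FIPS-enabled machine would still be needed.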