soda-core

Soda Core fails in FIPS-enabled environments due to use of hashlib.blake2b

Status: Open. sumit-gupta-sgt opened this issue 5 months ago • 7 comments

Summary

Soda Core fails to run in FIPS-enabled environments due to the use of hashlib.blake2b, which is not FIPS 140-2 compliant and is therefore disabled in these environments.

Error

When attempting to use Soda Core with FIPS mode enabled (e.g., on hardened Linux systems), the following error is thrown:

TypeError: 'digest_size' is an invalid keyword argument for this function

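This happens because blake2b is not usable in Python under FIPS mode: the hashlib.blake2b available there does not accept the digest_size keyword argument that Soda Core passes.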
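One way to check up front whether an interpreter can serve this kind of call is to probe hashlib.blake2b with the digest_size keyword and catch the failure. This is a minimal sketch: the name blake2b_usable is hypothetical and not part of soda-core, and treating ValueError and AttributeError as "unusable" is an assumption about how other FIPS builds might block or omit the algorithm (only the TypeError is confirmed by this report).

```python
import hashlib


def blake2b_usable(digest_size: int = 32) -> bool:
    """Hypothetical helper: report whether blake2b(..., digest_size=N) works here."""
    try:
        hashlib.blake2b(b"probe", digest_size=digest_size)
    except (TypeError, ValueError, AttributeError):
        # TypeError matches the error reported above. ValueError and
        # AttributeError are assumed ways other builds might block or omit it;
        # standard CPython also raises ValueError for digest_size outside 1..64.
        return False
    return True


print("blake2b usable:", blake2b_usable())
```

On a standard (non-FIPS) CPython this should report True; on the FIPS host described here it would be expected to report False.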

Affected Code

The use of hashlib.blake2b appears in multiple places in the codebase for purposes like hashing identifiers or computing fingerprints.

Proposed Solution

Introduce a utility function such as fips_safe_hash() that:

  • Uses hashlib.blake2b() when available.
  • Falls back to hashlib.sha256() with truncation when in FIPS mode.
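
```python
import hashlib

def fips_safe_hash(data: bytes, digest_size: int = 32) -> bytes:
    # SHA-256 yields only 32 bytes, so larger sizes cannot be served by the fallback.
    if not 1 <= digest_size <= 32:
        raise ValueError("digest_size must be between 1 and 32")
    try:
        # Preferred path: BLAKE2b with the requested digest size.
        return hashlib.blake2b(data, digest_size=digest_size).digest()
    except (TypeError, ValueError, AttributeError):
        # FIPS fallback: SHA-256 truncated to digest_size bytes.
        return hashlib.sha256(data).digest()[:digest_size]
```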

Then replace all hashlib.blake2b(...) calls with this wrapper to ensure compatibility in both FIPS and non-FIPS environments.

sumit-gupta-sgt, Jul 30 '25 23:07

CLOUD-9197

tools-soda, Jul 30 '25 23:07

https://github.com/sodadata/soda-core/pull/2357

sumit-gupta-sgt, Jul 31 '25 21:07

Hi @sumit-gupta-sgt, thank you for the report and contribution! I'm a bit cautious about merging #2357 because it makes the identities depend on whether hashlib.blake2b is available. These identities are intended to be reproducible for a given set of inputs, regardless of the environment.

To move forward on this issue, could you share some more info about your use case? Are you using the Soda Cloud platform or only the soda-core package?

mivds, Aug 05 '25 22:08
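The reproducibility concern raised in this comment can be demonstrated directly: for the same input, a 32-byte BLAKE2b digest and a SHA-256 digest truncated to 32 bytes are unrelated values, so an identity computed through the proposed fallback on a FIPS host would not match one computed elsewhere. The input string below is an arbitrary, hypothetical stand-in, not a real soda-core identity, and the snippet must run on a non-FIPS interpreter where blake2b works.

```python
import hashlib

# Arbitrary example input standing in for the parts of a check identity
# (hypothetical; not a real soda-core identity string).
data = b"dataset=orders;check=row_count > 0"

# Identity as computed where blake2b is available.
blake_hex = hashlib.blake2b(data, digest_size=32).hexdigest()
# Identity as the proposed SHA-256 fallback would compute it.
sha_hex = hashlib.sha256(data).digest()[:32].hex()

print(len(blake_hex), len(sha_hex))  # 64 64
print(blake_hex == sha_hex)          # False: same length, different values
```

Because both digests have the same length, nothing downstream could tell from the format alone which algorithm produced a given identity.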

Hi @mivds ,

Thank you for the quick response and for reviewing PR #2357.

You're absolutely right to be cautious about introducing environment-dependent behavior. In our case, we're working in a FIPS-enabled environment, which restricts certain hashing algorithms — notably, hashlib.blake2b is not available due to FIPS compliance.

We're currently using only the soda-core package, not the full Soda Cloud platform. Because soda-core fails at runtime in FIPS mode due to the use of blake2b, we're looking for a way to ensure compatibility without diverging from the intended reproducibility goals.

That said, I’d be happy to explore alternative ways to handle hashing that remain FIPS-compliant and deterministic across environments. Would you be open to a discussion or suggestion around this?

Thanks again, Sumit

sumit-gupta-sgt, Aug 05 '25 23:08

Hi @sumit-gupta-sgt, thanks for the feedback!

These identities are really only needed when using the Soda Cloud platform. They are used to identify checks across executions. Since you are only using soda-core, we could return None for the identities when hashlib.blake2b is not available. This then gives a clear indication that there is no identity, which seems much safer than having environment-dependent identities.

Would that work for your use case? Or are you using these identities in downstream processing?

mivds, Aug 14 '25 12:08
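The "return None" approach proposed in this comment could look roughly like the following minimal sketch, assuming identities are derived by hashing a few string parts. The function name compute_identity, the "/" separator, and the default digest size are all hypothetical; this is not the code in soda-core or in any PR referenced here.

```python
import hashlib
from typing import Optional


def compute_identity(*parts: str, digest_size: int = 8) -> Optional[str]:
    """Hypothetical helper: hash identity parts, or return None if blake2b is unusable."""
    if not 1 <= digest_size <= 64:
        # Reject bad arguments up front so they are not mistaken for a FIPS failure.
        raise ValueError("digest_size must be between 1 and 64")
    data = "/".join(parts).encode("utf-8")
    try:
        digest = hashlib.blake2b(data, digest_size=digest_size)
    except (TypeError, ValueError, AttributeError):
        # Deliberately no substitute algorithm: None marks "no identity",
        # whereas a different hash would silently change identities.
        return None
    return digest.hexdigest()


identity = compute_identity("orders", "row_count > 0")
print(identity)  # a 16-character hex string on a standard (non-FIPS) CPython
```

The design choice mirrors the discussion: callers see either the one reproducible identity or an explicit absence, never a second, environment-specific value.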

Hi @mivds,

Thanks for the clarification. I’m not using these identities for any downstream processing, so returning None when hashlib.blake2b is not available would work fine for my use case.

For context — in our environment, FIPS mode is enabled, which means hashlib.blake2b isn’t available at runtime. This was causing failures even though we’re only using soda-core (not Soda Cloud). Returning None in these cases would make the behavior deterministic and avoid environment-dependent differences, which I agree is safer and more predictable.

sumit-gupta-sgt, Aug 14 '25 17:08

Hi @sumit-gupta-sgt, I've prepared a new PR in #2379 where identities are disabled if hashlib.blake2b is not available. Could you please check and confirm this fixes the issue in your environment?

I'm not sure how to enable FIPS mode (if that's even possible with standard CPython), so I can't check whether the expected ImportError exception path will be triggered.

mivds, Aug 18 '25 20:08
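Without a FIPS host, the fallback path can still be exercised by patching hashlib.blake2b with unittest.mock so that it raises the error from the original report. This is also worth doing because that report shows a TypeError at call time rather than an ImportError: if a guard only relies on an import such as `from hashlib import blake2b` failing, it may not trigger there, since the name apparently exists but rejects digest_size. The sketch below is hypothetical; compute_identity and fips_like_blake2b are illustrative names, not soda-core code, and the function under test must look up hashlib.blake2b at call time for the patch to take effect.

```python
import hashlib
import unittest
from typing import Optional
from unittest import mock


def compute_identity(*parts: str) -> Optional[str]:
    """Hypothetical function under test; resolves hashlib.blake2b at call time."""
    data = "/".join(parts).encode("utf-8")
    try:
        return hashlib.blake2b(data, digest_size=8).hexdigest()
    except (TypeError, ValueError, AttributeError):
        return None


def fips_like_blake2b(*args, **kwargs):
    # Stand-in that reproduces the error reported in this issue.
    raise TypeError("'digest_size' is an invalid keyword argument for this function")


class SimulatedFipsTest(unittest.TestCase):
    def test_identity_is_none_when_digest_size_is_rejected(self):
        with mock.patch.object(hashlib, "blake2b", fips_like_blake2b):
            self.assertIsNone(compute_identity("orders", "row_count > 0"))

    def test_identity_is_present_normally(self):
        # Assumes the test itself runs on a non-FIPS interpreter.
        self.assertIsNotNone(compute_identity("orders", "row_count > 0"))
```

Saved as a module, this can be run with `python -m unittest <module>`; the patch is undone when the `with` block exits, so the second test sees the real hashlib.blake2b.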