tika-python
Use of hashlib.md5 on FIPS-configured installations
Specifically, every use of hashlib.md5() is an issue for FIPS kernels, which lack openssl support for md5. Can hashlib.md5() be changed to use hashlib.new with usedforsecurity set to False?
Specifically, around line 614 in tika.py: m = hashlib.md5() Replace with m = hashlib.new('md5', usedforsecurity=False)
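A minimal sketch of the proposed change, not the actual tika.py code. It assumes Python 3.9 or later for the usedforsecurity keyword and falls back to a plain call if the interpreter rejects it; the helper names new_md5 and md5_hexdigest are hypothetical.

```python
import hashlib


def new_md5():
    """Return an MD5 hash object marked as non-security use where supported.

    Hypothetical helper for illustration; not part of tika-python.
    """
    try:
        # Python 3.9+ accepts usedforsecurity. Passing False tells a
        # FIPS-restricted OpenSSL build that MD5 is only being used as a
        # checksum, not for a security purpose.
        return hashlib.new("md5", usedforsecurity=False)
    except TypeError:
        # Interpreters that reject the keyword get the plain constructor.
        return hashlib.new("md5")


def md5_hexdigest(data: bytes) -> str:
    """Hex MD5 digest of data, computed through new_md5()."""
    m = new_md5()
    m.update(data)
    return m.hexdigest()
```

Whether the flag is actually honored depends on how the local Python and OpenSSL were built, so this would still need a test on a FIPS-enabled host.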
I would definitely be open to this, @scarton. That said, it would need a PR and a test to expose it. It would also make sense to update travis.yml to test for this specifically. When you have a PR ready, please open it and I'll review.
Hi, I am interested in picking up the work for this. However, I'd like to simply update the md5 check to a sha1 check. If there were another checksum provided by the tika maven repository I'd use that, but sha1 is the best we've got at the moment, assuming there's not some FIPS-compliant way of verifying the .asc file I see in the repo (i.e. https://repo1.maven.org/maven2/org/apache/tika/tika-server-standard/2.6.0/tika-server-standard-2.6.0-bin.zip.asc). Does that sound reasonable?
Hi @griffin-rickle, yes, that sounds reasonable, but could you also make it back compat by providing an env var (maybe TIKA_JAR_HASH or something) that identifies the name of the HASH file type and by default sets it to md5 but allows changing to asc?
Sure, I can do that! Just for awareness, TIKA_JAR_HASH will default to md5, but sha1 will be an allowed value (not asc, since the asc provides a mechanism to verify the signature, not a checksum).
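One way the env var behaviour described above could look, as a rough sketch rather than the eventual PR. Only the TIKA_JAR_HASH name, the md5 default, and sha1 as an allowed value come from this thread; the function names and the ALLOWED_HASHES tuple are hypothetical.

```python
import hashlib
import os

# Checksum types discussed in this thread. asc is left out on purpose:
# it is a signature file, not a checksum.
ALLOWED_HASHES = ("md5", "sha1")


def get_hash_algorithm(environ=os.environ):
    """Read TIKA_JAR_HASH, defaulting to md5 for backward compatibility."""
    name = environ.get("TIKA_JAR_HASH", "md5").lower()
    if name not in ALLOWED_HASHES:
        raise ValueError(
            "TIKA_JAR_HASH must be one of %s, got %r" % (ALLOWED_HASHES, name)
        )
    return name


def file_checksum(path, algorithm):
    """Hex digest of the file at path using the named algorithm."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so a large server jar is not loaded into memory.
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()
```

With this shape, a FIPS host would set TIKA_JAR_HASH=sha1, and the selected name could also pick the checksum file extension to download next to the jar.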