tika-python

Use of hashlib.MD5 on FIPS configured installations

Open scarton opened this issue 3 years ago • 1 comments

Specifically, every use of hashlib.md5() is an issue for FIPS kernels, which lack OpenSSL support for MD5. Can hashlib.md5() be changed to use hashlib.new with usedforsecurity set to False?

scarton avatar Jul 09 '21 18:07 scarton

Specifically, around line 614 in tika.py: m = hashlib.md5(). Replace with: m = hashlib.new('MD5', usedforsecurity=False)

scarton avatar Jul 12 '21 13:07 scarton
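The replacement proposed above can be sketched as a small helper. This is only an illustration, not the code in tika.py: the function name md5_for_checksum is hypothetical, and the version check assumes that the usedforsecurity keyword is available from Python 3.9 onward. Whether a given FIPS-enabled build actually permits MD5 with this flag depends on how Python and OpenSSL were built.

```python
import hashlib
import sys


def md5_for_checksum(data=b""):
    """Return an MD5 hash object meant only for integrity checks.

    Hypothetical helper, not part of tika-python.
    """
    if sys.version_info >= (3, 9):
        # usedforsecurity=False marks this as a non-security use of MD5,
        # which FIPS-enabled builds may then allow.
        return hashlib.new("md5", data, usedforsecurity=False)
    # Assumption: on older interpreters we fall back to the plain call,
    # which may still fail on a FIPS-configured system.
    return hashlib.new("md5", data)


if __name__ == "__main__":
    m = md5_for_checksum()
    m.update(b"abc")
    print(m.hexdigest())
```

The returned object behaves like the one from hashlib.md5(), so existing update() and hexdigest() calls would not need to change.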

Would definitely be open to this, @scarton. That said, it would need a PR and a test to expose it. It would also make sense to update travis.yml to specifically test for this. When you have the changes ready, please open a new PR and I'll review.

chrismattmann avatar Dec 31 '22 22:12 chrismattmann

Hi, I am interested in picking up the work for this. However, I'd like to simply update the md5 check to a sha1 check. If there were another checksum provided by the tika maven repository I'd use that, but sha1 is the best we've got at the moment, assuming there's not some FIPS-compliant manner of verifying the .asc file I see in the repo (i.e. https://repo1.maven.org/maven2/org/apache/tika/tika-server-standard/2.6.0/tika-server-standard-2.6.0-bin.zip.asc). Does that sound reasonable?

griffin-rickle avatar Jul 18 '23 19:07 griffin-rickle

Hi @griffin-rickle, yes, that sounds reasonable, but could you also make it backward compatible by providing an env var (maybe TIKA_JAR_HASH or something) that identifies the name of the hash file type, defaulting to md5 but allowing a change to asc?

chrismattmann avatar Jul 25 '23 15:07 chrismattmann

Sure, I can do that! Just for awareness, TIKA_JAR_HASH will default to md5, but sha1 will be an allowed value (not asc, since the .asc file provides a mechanism to verify the signature, not a checksum).

griffin-rickle avatar Jul 25 '23 18:07 griffin-rickle
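A minimal sketch of the behaviour agreed on in this thread, where TIKA_JAR_HASH defaults to md5 and also accepts sha1, might look like the following. The function names (get_hash_algorithm, file_digest, checksum_matches), the chunk size, and the error message are assumptions made for illustration and are not taken from tika-python. The sketch also assumes Python 3.9 or later for the usedforsecurity keyword.

```python
import hashlib
import os

# Values discussed in the thread: md5 stays the default, sha1 is opt-in.
ALLOWED_HASHES = ("md5", "sha1")


def get_hash_algorithm(environ=None):
    """Read TIKA_JAR_HASH from the environment, defaulting to md5."""
    environ = os.environ if environ is None else environ
    name = environ.get("TIKA_JAR_HASH", "md5").strip().lower()
    if name not in ALLOWED_HASHES:
        raise ValueError(
            "TIKA_JAR_HASH must be one of %s, got %r" % (ALLOWED_HASHES, name)
        )
    return name


def file_digest(path, algorithm, chunk_size=65536):
    """Return the hex digest of a file, read in chunks."""
    # Assumption: Python 3.9+, where usedforsecurity is accepted.
    h = hashlib.new(algorithm, usedforsecurity=False)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_matches(path, expected_hex, environ=None):
    """Compare a file's digest with the published checksum text."""
    algorithm = get_hash_algorithm(environ)
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

With a layout like this, a FIPS-configured installation could set TIKA_JAR_HASH=sha1 and compare against the .sha1 file published next to the artifact, while existing users would keep the md5 default.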