conda-build

add `content_sha256` hash checks

Open jaimergp opened this issue 2 years ago • 5 comments

Closes https://github.com/conda/conda-build/issues/4762

Description

Checklist - did you ...

  • [ ] Add a file to the news directory (using the template) for the next release's release notes?
  • [X] Add / update necessary tests?
  • [ ] Add / update outdated documentation?

jaimergp, Apr 12 '24 15:04

CodSpeed Performance Report

Merging #5277 will not alter performance

Comparing jaimergp:content-hash (4e0f6dd) with main (4457362)

Summary

✅ 3 untouched benchmarks

codspeed-hq[bot], Apr 12 '24 16:04

I think this is cool. It would also work nicely with the new proposal for "rendered recipes" (https://github.com/conda-incubator/ceps/pull/74).

On that note - should we continue adding features to conda-build without any standardization (e.g. CEP) process?

wolfv, Apr 15 '24 09:04

> should we continue adding features to conda-build without any standardization (e.g. CEP) process?

I'm planning to submit a CEP. I opened this draft to explore what is needed for logic that is stable and robust across platforms. Things like permissions don't translate well to Windows.

jaimergp, Apr 15 '24 13:04

Awesome. Yeah, I also recently looked at a few content hash implementations in Rust but didn't find anything super convincing yet. There are a bunch, though (https://crates.io/search?q=content%20hash).

wolfv, Apr 15 '24 13:04

So far the scheme I followed looks a lot like https://github.com/DrSLDR/dasher?tab=readme-ov-file#hashing-scheme. Things to standardize would be how the tree is sorted, the normalization of the path, the separators (to prevent this), and the allowed algorithms.
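
To make the list of things to standardize concrete, here is a minimal sketch of a flat content-hash scheme. It is not the code in this PR: the function name `content_hash`, the 8-byte length prefixes (used here in place of separator bytes to keep field boundaries unambiguous), and the choice to ignore permissions, symlink targets, and empty directories are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def content_hash(root, algorithm="sha256"):
    """Hash a directory tree from sorted, normalized relative paths and file bytes.

    Hypothetical helper for illustration; not part of conda-build.
    """
    root = Path(root)
    hasher = hashlib.new(algorithm)
    # Normalize to POSIX-style relative paths and sort them, so the order
    # does not depend on the operating system or on filesystem listing order.
    entries = sorted(
        (p.relative_to(root).as_posix(), p) for p in root.rglob("*") if p.is_file()
    )
    for relpath, path in entries:
        name = relpath.encode("utf-8")
        data = path.read_bytes()
        # Assumption: length-prefix each field so that bytes cannot shift
        # between a path and its content without changing the digest.
        hasher.update(len(name).to_bytes(8, "big"))
        hasher.update(name)
        hasher.update(len(data).to_bytes(8, "big"))
        hasher.update(data)
    return hasher.hexdigest()
```

Two trees with the same relative paths and the same bytes give the same digest under this sketch, regardless of the order in which the files were created. Any of the choices above (sort key, path normalization, field framing, allowed `algorithm` values) changes the digest, which is why each one would need to be pinned down in a CEP.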

I've seen a few Merkle-tree-based packages, but we don't need all the proof stuff or leaf querying; we only need to compare the root hash.

Maybe it could be implemented recursively, without obtaining the whole file tree beforehand, if that improves performance or simplifies implementation elsewhere. IMO this feels like one of those CEPs that do require prototyping first to see which things have to be standardized.
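
As a rough illustration of the recursive idea, the sketch below hashes one directory at a time and derives each directory's digest from its children's digests, so only the root digest needs comparing. The name `tree_hash`, the `b"dir"` / `b"file"` type tags, and the name framing are assumptions for this example, not anything decided in this PR.

```python
import hashlib
import os


def tree_hash(path, algorithm="sha256"):
    """Return the digest (bytes) of a file or directory, computed recursively.

    Hypothetical helper for illustration; not part of conda-build.
    """
    hasher = hashlib.new(algorithm)
    if os.path.isdir(path):
        hasher.update(b"dir")
        # Only the current directory is listed; the full tree is never
        # materialized up front.
        with os.scandir(path) as it:
            children = sorted(it, key=lambda entry: entry.name)
        for entry in children:
            name = entry.name.encode("utf-8")
            # Assumption: length-prefixed name followed by the child's digest.
            hasher.update(len(name).to_bytes(8, "big"))
            hasher.update(name)
            hasher.update(tree_hash(entry.path, algorithm))
    else:
        hasher.update(b"file")
        # Read in 1 MiB chunks so large files are not loaded into memory at once.
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                hasher.update(chunk)
    return hasher.digest()
```

Comparing two trees is then `tree_hash(a) == tree_hash(b)`. The digest differs from what a flat, sorted-path scheme would produce for the same files, so a standard would have to pick one layout rather than allow both.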

jaimergp, Apr 15 '24 16:04