script icon indicating copy to clipboard operation
script copied to clipboard

Generic HashSum and HashSums

Open wishdev opened this issue 2 years ago • 8 comments

This would replace #113/#114 and "depreciate" SHA256Sum and SHA256Sums by adding more generic functions called HashSum and HashSums.

One could call p.HashSum("SHA-256") or p.HashSums("SHA3-512") as opposed to adding specific functions for each hash algorithm desired.

This uses the crypto package and the official blessed algorithms within.

There are probably some extra tests (specifically passing an invalid hash name) and some docs to add - but I wanted to make sure I was on the right path here.....

SHA256Sum and SHA256Sums are replaced internally with calls to HashSum("SHA-256") and HashSums("SHA-256").

Finally, the internals are condensed and HashSum and HashSums use a common hash function to which they pass their io.Reader

wishdev avatar Aug 20 '22 03:08 wishdev

Thanks, @wishdev! This is certainly more flexible than what we have at the moment.

I'm not sure if specifying the desired hash algorithm as a string is the right way to go. It would make sense to define these as constants:

script.Echo("hello").HashSum(script.SHA-256).Stdout()

However, this isn't valid Go, because identifiers can only contain letters. This is presumably why the constants defined in crypto are spelled the way they are: https://pkg.go.dev/crypto#Hash

It also sets up a potentially irksome maintainer chore of keeping the list up to date with Go's crypto library. So maybe we should just use its constants:

script.Echo("secret password").Hash(crypto.SHA256).Stdout()

bitfield avatar Aug 20 '22 08:08 bitfield

Sounds perfectly reasonable. Updated...

wishdev avatar Aug 20 '22 22:08 wishdev

Nice! Shall we now think about an example that demonstrates the need for HashSums with a non-trivial script pipeline?

bitfield avatar Aug 21 '22 08:08 bitfield

'HashSums' isn't new functionality. It replaces SHA256Sums which is used for hashng files. You pass it a slice of file names and it will return the hash of each file. There is a test already setup for SHA256Sums within the test file.

wishdev avatar Aug 21 '22 16:08 wishdev

'HashSums' isn't new functionality. It replaces SHA256Sums which is used for hashng files

Well, that's exactly the point of my question: why replace it?

bitfield avatar Aug 21 '22 16:08 bitfield

I guess I'm just confused - it "replaces" the function in the same way HashSum replaces SHA256Sum - it allows one to select a different hash algorithm for the task at hand.

Both SHA256Sum and SHA256Sums still exist - they are just single line functions to HashSum(crypto.SHA256) or HashSums(crypto.SHA256).

I did move SHA256Sums down in the code base to get all the hashing functionality a little closer together - maybe that's leading to the confusion?

wishdev avatar Aug 21 '22 16:08 wishdev

it allows one to select a different hash algorithm for the task at hand.

And in what kind of non-trivial script pipeline would this be necessary? In other words, what we're looking for now is an example of a program that we couldn't write without the existence of HashSum or HashSums. We can use that to see if the API makes sense, or if it needs tweaking a little.

bitfield avatar Aug 21 '22 17:08 bitfield

I also want to point out what an API concern with using crytpo.Hash that I laid out in https://github.com/bitfield/script/issues/113#issuecomment-1111470432, which essentially means that we can use traditional hashes but not necessarily newer hash functions with parameters like scrypt and others. I still think that using a generic hash like @wishdev has laid out is far more flexible and the use case function would be almost exactly like here https://github.com/bitfield/script/issues/113#issuecomment-1111470374 but with Hash(crypto.SHA512) instead of my SHA512Sums(). I can attest to regularly utilizing different hash functions depending on their availability more than necessarily a decision to select one.

terrorbyte avatar Aug 26 '22 19:08 terrorbyte