How to compute the Sum512 of a data stream of unknown length
Hello,
I’m curious to know if it’s possible to compute the hash of a data stream without knowing its length in advance, and to do this without storing the entire data in RAM. Ideally, the hash should be updated incrementally.
I'm a beginner, so this may be a simple question. As I understand it, computing the hash involves dividing the data into chunks of 1024 bytes.
To put it more concretely, I want to write a Hash class that has a method void HashBytes(byte[]). This method would take an arbitrary number of bytes as input and maintain a “partial hash” in memory, which is updated incrementally every time N bytes arrive from the stream.
Another method, byte[64] Close(void), would return the 512-bit hash of the entire received stream as an array of 64 bytes.
For example:
hash = New Hash()
hash.HashBytes(byte[] "This is the first array of byte.")
hash.HashBytes(byte[] " <A VERY LONG STRING>.") // Here can be added even few MiB of data
hash.HashBytes(byte[] "") //Nothing is added
hash.HashBytes(byte[] " This is the second")
hash.HashBytes(byte[] ".") // Just 1 byte is added
byte[64] result0 = hash.Close()
byte[64] result1 = Sum512(byte[] "This is the first array of byte. <A VERY LONG STRING>. This is the second.")
result0 should be equal to result1
Is this possible, and if so, how can I do it?
Many thanks
Sure, here's how to do that in Go:
h := blake3.New(64, nil) // 512-bit output, no key
h.Write([]byte("This is the first array of byte."))
h.Write([]byte("<A VERY LONG STRING>"))
result := h.Sum(nil)
In Go, we typically use the io.Reader and io.Writer interfaces when working with lots of data. For example, if your data is stored in a file, you could do this:
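f, err := os.Open("path/to/file")
if err != nil {
	log.Fatal(err) // without this check, a failed open would go unnoticed
}
h := blake3.New(64, nil)
h.Write([]byte("This is the first array of byte."))
if _, err := io.Copy(h, f); err != nil { // stream the file contents into the hash
	log.Fatal(err)
}
result := h.Sum(nil)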
This works because io.Copy streams data from an io.Reader (in this case, f, the file) to an io.Writer (in this case, h, the hash).
Hope this helps!