
Client-side hash calculation does not guarantee data consistency

Open • kokamido opened this issue 1 year ago • 1 comment

Hi. I found that the artifact's hash is calculated on the client side (https://github.com/allegroai/clearml/blob/90854fa4a516fcb38ea0a5ec23894c5a3b6bbc4f/clearml/binding/artifacts.py#L743), and the server does not validate it. This can leave a stored file inconsistent with its recorded hash in the case of silent data corruption, which may be caused by network problems or concurrent disk usage on the server side. Server-side hash validation for uploaded artifacts would be a good way to prevent this kind of problem.

kokamido, Aug 11 '22 05:08

Hi @kokamido

...and the server does not validate it.

The server cannot validate the hash; it only stores it. The reason is that the file itself may be uploaded to a third party, for example object storage (e.g. S3, GCP, etc.). The server has no way to access these files ...
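Since the server only stores the hash, any check has to be done by whoever can actually read the file. A minimal sketch of such a consumer-side check follows. It assumes the stored hash is a hex SHA-256 digest of the file contents; `file_sha256` and `verify_download` are hypothetical helpers written for illustration, not part of the ClearML API.

```python
import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large artifacts do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, stored_hash):
    """Hypothetical helper: compare a downloaded file with the hash recorded at upload time."""
    return file_sha256(path) == stored_hash


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        artifact = os.path.join(tmp, "artifact.bin")
        with open(artifact, "wb") as f:
            f.write(b"model weights")

        # Assumption: this is the value the client computed and the server stored.
        stored_hash = file_sha256(artifact)
        print(verify_download(artifact, stored_hash))

        # Simulate silent corruption by changing a single byte.
        with open(artifact, "wb") as f:
            f.write(b"model weightz")
        print(verify_download(artifact, stored_hash))
```

A check like this only detects corruption at read time. It does not prevent a corrupted file from being stored, which is the gap the original report describes.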

bmartinn, Aug 19 '22 22:08