clearml
Client-side hash calculation does not guarantee data consistency
Hi. I found that the artifact's hash is calculated on the client side (https://github.com/allegroai/clearml/blob/90854fa4a516fcb38ea0a5ec23894c5a3b6bbc4f/clearml/binding/artifacts.py#L743) and that the server does not validate it. In the case of silent data corruption, this can leave the stored file inconsistent with its hash. Such corruption may be caused by network problems or concurrent disk usage on the server side. Server-side hash validation for uploaded artifacts would be a nice solution that prevents this kind of problem.
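To illustrate the kind of check I mean, here is a minimal sketch, not ClearML's actual code. It assumes the hash is a SHA-256 hex digest and that the validating side can read the stored file. The names `file_sha256` and `matches_reported_hash` are hypothetical.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large artifacts are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reported_hash(stored_path, reported_hash):
    """Hypothetical check: recompute the digest of the stored file
    and compare it with the digest reported by the client."""
    return file_sha256(stored_path) == reported_hash.strip().lower()
```

If the two digests differ, the stored file no longer matches what the client hashed, and the upload could be rejected or retried.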
Hi @kokamido
...and the server does not validate it.
The server cannot validate the hash; it only stores it. The reason is that the file itself can be uploaded to a third party, for example object storage (e.g. S3, GCP, etc.). The server has no way to access these files ...