core-bioimage-io-python

Check the integrity of weights files and disable cache

Open oeway opened this issue 2 years ago • 2 comments

For some reason, some weights files get corrupted during download, and the corrupted file is saved in the cache, so loading the model always crashes no matter how many times I download it again. I spent quite some time figuring this out. To avoid this type of issue, I would propose the following:

  1. Disable cache by default
  2. Check the integrity of the weights file against its sha256 value after downloading
  3. Provide an additional argument so that one can run bioimageio validate --check-weights <MODEL_RDF>.
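
The check proposed in item 2 could look roughly like the following. This is only a minimal sketch, assuming the expected sha256 value is available as a hex string; the helper names `sha256_of_file` and `verify_download` are hypothetical and not part of bioimageio.core:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Hypothetical helper: remove the file and raise if its hash does not match.

    Deleting the file on mismatch keeps a corrupted download from staying
    in the cache and being reused on the next run.
    """
    actual = sha256_of_file(path)
    if actual != expected_sha256.lower():
        Path(path).unlink()
        raise ValueError(
            f"sha256 mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
```

With a check like this, a corrupted file would fail once and then be downloaded again, instead of failing on every run.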

Another caveat with the current way of disabling the cache is that the environment variable must be set before importing bioimageio.core:

import os
os.environ["BIOIMAGEIO_USE_CACHE"] = "no"

# The import must come after the environment variable is set
import bioimageio.core

This works when the variable is set outside the script, but it is not obvious for users who want to set it inside the script (and it also makes the script a bit hacky, since we cannot place all the imports at the top of the script).
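
To illustrate why the order matters, here is a small self-contained sketch that does not use bioimageio.core at all. It assumes the setting is read once at import time, as the behaviour described above suggests; the default value "yes" and the function names are made up for the example:

```python
import os


def read_flag_once():
    """Stand-in for a module-level read that happens once, at import time."""
    return os.environ.get("BIOIMAGEIO_USE_CACHE", "yes")


# Simulate importing the library before the variable has been set.
os.environ.pop("BIOIMAGEIO_USE_CACHE", None)
captured_at_import = read_flag_once()

# Setting the variable afterwards does not change the captured value.
os.environ["BIOIMAGEIO_USE_CACHE"] = "no"


def use_cache_lazy():
    """Alternative: read the variable each time it is needed."""
    value = os.environ.get("BIOIMAGEIO_USE_CACHE", "yes")
    return value.lower() not in ("no", "false", "0")


print(captured_at_import)  # still the value seen at "import" time
print(use_cache_lazy())    # reflects the current environment
```

A lazy read like `use_cache_lazy` would let users set the variable anywhere in the script, at the cost of looking it up on every call.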

oeway, May 28 '22 12:05

I haven't seen this behaviour before, could you link to a model where that happens? Is this maybe due to some timeouts on the server?

  • Disable cache by default

That's a pretty big change and would alter the overall usage pattern quite a bit. In general, caching is definitely desirable; otherwise the library will download the same data over and over again. So I am against disabling caching by default. I think it's the desired way of working in most use cases.

  • Check the integrity of the weights file against its sha256 value after downloading

  • Provide an additional argument so that one can run bioimageio validate --check-weights <MODEL_RDF>.

That's a good point, we should def. use the hash to check that the weights are downloaded correctly.

constantinpape, May 29 '22 08:05

Another caveat with the current way of disabling the cache is that the environment variable must be set before importing bioimageio.core:

import os
os.environ["BIOIMAGEIO_USE_CACHE"] = "no"

# The import must come after the environment variable is set
import bioimageio.core

Yes, that's how environment variables work, and it is a general pattern in Python. We can add a section to the README about this.

constantinpape, May 29 '22 08:05