Uploading large volume of images

Open Jordan-Pierce opened this issue 1 year ago • 3 comments

Hey @StephenChan,

For MIR we're looking at using CoralNet, both for the model and to make the data, annotations, and model publicly available. We have a fairly large number of higher-resolution images (15 MB each). Any thoughts on bulk uploading them?

Jordan

Jordan-Pierce avatar Oct 03 '23 18:10 Jordan-Pierce

Hi Jordan,

15 MB per image sounds fine to me. About how many images though?

We do have an 8000 x 8000 resolution limit coded in, so do check whether that could be an issue. I've been considering raising that limit slightly, or keeping the total-pixel-count limit but relaxing the individual-dimension limits.
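
A quick pre-check against that limit before uploading might look like the sketch below. It assumes the current rule caps width and height at 8000 px each, and that the relaxed variant would cap only the total pixel count at 8000 x 8000. The function names and the `MAX_DIM` constant are made up for illustration and are not taken from CoralNet's code.

```python
MAX_DIM = 8000  # assumed per-dimension limit, as described in this thread


def fits_dimension_limit(width: int, height: int, max_dim: int = MAX_DIM) -> bool:
    """True if neither side exceeds the per-dimension limit."""
    return width <= max_dim and height <= max_dim


def fits_pixel_count_limit(
    width: int, height: int, max_pixels: int = MAX_DIM * MAX_DIM
) -> bool:
    """True if the total pixel count stays within the budget,
    regardless of how it is split between width and height."""
    return width * height <= max_pixels
```

For example, a 16000 x 4000 panorama would fail the per-dimension check but pass the pixel-count check. With Pillow installed, `Image.open(path).size` gives the `(width, height)` pair to feed in.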

StephenChan avatar Oct 05 '23 00:10 StephenChan

I agree, 15 MB isn't an issue; we already have about 20K images in our source, all at the same resolution. But when I try to upload another batch (2.7k images), I get a failed-upload error after the image checks finish. I guess the alternative would be to upload them in smaller batches, but in total for this project we're looking at more than 1M images.

Jordan-Pierce avatar Oct 05 '23 12:10 Jordan-Pierce

Honestly, I do expect the upload form to have some failures when uploading thousands of images, so you'll want to double-check which images actually went through. Note that one image failing shouldn't prevent other images from being uploaded successfully. So once the upload form finishes trying every image, you can reload the page and retry the images that failed. If they keep failing though, let me know. Another consideration: the upload page itself could get slower the more images you give it in one go, so smaller batches could make it a little snappier.
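
The same "smaller batches, then retry whatever failed" idea can be sketched in code for anyone scripting their uploads. This is only an illustration: `upload_one` is a hypothetical callable standing in for whatever actually sends one image (it is not a CoralNet API), and the batch size and number of rounds are arbitrary defaults.

```python
from typing import Callable, Iterator, List


def chunked(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def upload_with_retries(
    paths: List[str],
    upload_one: Callable[[str], bool],  # hypothetical: returns True on success
    batch_size: int = 500,
    max_rounds: int = 3,
) -> List[str]:
    """Try every image in batches, then retry only the failures.

    Returns the paths that still failed after max_rounds rounds.
    """
    pending = list(paths)
    for _ in range(max_rounds):
        if not pending:
            break
        failed: List[str] = []
        for batch in chunked(pending, batch_size):
            for path in batch:
                try:
                    ok = upload_one(path)
                except Exception:
                    ok = False  # one failure must not stop the other images
                if not ok:
                    failed.append(path)
        pending = failed
    return pending
```

Anything left in the returned list after the last round is what would be worth reporting.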

More than 1M! Okay, my thoughts on that:

  • CoralNet's total image storage right now is around 3 million images and 25 TB. So your current project alone could bring that to 4+ million images and 40+ TB. I'd say that is still within our budget, but it would have a sizable impact on our infrastructure size, and it's likely at the point where we'd hope for financial backing from your organization...

  • The biggest sources we've had (in terms of image count) have been around 100k images in a single source. That tends to make the source somewhat slow to browse already, mainly the Browse Images/Patches and Metadata pages. At sizes like that, we've also seen a few instances of classifiers failing to complete training. Both of these situations represent improvements we want to make, but until we do, I would not be optimistic about the performance past 200k or so in a single source. If I had to throw out a soft limit I would personally be comfortable with right now (in terms of "it might still break, but not too crazily"), I'd say 150k.

  • One project I'm currently getting funding for involves making our vision backend component, PySpacer, more usable without being tied to CoralNet's infrastructure. That's the context for this PR from last week, for example. So when this gets fleshed out in the near future (it most likely will, since the funding has a time limit!), an org with a really large volume of data could potentially use PySpacer to train and run a classifier within their own infrastructure, rather than having to upload all that data to CoralNet. This path would obviously need more logistics on your end, including setting up your own public repository of images and annotations, but it's at least something for your consideration.
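
The rough numbers in the points above can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch using decimal units (1 TB = 1,000,000 MB) and the figures quoted in this thread; splitting the project across several sources at the 150k soft limit is an assumption made for illustration, not something settled here.

```python
import math

CURRENT_IMAGES = 3_000_000       # approximate CoralNet total, per this thread
CURRENT_TB = 25                  # approximate CoralNet storage, per this thread
MB_PER_IMAGE = 15                # image size quoted in this thread
SOFT_LIMIT_PER_SOURCE = 150_000  # suggested soft limit for a single source


def project_impact(new_images: int) -> dict:
    """Estimate storage growth and the number of sources needed."""
    new_tb = new_images * MB_PER_IMAGE / 1_000_000  # MB -> TB (decimal)
    return {
        "total_images": CURRENT_IMAGES + new_images,
        "total_tb": CURRENT_TB + new_tb,
        # Assumes the project is split so no source exceeds the soft limit.
        "sources_needed": math.ceil(new_images / SOFT_LIMIT_PER_SOURCE),
    }


print(project_impact(1_000_000))
```

For 1M images this gives 4 million images and 40 TB in total, and at least 7 sources if each stays at or under 150k images.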

StephenChan avatar Oct 07 '23 03:10 StephenChan

Thanks @StephenChan, will close this as answered, but real quick: is there a hard limit on the size of the images uploaded to CoralNet (dimensions, height / width)? I couldn't seem to find anywhere that specifies a limit.

Jordan-Pierce avatar Mar 11 '24 19:03 Jordan-Pierce