Better compatibility for Databricks
Feature
Solve compatibility issues with the Databricks platform, for example its restrictions on writing zip files.
Pitch
Databricks has some unique restrictions that have caused compatibility issues with Slideflow. A more Databricks-compatible version of Slideflow would benefit this user group.
Alternatives
Additional context
Start with solving zip file generation problems on Databricks.
Thanks - I've created the branch databricks for development. I added a commit that should address the DatasetFeatures.to_torch() issue you encountered previously. Let me know if that works.
There are a couple of other functions that save data as ZIP files, including:
- Heatmap.save_npz()
- SlideMap.save()
- MIL attention export during validation or evaluation
- Slide QC mask saving/loading
Would we need to extend functionality for all of these, as well?
Hi James, thanks, that was quick! I will try it now. For the other functions, yes, please address them if possible. I believe I will need to use some of these functions as well!
Hi James, I tried your fix. It took some time to run, but it still did not run to completion. Below is the error. It looks similar to the previous ones but is not exactly the same. Thanks!
features.to_torch(rootpath + '/imagenet/bag_directory/')
Working... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--
Traceback (most recent call last):
  File "", line 1, in
  File "/databricks/driver/slideflow/slideflow/model/features.py", line 556, in to_torch
    tfrecord2idx.save_index(
  File "/databricks/driver/slideflow/slideflow/util/tfrecord2idx.py", line 70, in save_index
    np.savez(index_file, index_array)
  File "<array_function internals>", line 5, in savez
  File "/databricks/python/lib/python3.10/site-packages/numpy/lib/npyio.py", line 618, in savez
    _savez(file, args, kwds, False)
  File "/databricks/python/lib/python3.10/site-packages/numpy/lib/npyio.py", line 721, in _savez
    with zipf.open(fname, 'w', force_zip64=True) as fid:
  File "/usr/lib/python3.10/zipfile.py", line 1180, in close
    self._fileobj.seek(self._zipfile.start_dir)
OSError: [Errno 95] Operation not supported
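For anyone reading the traceback above: the failure is raised when zipfile seeks back in the output file, which the Databricks-mounted storage here apparently does not support. A generic workaround, sketched below and not necessarily what the databricks branch does, is to build the archive on local disk and then copy the finished file to its destination. The helper name write_via_local_temp and the writer callback are hypothetical and only for illustration.

```python
import os
import shutil
import tempfile
import zipfile


def write_via_local_temp(dest_path, writer):
    """Run writer(local_path) on local disk, then copy the result to dest_path.

    Hypothetical helper: `writer` is any callable that creates a file at the
    path it receives, e.g. lambda p: np.savez(p, index_array).
    """
    suffix = os.path.splitext(dest_path)[1]
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "payload" + suffix)
        # Any seeking needed to build the archive happens on local disk.
        writer(local_path)
        # Copy the finished file; this does not seek back into the output.
        shutil.copyfile(local_path, dest_path)
    return dest_path


def _write_demo_zip(path):
    # Stand-in for np.savez, so the sketch runs with the standard library only.
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("arr_0.txt", "demo")


# Demo: the destination is a local directory here; on Databricks it would be
# the mounted path that rejected the seek.
demo_dir = tempfile.mkdtemp()
demo_dest = os.path.join(demo_dir, "index.zip")
write_via_local_temp(demo_dest, _write_demo_zip)
```

Whether the final copy succeeds on a given Databricks mount is an assumption that would need to be checked on the cluster.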
Hmm... just to confirm, did you set the environment variable SF_ALLOW_ZIP=0? I'm not sure how this error could be encountered if that variable is set.
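In a notebook, the variable can be set from Python before Slideflow is imported. This is a minimal sketch: setting it before the import is an assumption about when Slideflow reads the variable, and zip_disabled is a hypothetical helper used only to confirm the value is in place.

```python
import os

# Assumption: set this before importing slideflow so the setting is seen.
os.environ["SF_ALLOW_ZIP"] = "0"

# import slideflow as sf  # import only after the variable is set


def zip_disabled(env=os.environ):
    """Hypothetical check: True if SF_ALLOW_ZIP is set to "0" in `env`."""
    return env.get("SF_ALLOW_ZIP") == "0"


print("ZIP output disabled:", zip_disabled())
```

Running the check at the top of a long job is a cheap way to catch a missing variable before waiting hours for the failure.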
It's likely. I just started rerunning and will let you know tomorrow morning once it's finished. Thanks!
Hi James, it worked. I probably missed the environment variable. I'm going to train MIL models next. Attention-based MIL won't work yet because of the zip file issue, correct?
I've just added a possible solution for attention-based MIL - give it a try and let me know if it works!
Hi James, it worked like a charm! I did notice, though, that a few packages needed for training were not included in the installation, for example fastai.
Glad to hear it!
Re: dependencies - as you are aware, Slideflow is seeking to support a diverse set of deep learning tasks (segmentation, image generation, self-supervised learning, classification) and training paradigms. Some of these tasks have specific version requirements (e.g. StyleGAN requires PyTorch < 1.12) or dependencies (fastai for MIL; cellpose for cell segmentation), and we have entirely separate backends for TensorFlow and PyTorch, each with its own dependencies.
Rather than requiring all users to install all dependencies, the approach we have taken is to limit the auto-installed dependencies to only what all users will use, and then users can install additional dependencies based on their needs. For example, this will install only the base requirements of slideflow:
pip install slideflow
This will install dependencies for cell segmentation:
pip install slideflow[cellpose]
This will install all of the PyTorch-associated dependencies, including FastAI:
pip install slideflow[torch]
and so on. The installation instructions at https://slideflow.dev/installation/ do note that PyTorch users should install with pip install slideflow[torch], so this should have installed the FastAI dependency, as well.
We're definitely open to hearing suggestions for alternative approaches. We could also expand the discussion of this in the installation instructions.
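For source installs, where the extras may be skipped, a quick check like the sketch below can show which optional packages are missing. It only covers the two extras named above (fastai via slideflow[torch], cellpose via slideflow[cellpose]); the mapping and the helper name missing_optional are illustrative and not part of Slideflow.

```python
import importlib.util

# Optional package -> pip extra that provides it, per the commands above.
OPTIONAL_PACKAGES = {
    "fastai": "slideflow[torch]",
    "cellpose": "slideflow[cellpose]",
}


def missing_optional(packages=OPTIONAL_PACKAGES):
    """Return {package: extra} for packages that cannot be imported."""
    return {
        name: extra
        for name, extra in packages.items()
        if importlib.util.find_spec(name) is None
    }


for name, extra in missing_optional().items():
    print(f"{name} not found; try: pip install {extra}")
```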
Got it, makes sense to make it need-based. I think it was also because I installed it from source so it might be a different experience if I use other methods like pip. I will definitely let you know if I have more thoughts about this. Thanks!
Hi @jamesdolezal, I encountered the zip file issue again when running slide_map.save_umap('path'), even after I defined the environment variable. I thought you might have missed this one.
