Robbe Sneyders
But why would we want to execute the rest of the pipeline if a component has failed? Or does it also not execute if the component succeeds?
The major clouds also offer ARM machines, especially Amazon it seems. Not sure how popular those are, but those would benefit from multi-arch images as well.
Since this might have an impact on image size as well, let's take https://github.com/ml6team/fondant/issues/573 into account and maybe tackle them together.
Each component streams the data from and to either local disk or remote storage and passes a reference to the stored data to the next component. This "passing by reference"...
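As a rough illustration of the passing-by-reference idea, here is a minimal sketch using only the Python standard library. It is not Fondant's actual API: the function names, the `output.jsonl` file name, and the JSON Lines format are assumptions made up for this example, and a local temporary directory stands in for local disk or remote storage.

```python
import json
import tempfile
from pathlib import Path


def write_component(rows, out_dir):
    """Hypothetical first component: stores its output and returns only a reference."""
    out_path = Path(out_dir) / "output.jsonl"  # assumed file name and format
    with out_path.open("w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    # Only the location of the data is handed on, not the data itself.
    return str(out_path)


def read_component(ref):
    """Hypothetical second component: streams rows from the reference it received."""
    with open(ref) as f:
        for line in f:
            row = json.loads(line)
            row["length"] = len(row["text"])
            yield row


with tempfile.TemporaryDirectory() as workdir:
    ref = write_component([{"text": "hello"}, {"text": "fondant"}], workdir)
    results = list(read_component(ref))
```

The second component never receives the rows directly. It only gets `ref` and reads from storage itself, which is what keeps the components decoupled.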
> What we usually do in img2dataset and video2dataset is to do all intermediary processing in memory, read the input from remote storage with fsspec and write the output to...
I think there are benefits both to having them within a single component and to splitting them across components.
- If you combine them in a component, data only needs to be read &...
Since this is mostly focused on iterative development, we could also generate a notebook with example data. We could use some [notebook magics](https://stackoverflow.com/questions/63570506/jupyter-notebook-run-cell-and-save-as-py-file) so the user can actually develop their...
We also saw that pulling it once locally can take a long time, especially if you need to pull each image for a pipeline. Some images are >3GB.
Building base Fondant images might make sense as well, since installing Fondant adds 377MB to the Docker image due to our dependencies. We should investigate if the Fondant install layer...
We might need to use `dask.distributed` for this, see #395