raster-vision
Make Raster Vision easier to use as a library (and not just as a framework)
Problem
Raster Vision (RV) is a framework. To use it, users must configure the RV pipeline by defining a `get_config()` function. This has the following disadvantages:
- Even the simplest examples of these `get_config()` functions tend to be dozens of lines long.
- It is not obvious how to map user requirements to a suitable pipeline config without deep familiarity with RV internals.
- It is harder to debug because one cannot individually examine the various components of the pipeline to ensure that they are working as intended.
- Overriding RV's default functionality requires creating a new RV project, which is a non-trivial amount of effort.
This high barrier-to-entry is likely discouraging for new users.
Proposed solution
To remedy this, it is proposed that we refactor RV so that it is more easily usable as a library with multiple entry and exit points. This is in line with general software engineering advice about writing libraries instead of frameworks, as well as the trajectory of other similar frameworks.
A largely successful attempt to use RV (in its current state) as a library was made in the companion notebook to the Change Detection blog.
An example of what a future workflow might look like:
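A hypothetical sketch of such a workflow, with stub classes standing in for RV components (`GeoDataset`, `Learner`, and their constructors here are assumptions for illustration, not the actual RV API):

```python
# Hypothetical library-style RV workflow. The classes below are minimal
# stubs that stand in for Raster Vision components; none of these names or
# signatures are guaranteed to match the real RV API.

class GeoDataset:
    """Stand-in for a dataset built directly from image and label URIs."""
    def __init__(self, image_uri: str, label_uri: str):
        self.image_uri = image_uri
        self.label_uri = label_uri

class Learner:
    """Stand-in for a Learner constructed directly, with no get_config()."""
    def __init__(self, dataset: GeoDataset):
        self.dataset = dataset

    def train(self, epochs: int = 1) -> dict:
        # A real implementation would run a training loop here.
        return {'epochs': epochs}

# Library-style usage: construct objects directly and call methods on them,
# instead of declaring everything inside a get_config() function.
dataset = GeoDataset('s3://bucket/image.tif', 's3://bucket/labels.geojson')
learner = Learner(dataset)
metrics = learner.train(epochs=3)
```

The point of the sketch is the shape of the workflow, not the names: objects are created and inspected step by step, so each component can be debugged individually.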
Tasks
To achieve this, we would need to complete the following tasks. Each of these should ideally be turned into its own issue.
- Ensure that RV is installable and usable outside of the Docker container.
- Ensure that classes can function without relying on global state (such as the global `tmp_dir`).
- ~Ensure that classes do not rely on Pydantic configs.~
- Simplify the initialization of a `GeoDataset` as much as possible. Currently, it requires first creating a `RasterSource`, a `LabelSource`, and a `Scene`. Can this be simplified? Is `Scene` a useful abstraction?
- Refactor `Learner`.
- Make the `Learner` replaceable with custom training and prediction code.
Edit: Tasks are now being tracked here: #1460.
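To make the `GeoDataset` task concrete, the current construction chain versus a possible one-step constructor might look roughly like this (all classes are minimal stubs; the real RV constructors take many more arguments, and `from_uris` is a hypothetical name):

```python
# Sketch contrasting the current RasterSource -> LabelSource -> Scene chain
# with a possible one-step constructor. All classes here are minimal stubs,
# not the real RV API.

class RasterSource:
    def __init__(self, uri: str):
        self.uri = uri

class LabelSource:
    def __init__(self, uri: str):
        self.uri = uri

class Scene:
    def __init__(self, raster_source: RasterSource, label_source: LabelSource):
        self.raster_source = raster_source
        self.label_source = label_source

class GeoDataset:
    def __init__(self, scene: Scene):
        self.scene = scene

    @classmethod
    def from_uris(cls, image_uri: str, label_uri: str) -> 'GeoDataset':
        # Hypothetical convenience constructor that hides the intermediates.
        return cls(Scene(RasterSource(image_uri), LabelSource(label_uri)))

# Current style: three intermediate objects before the dataset exists.
scene = Scene(RasterSource('image.tif'), LabelSource('labels.geojson'))
ds_verbose = GeoDataset(scene)

# Possible simplified style: one call.
ds_simple = GeoDataset.from_uris('image.tif', 'labels.geojson')
```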
I feel like RV can (and should?) be converted into a library, as the configuration itself is indeed a tough experience.
I'd like to suggest the almost complete removal of RV Pipelines; the same use cases could be reimplemented, e.g. via Kubeflow. In that case, RV would be used as a library, and Kubeflow would be used to orchestrate inputs/outputs and chain the different steps together.
The same can probably be achieved with Argo as well, depending on the needs and future requirements.
A demo of running RV commands using Kubeflow graphs can be seen here: https://github.com/azavea/pipeline-playground/tree/main/kubeflow/rv
This would be a great enhancement. I'm interested in using rastervision in my research (esp. for spatial subsetting for train/test), but even getting the examples to run is quite complicated (esp. coming from R), and it's not clear how to integrate RV functionality with existing code.
> It is harder to debug because one cannot individually examine the various components of the pipeline to ensure that they are working as intended.
Yeah, it reminds me of the old Tensorflow. Just define the computation graph and then let the system compile it and pray that it runs successfully.
Overall, this sounds great, and I like the idea of keeping the notebook use case in mind. However, I'm not sure I agree with the part about "Ensure that classes do not rely on Pydantic configs." Some classes depend on a lot of hierarchical configuration, and re-using the Config classes is an easy way to pass this configuration through the system. Otherwise, you might have methods that take a very long list of arguments and then just pass those arguments on to other methods, which can be verbose and unwieldy. But I don't have a very strong opinion -- it probably needs to be judged on a case-by-case basis.
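The tradeoff described here can be sketched with a couple of made-up config classes (plain dataclasses standing in for Pydantic configs; all names are invented for illustration):

```python
# Sketch of the tradeoff: passing one hierarchical config object vs.
# threading many individual arguments through the call chain. All names
# here are made up, not real RV configs.
from dataclasses import dataclass

@dataclass
class SolverConfig:
    lr: float = 1e-3
    batch_size: int = 16

@dataclass
class LearnerConfig:
    solver: SolverConfig
    num_classes: int = 2

# Config-object style: one argument carries the whole hierarchy, and adding
# a new option does not change any function signatures.
def build_learner(cfg: LearnerConfig) -> tuple:
    return ('learner', cfg.solver.lr, cfg.num_classes)

# Flat-argument style: every intermediate function must repeat the full
# argument list, which grows unwieldy as options accumulate.
def build_learner_flat(lr: float, batch_size: int, num_classes: int) -> tuple:
    return ('learner', lr, num_classes)

learner = build_learner(LearnerConfig(solver=SolverConfig(lr=0.01)))
```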
It seems like the predictor and evaluation functionality should also be usable a la carte and within a notebook environment.
> It seems like the predictor and evaluation functionality should also be usable a la carte and within a notebook environment.
Yeah, prediction is next on my agenda. I'll share the refactoring plan here before jumping into it.
I think a `Learner.predict_dataset(dataset)` method that returns a generator of predictions gets us most of the way there. The user will then just have to instantiate a dataset (either from a `DataConfig` or using custom code) and pass it to this method.
The predictions combined with windows can then be turned into `Labels`, possibly via a `Labels.from_predictions(windows, predictions)` method for convenience.
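A minimal sketch of this generator-based API, using stub classes and a dummy model (`predict_dataset` and `from_predictions` are proposals from this discussion, not existing RV methods):

```python
# Sketch of the proposed generator-based prediction API. The classes are
# stubs and the model is a dummy; Learner.predict_dataset and
# Labels.from_predictions are proposals, not existing RV methods.

class Labels:
    def __init__(self, window_to_pred: dict):
        self.window_to_pred = window_to_pred

    @classmethod
    def from_predictions(cls, windows, predictions) -> 'Labels':
        # Pair each window with its prediction; predictions may be a
        # generator, which zip() consumes lazily.
        return cls(dict(zip(windows, predictions)))

class Learner:
    def __init__(self, model):
        self.model = model

    def predict_dataset(self, dataset):
        # Lazily yield one prediction per chip, so large datasets can be
        # streamed without holding all predictions in memory.
        for chip in dataset:
            yield self.model(chip)

learner = Learner(model=lambda chip: chip * 2)  # dummy "model"
dataset = [1, 2, 3]           # stand-in for chips from a GeoDataset
windows = ['w0', 'w1', 'w2']  # stand-in for the corresponding windows

preds = learner.predict_dataset(dataset)
labels = Labels.from_predictions(windows, preds)
```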
> I think a `Learner.predict_dataset(dataset)` method that returns a generator of predictions gets us most of the way there. The user will then just have to instantiate a dataset (either from a `DataConfig` or using custom code) and pass it to this method.
>
> The predictions combined with windows can then be turned into `Labels`, possibly via a `Labels.from_predictions(windows, predictions)` method for convenience.
Sounds good as long as there's an easy way to also save the labels to disk. You might need some other convenience method for instantiating the `LabelStore`.
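A sketch of what such a saving convenience might look like (a hypothetical `Labels.save` that hides the `LabelStore` construction; this stub just writes JSON):

```python
# Hypothetical Labels.save() convenience that hides LabelStore construction.
# A real implementation would build the appropriate LabelStore from the URI;
# this stub just serializes the window -> prediction mapping as JSON.
import json
import os
import tempfile

class Labels:
    def __init__(self, window_to_pred: dict):
        self.window_to_pred = window_to_pred

    def save(self, uri: str) -> None:
        # Stand-in for something like:
        #   store = LabelStore.from_uri(uri); store.save(self)
        with open(uri, 'w') as f:
            json.dump(self.window_to_pred, f)

labels = Labels({'w0': 1, 'w1': 0})
path = os.path.join(tempfile.mkdtemp(), 'labels.json')
labels.save(path)
```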