
Make Raster Vision easier to use as a library (and not just as a framework)

Open: AdeelH opened this issue 2 years ago • 9 comments

Problem

Raster Vision (RV) is a framework. To use it, users must configure the RV pipeline by defining a get_config() function. This has the following disadvantages:

  • Even the simplest examples of these get_config() functions tend to be dozens of lines long.
  • It is not obvious how to map user requirements to a suitable pipeline config without deep familiarity with RV internals.
  • It is harder to debug because one cannot individually examine the various components of the pipeline to ensure that they are working as intended.
  • Overriding RV's default functionality requires creating a new RV project, which takes a non-trivial amount of effort.

This high barrier to entry likely discourages new users.

Proposed solution

To remedy this, we propose refactoring RV so that it is more easily usable as a library with multiple entry and exit points. This is in line with the general software-engineering advice to write libraries rather than frameworks, as well as with the trajectory of other, similar frameworks.
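To make the framework/library distinction concrete, here is a toy sketch. All function names are hypothetical and none of this is RV code; `get_config` is only borrowed as a parameter name. In the framework style, the user hands over a config function and the framework owns the control flow. In the library style, the same stages are ordinary functions with their own entry and exit points, so each intermediate result can be inspected or replaced.

```python
from typing import Callable, Dict, List

# Toy stages (hypothetical): each is an ordinary function that can be called,
# inspected, or swapped out on its own. This is the "library" style.
def load(values: List[float]) -> List[float]:
    return [float(v) for v in values]

def train(data: List[float], scale: float) -> float:
    # The "model" is just a scaled mean, standing in for real training.
    return scale * sum(data) / len(data)

def predict(model: float, data: List[float]) -> List[float]:
    return [model * x for x in data]

# "Framework" style: the user only supplies a config function; the framework
# decides the control flow, and intermediate results stay hidden inside it.
def run_pipeline(get_config: Callable[[], Dict]) -> List[float]:
    cfg = get_config()
    data = load(cfg["values"])
    model = train(data, scale=cfg["scale"])
    return predict(model, data)

# Framework entry point: one opaque call.
preds_fw = run_pipeline(lambda: {"values": [1, 2, 3], "scale": 0.5})

# Library entry points: the same work, with every intermediate value exposed.
data = load([1, 2, 3])
model = train(data, scale=0.5)  # e.g. inspect or replace the model here
preds_lib = predict(model, data)
```

Both paths produce the same predictions; only the library path lets the user stop between stages to check that each one works as intended.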

A largely successful attempt to use RV (in its current state) as a library was made in the companion notebook to the Change Detection blog post.

An example of what a future workflow might look like: [image]

Tasks

To achieve this, we would need to complete the following tasks. Each of these should ideally be turned into its own issue.

  • Ensure that RV is installable and usable outside of the Docker container.
  • Ensure that classes can function without relying on global state (such as the global tmp_dir).
  • ~Ensure that classes do not rely on Pydantic configs.~
  • Simplify the initialization of a GeoDataset as much as possible. Currently, it requires first creating a RasterSource, a LabelSource, and a Scene. Can this be simplified? Is Scene a useful abstraction?
  • Refactor Learner.
  • Make the Learner replaceable with custom training and prediction code.
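A minimal sketch of how the GeoDataset and global-state tasks could fit together, assuming a hypothetical `from_uris` factory. The classes are toy stand-ins that only borrow RV's names; the real RasterSource, LabelSource, Scene, and GeoDataset constructors take different arguments. The factory builds the RasterSource, LabelSource, and Scene chain internally, and the temporary directory is passed in explicitly instead of being read from a global.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Toy stand-ins that only borrow RV's class names; the real constructors differ.
@dataclass
class RasterSource:
    uri: str

@dataclass
class LabelSource:
    uri: str

@dataclass
class Scene:
    raster_source: RasterSource
    label_source: Optional[LabelSource] = None

@dataclass
class GeoDataset:
    scene: Scene
    tmp_dir: Path

    @classmethod
    def from_uris(cls, image_uri: str, label_uri: Optional[str] = None,
                  tmp_dir: Optional[str] = None) -> "GeoDataset":
        """Hypothetical one-call constructor hiding the intermediate objects."""
        label_source = LabelSource(label_uri) if label_uri else None
        scene = Scene(RasterSource(image_uri), label_source)
        # The temp dir is an explicit argument, not module-level global state;
        # fall back to the system default only when the caller passes nothing.
        return cls(scene, Path(tmp_dir or tempfile.gettempdir()))

ds = GeoDataset.from_uris("image.tif", "labels.geojson", tmp_dir="/tmp/rv-example")
```

Users who need finer control could still build a Scene by hand; the factory only removes the boilerplate for the common case.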

Edit: Tasks are now being tracked here: #1460.

AdeelH, Apr 28 '22

I feel like RV can (and should?) be converted into a library; the configuration is indeed a tough experience.

I'd suggest almost completely removing RV Pipelines; the same use cases could be reimplemented via, e.g., Kubeflow. In that case, RV would be used as a library, and Kubeflow would orchestrate the inputs/outputs and chain the different steps together.

The same could probably be achieved with Argo as well, depending on needs and future requirements.

pomadchin, Apr 28 '22
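A rough sketch of the "RV as a library, orchestrator on top" split, assuming each step communicates only through files. The step names, the JSON format, and the toy threshold model are all hypothetical. In a Kubeflow or Argo workflow, each function would run as its own task and the orchestrator would pass the paths between them; the `with` block at the end is only a local stand-in for that orchestrator.

```python
import json
import tempfile
from pathlib import Path

# Each step reads its inputs from files and writes its outputs to files, so an
# external orchestrator could run it as a separate task. Names are hypothetical.
def make_chips(image_values: list, out_path: Path) -> Path:
    chips = [image_values[i:i + 2] for i in range(0, len(image_values), 2)]
    out_path.write_text(json.dumps(chips))
    return out_path

def train(chips_path: Path, out_path: Path) -> Path:
    chips = json.loads(chips_path.read_text())
    flat = [v for chip in chips for v in chip]
    model = {"threshold": sum(flat) / len(flat)}  # toy "model": a mean threshold
    out_path.write_text(json.dumps(model))
    return out_path

def predict(model_path: Path, chips_path: Path, out_path: Path) -> Path:
    threshold = json.loads(model_path.read_text())["threshold"]
    chips = json.loads(chips_path.read_text())
    preds = [[int(v > threshold) for v in chip] for chip in chips]
    out_path.write_text(json.dumps(preds))
    return out_path

# Local stand-in for the orchestrator: chain the steps by passing file paths.
with tempfile.TemporaryDirectory() as d:
    work = Path(d)
    chips_file = make_chips([1, 2, 3, 4, 5, 6], work / "chips.json")
    model_file = train(chips_file, work / "model.json")
    preds_file = predict(model_file, chips_file, work / "preds.json")
    preds = json.loads(preds_file.read_text())
```

Because no step depends on in-process state from another, the chaining logic can live entirely in the orchestrator.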

A demo of running RV commands using Kubeflow graphs can be seen here: https://github.com/azavea/pipeline-playground/tree/main/kubeflow/rv

lewfish, May 02 '22

This would be a great enhancement. I'm interested in using Raster Vision in my research (especially for spatial subsetting into train/test sets), but even getting the examples to run is quite complicated (especially coming from R), and it's not clear how to integrate RV functionality with existing code.

cynthiahqy, May 04 '22

  • It is harder to debug because one cannot individually examine the various components of the pipeline to ensure that they are working as intended.

Yeah, it reminds me of old TensorFlow: just define the computation graph, let the system compile it, and pray that it runs successfully.

lewfish, May 05 '22

Overall, this sounds great, and I like the idea of keeping the notebook use case in mind. However, I'm not sure I agree with the part about "Ensure that classes do not rely on Pydantic configs." Some classes depend on a lot of hierarchical configuration, and re-using the Config classes is an easy way to pass this configuration through the system. Otherwise, you might have methods that take a very long list of arguments and just pass those arguments on to other methods, which can be verbose and unwieldy. But I don't have a very strong opinion; it probably needs to be judged on a case-by-case basis.

lewfish, May 05 '22
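A sketch of the trade-off described in this comment, using standard-library `dataclasses` as a stand-in for RV's Pydantic-based Config classes (so no validation happens here). All field and function names are hypothetical. Passing one nested config object keeps each signature short, whereas the flat variant has to spell out every field.

```python
from dataclasses import dataclass, field

# Standard-library dataclasses stand in for RV's Pydantic Config classes;
# unlike Pydantic, they perform no validation. Field names are hypothetical.
@dataclass
class SolverConfig:
    lr: float = 1e-4
    num_epochs: int = 10

@dataclass
class DataConfig:
    img_sz: int = 256
    num_workers: int = 4

@dataclass
class LearnerConfig:
    solver: SolverConfig = field(default_factory=SolverConfig)
    data: DataConfig = field(default_factory=DataConfig)

# Config-passing style: each layer takes one object and hands sub-configs down.
def build_optimizer(cfg: SolverConfig) -> dict:
    return {"lr": cfg.lr, "epochs": cfg.num_epochs}

def build_loader(cfg: DataConfig) -> dict:
    return {"img_sz": cfg.img_sz, "workers": cfg.num_workers}

def build_learner(cfg: LearnerConfig) -> dict:
    return {"optimizer": build_optimizer(cfg.solver),
            "loader": build_loader(cfg.data)}

# Flat-argument style: every field must be threaded through each signature
# on the call path, so the argument list grows with every new option.
def build_learner_flat(lr: float, num_epochs: int,
                       img_sz: int, num_workers: int) -> dict:
    return {"optimizer": {"lr": lr, "epochs": num_epochs},
            "loader": {"img_sz": img_sz, "workers": num_workers}}

cfg = LearnerConfig(solver=SolverConfig(lr=1e-3))
learner = build_learner(cfg)
```

Both styles build the same result here; the difference is how many arguments each layer has to know about, which grows quickly as the hierarchy deepens.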

It seems like the predictor and evaluation functionality should also be usable a la carte and within a notebook environment.

lewfish, Jun 15 '22

It seems like the predictor and evaluation functionality should also be usable a la carte and within a notebook environment.

Yeah, prediction is next on my agenda. I'll share the refactoring plan here before jumping into it.

AdeelH, Jun 16 '22

I think a Learner.predict_dataset(dataset) method that returns a generator of predictions gets us most of the way there. The user will then have to just instantiate a dataset (either from a DataConfig or using custom code) and pass it to this method.

The predictions combined with windows can then be turned into Labels, possibly via a Labels.from_predictions(windows, predictions) method for convenience.

AdeelH, Jun 30 '22
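A minimal sketch of the proposed `Learner.predict_dataset(dataset)` and `Labels.from_predictions(windows, predictions)` pair. Everything here is hypothetical: the `Learner` Protocol, the toy `ThresholdLearner`, the window tuple format, and the `Labels` class are illustrations, not RV's actual API. Typing the learner as a Protocol also shows one way custom training/prediction code could be swapped in, as long as it provides `predict_dataset`.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Protocol, Tuple

# Hypothetical window format: (ymin, xmin, ymax, xmax) in pixel coordinates.
Window = Tuple[int, int, int, int]

class Learner(Protocol):
    # Any object with this method can be plugged in (structural typing).
    def predict_dataset(self, dataset: Iterable[List[float]]) -> Iterator[int]:
        ...

@dataclass
class ThresholdLearner:
    """Toy learner: predicts class 1 if a chip's mean value exceeds threshold."""
    threshold: float

    def predict_dataset(self, dataset: Iterable[List[float]]) -> Iterator[int]:
        # A generator: predictions are produced lazily, one chip at a time.
        for chip in dataset:
            yield int(sum(chip) / len(chip) > self.threshold)

@dataclass
class Labels:
    """Hypothetical labels container mapping each window to a class id."""
    cells: Dict[Window, int] = field(default_factory=dict)

    @classmethod
    def from_predictions(cls, windows: Iterable[Window],
                         predictions: Iterable[int]) -> "Labels":
        # Pair windows with predictions in order; zip stops at the shorter input.
        return cls(dict(zip(windows, predictions)))

windows = [(0, 0, 2, 2), (0, 2, 2, 4)]
chips = [[0.1, 0.2], [0.8, 0.9]]  # stands in for a dataset of image chips
learner: Learner = ThresholdLearner(threshold=0.5)
labels = Labels.from_predictions(windows, learner.predict_dataset(chips))
```

Because `predict_dataset` is a generator, predictions can be consumed lazily, which helps when a scene yields many chips. The windows and the dataset must be iterated in the same order for the pairing to be correct.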

I think a Learner.predict_dataset(dataset) method that returns a generator of predictions gets us most of the way there. The user will then have to just instantiate a dataset (either from a DataConfig or using custom code) and pass it to this method.

The predictions combined with windows can then be turned into Labels, possibly via a Labels.from_predictions(windows, predictions) method for convenience.

Sounds good as long as there's an easy way to also save the labels to disk. You might need some other convenience method for instantiating the LabelStore.

lewfish, Jun 30 '22
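A sketch of the kind of convenience method suggested here, assuming a hypothetical JSON-backed store. RV's real label stores write formats such as GeoJSON or GeoTIFF and are constructed differently; `JsonLabelStore` and `save_labels` are illustrative names only.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Tuple

Window = Tuple[int, int, int, int]  # (ymin, xmin, ymax, xmax), hypothetical

class JsonLabelStore:
    """Hypothetical store that persists window -> class_id predictions as JSON."""

    def __init__(self, uri: str):
        self.path = Path(uri)

    def save(self, cells: Dict[Window, int]) -> None:
        # JSON object keys must be strings, so write a list of records instead.
        records = [{"window": list(w), "class_id": c} for w, c in cells.items()]
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(records))

    def load(self) -> Dict[Window, int]:
        records = json.loads(self.path.read_text())
        return {tuple(r["window"]): r["class_id"] for r in records}

def save_labels(cells: Dict[Window, int], uri: str) -> JsonLabelStore:
    # One-call convenience: the user never has to construct the store directly.
    store = JsonLabelStore(uri)
    store.save(cells)
    return store

cells = {(0, 0, 2, 2): 0, (0, 2, 2, 4): 1}
with tempfile.TemporaryDirectory() as d:
    store = save_labels(cells, str(Path(d) / "preds" / "labels.json"))
    restored = store.load()
```

Returning the store from `save_labels` lets the caller reload or inspect the saved labels without knowing how the store was configured.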