Stop dataset message hydration from blocking forward cell progress

Open okennedy opened this issue 2 years ago • 0 comments

What pain point is this feature intended to address? Please describe.

Several cells, most notably the SQL cell and most Mimir lenses, first generate a dataset and then display it as a message. The artifact update itself is relatively cheap (it just defines a new dataframe constructor), but generating the preview of the dataset is significantly more expensive. Unfortunately, because of the way dependencies are tracked, subsequent cells can't be executed until the expensive part finishes too. A notebook that could finish almost instantaneously can instead take several minutes.

Describe the solution you'd like

Provide a way to signal to ExecutionContext (e.g., ExecutionContext.noMoreArtifacts()) that the cell will generate no further artifacts, so that any dependent cells are free to proceed. ExecutionContext can then crash the cell if it later tries to output an artifact.
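A minimal sketch of how such a signal might behave, to make the proposal concrete. Everything here is hypothetical: `SketchExecutionContext` is a stand-in and not Vizier's actual `ExecutionContext`, artifacts are modeled as plain strings, and `artifactsFinalized` is an assumed name for whatever the scheduler would wait on.

```scala
import scala.collection.mutable
import scala.concurrent.{Future, Promise}

// Hypothetical stand-in for Vizier's ExecutionContext (not the real class).
final class SketchExecutionContext {
  // Artifacts emitted so far, keyed by name (modeled as strings for brevity).
  private val artifacts = mutable.LinkedHashMap.empty[String, String]
  private val sealedPromise = Promise[Map[String, String]]()

  /** Completes as soon as the cell can no longer emit artifacts. */
  val artifactsFinalized: Future[Map[String, String]] = sealedPromise.future

  /** Emit an artifact; fails the cell if called after noMoreArtifacts(). */
  def output(name: String, artifact: String): Unit = synchronized {
    if (sealedPromise.isCompleted)
      throw new IllegalStateException(
        s"Cell tried to output artifact '$name' after noMoreArtifacts()")
    artifacts.put(name, artifact)
  }

  /** Signal that the artifact set is final; dependent cells may proceed. */
  def noMoreArtifacts(): Unit = synchronized {
    sealedPromise.trySuccess(artifacts.toMap)
  }

  /** Messages (e.g., the dataset preview) remain allowed after sealing. */
  def message(text: String): Unit = println(text)
}
```

Under this sketch, a SQL cell would call `output(...)` for the cheap dataframe definition, then `noMoreArtifacts()`, and only then build the expensive preview message. A scheduler that waits on `artifactsFinalized` rather than on cell completion could start dependent cells immediately.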

Describe alternatives you've considered

One alternative would be to hydrate dataset messages lazily. An unhydrated dataset message could always be populated by querying the database (this query usually happens anyway, since the workflow generally loads more than just the preview rows, but users would see spinners). The challenge is actually doing the hydration, which requires database access and so is not something that we should be doing lazily when the artifact is accessed. We might be able to spin up a background worker to manage hydration...
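The background-worker idea could look roughly like the following. This is only a sketch under assumptions: `HydrationWorker`, `DatasetMessage`, and the `loadPreview` callback are hypothetical names, preview rows are modeled as `Seq[Seq[String]]`, and a single-threaded pool stands in for whatever would own the database connection.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical message whose preview rows arrive asynchronously.
final class DatasetMessage(val rows: Future[Seq[Seq[String]]]) {
  /** Some(rows) once hydration succeeded; None while pending (or failed). */
  def preview: Option[Seq[Seq[String]]] = rows.value.flatMap(_.toOption)
}

// Hypothetical worker that runs hydration queries off the cell's thread.
final class HydrationWorker {
  private val pool = Executors.newSingleThreadExecutor()
  private implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutor(pool)

  /** Schedule the (expensive) preview query and return immediately. */
  def hydrate(loadPreview: () => Seq[Seq[String]]): DatasetMessage =
    new DatasetMessage(Future(loadPreview()))

  /** Stop accepting work; already-queued hydrations still run. */
  def shutdown(): Unit = pool.shutdown()
}
```

Here the cell returns as soon as `hydrate` is called, and the UI would show a spinner while `preview` is `None`. The database access stays confined to the worker thread, so nothing is queried lazily at the point where the artifact is read.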

okennedy, Oct 15 '22 20:10