Cell dependency graph

Open nvdv opened this issue 9 years ago • 29 comments

At present all Notebook cells are executed linearly:

Cell 1
   |
Cell 2
   |
Cell 3

but sometimes there's no need to calculate Cell 2 in order to get the result from Cell 3, and calculating Cell 2 might be time-consuming. Being able to define a cell dependency graph would resolve this issue.
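
As a rough illustration of the idea (a sketch, not an existing Jupyter feature): if each cell declared which cells it depends on, the cells needed for a given target could be computed with the standard library's graphlib (Python 3.9+). The `deps` mapping and the `cells_needed` helper are hypothetical names made up for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency declaration: cell -> set of cells it needs.
# Here Cell 3 depends only on Cell 1, so Cell 2 can be skipped.
deps = {
    "Cell 1": set(),
    "Cell 2": {"Cell 1"},
    "Cell 3": {"Cell 1"},
}

def cells_needed(target, deps):
    """Return the cells required to compute `target`, in a valid run order."""
    needed = set()
    stack = [target]
    while stack:
        cell = stack.pop()
        if cell not in needed:
            needed.add(cell)
            stack.extend(deps.get(cell, ()))
    # Restrict the graph to the needed cells and sort it topologically.
    subgraph = {cell: deps.get(cell, set()) for cell in needed}
    return list(TopologicalSorter(subgraph).static_order())

print(cells_needed("Cell 3", deps))  # ['Cell 1', 'Cell 3']
```

With this declaration, asking for Cell 3 never touches the time-consuming Cell 2.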

nvdv avatar Mar 05 '16 14:03 nvdv

Have a look at ipycache if you have long-running cells that you don't always want to re-run. I don't think we want to get into defining a DAG of cells.

takluyver avatar Mar 05 '16 15:03 takluyver

There is a long thread we had a few years[*] ago about that on the mailing list.

[*] OMG I'm old now.

Carreau avatar Mar 07 '16 17:03 Carreau

@nvdv : We're doing a little housekeeping on our issue log and noticed this thread from 2016. Has this issue been resolved to your satisfaction and can it be closed? thanks!

JamiesHQ avatar Apr 27 '17 01:04 JamiesHQ

It is a feature request. I am not sure it was implemented, but it's up to you to close it if you think it's out of scope.

nvdv avatar Apr 27 '17 05:04 nvdv

The long thread discussing this, linked above by @Carreau , is unreachable for me. So apologies if I'm rehashing things discussed there.

I certainly agree that managing a DAG of cells is not desirable. But it would be cool if there was a built-in cell magic for declaring which cells should be run automatically before the current cell. Naively, this doesn't seem to be too burdensome a feature to implement, but I'm mostly a Jupyter notebook user, not a developer, so I could be wrong. Does any such cell magic exist, or a cell magic that could be used for this purpose?

adam-m-jcbs avatar Oct 03 '18 16:10 adam-m-jcbs

For future reference: the long thread was moved.

mxxun avatar Oct 17 '18 12:10 mxxun

Conversely, while a dependency graph might tell you you don't need to evaluate/re-evaluate cell B just because A changed, it might also tell you that you're going to have a bad time trying to evaluate C if C depends on A.

In accordance with https://jupyter-notebook.readthedocs.io/en/stable/security.html , if someone tried to execute a cell that depended on another, I wonder if it would make sense to run that other cell automatically?

At a minimum, it might be helpful to have some visual feedback to indicate that the cell isn't runnable until some particular cell above satisfies its dependencies.
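
To make the "C depends on A" case concrete, here is a minimal sketch, assuming dependencies are declared by hand. The `deps` mapping and `stale_after_change` are hypothetical names, not part of Jupyter. Given a changed cell, it collects every cell that would need visual feedback or a re-run.

```python
from collections import deque

# Hypothetical declaration: cell -> cells it depends on.
deps = {"A": set(), "B": set(), "C": {"A"}, "D": {"C"}}

def stale_after_change(changed, deps):
    """Return every cell that directly or indirectly depends on `changed`."""
    # Invert the mapping: cell -> cells that depend on it.
    dependents = {cell: set() for cell in deps}
    for cell, needs in deps.items():
        for need in needs:
            dependents.setdefault(need, set()).add(cell)
    # Breadth-first walk over the dependents.
    stale, queue = set(), deque([changed])
    while queue:
        for child in dependents.get(queue.popleft(), ()):
            if child not in stale:
                stale.add(child)
                queue.append(child)
    return stale

print(sorted(stale_after_change("A", deps)))  # ['C', 'D']
print(sorted(stale_after_change("B", deps)))  # []
```

B is untouched when A changes, while C and D are flagged.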

nickurak avatar Nov 22 '18 22:11 nickurak

@takluyver, is there any reason for a DAG of cells to be out of the question? Visualising cells in a graph would certainly make cell dependencies clearer as well as improve storytelling capabilities, since non-linear (branching) stories are hard to tell within today's notebooks.

For a simple concrete example: imagine a notebook to evaluate three real estate expansion plans for a given city. The first node of cells loads the current real estate data and describes the current state of affairs. From there, you get three branches, each of them following similar logic but different scenario premises and arriving at comparable (but different) end results.

Today, this analysis could be done using a chapter for each scenario, but that still requires scrolling up and down to compare, leaves it unclear which cells to run before scenario A, risks (accidentally) re-running scenario A before B (run all is sooo easy to click on), etc.

pedrovgp avatar Nov 22 '19 14:11 pedrovgp

I think using a magic (or cell metadata) to explicitly define dependencies for a DAG of cells is a very interesting idea. I think automatically coming up with the DAG on the front end is probably prohibitively hard, given that we have a number of kernels of different languages. There was some work from a CalPoly group of students on a kernel that would keep track of a DAG, IIRC, somewhat like ObservableHQ.
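
To illustrate why automatic inference is tied to one language, here is a deliberately crude, Python-only sketch using the standard `ast` module. The helper names `defined_and_used` and `infer_deps` are made up for this example. It ignores scoping, attribute mutation, and dynamic features such as `exec` or `globals()`, and every other kernel language would need its own parser.

```python
import ast

def defined_and_used(source):
    """Return (names assigned or defined, names read) anywhere in a cell."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            (defined if isinstance(node.ctx, ast.Store) else used).add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
    return defined, used

def infer_deps(cells):
    """Map each cell label to the earlier cells whose names it reads."""
    last_definer = {}  # name -> label of the most recent cell defining it
    deps = {}
    for label, source in cells.items():
        defined, used = defined_and_used(source)
        # Conservative: any read of a name defined earlier counts as a dependency.
        deps[label] = {last_definer[n] for n in used if n in last_definer}
        for name in defined:
            last_definer[name] = label
    return deps

cells = {
    "load": "data = [1, 2, 3]",
    "slow": "total = sum(x * x for x in data)",
    "plot": "print(len(data))",
}
print(infer_deps(cells))
```

Here both "slow" and "plot" are found to depend on "load" only, so "plot" would not need "slow".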

jasongrout avatar Nov 22 '19 16:11 jasongrout

Because it's been a year, and this idea has been bouncing around my head a little -- here's a sketch of a thought in this area:

I'd be really interested in a world where the cells run in actual scopes, and cells were more explicit about what they were pulling in from each other. This might be reasonably easy in Python, but maybe tricky in different languages.

label_cell("utility")
def func_that_makes_a_df():
   <code>
<Some markdown explaining that function>
label_cell("get_pf")
from cell("utility") import func_that_makes_a_df
df = func_that_makes_a_df()
<Some markdown that talks about a dataframe>
from cell("get_pf") import df as plottable_df
import plotly

plotly.plot_something(plottable_df)

Making the only things that are shared between cells super-explicit might help:

  • reduce all kinds of unexpected behavior and unexpected side-effects of scope mixing
  • allow Jupyter to reason about the dependencies
  • give good errors when the dependencies are missing
  • automatically execute cells as they're needed.

I haven't really thought at all about what this might look like outside of the Python world.

nickurak avatar Nov 22 '19 22:11 nickurak

In that world, attempting to refer to func_that_makes_a_df in a cell that isn't explicitly importing it from another cell would, for example, fail, with a NameError: name 'func_that_makes_a_df' is not defined exception.
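
A toy model of that behaviour in plain Python, as a sketch only: each cell is executed with `exec` in its own fresh namespace, and only explicitly imported names are injected. The `cells` structure, the `imports` mapping, and `run_cell` are all hypothetical; nothing here is an existing Jupyter API, and there is no cycle detection.

```python
# Hypothetical model: each cell has source code and explicit imports written
# as {local_name: (source_cell_label, name_in_that_cell)}.
cells = {
    "utility": {"imports": {}, "source": "def make_rows():\n    return [1, 2, 3]\n"},
    "get_rows": {
        "imports": {"make_rows": ("utility", "make_rows")},
        "source": "rows = make_rows()\n",
    },
    # Uses make_rows without importing it from the "utility" cell.
    "broken": {"imports": {}, "source": "rows = make_rows()\n"},
}

namespaces = {}  # label -> namespace produced by running that cell

def run_cell(label):
    """Run a cell in a fresh namespace, running the cells it imports from first."""
    if label in namespaces:
        return namespaces[label]
    cell = cells[label]
    scope = {}
    for local_name, (src_label, src_name) in cell["imports"].items():
        scope[local_name] = run_cell(src_label)[src_name]
    exec(cell["source"], scope)
    namespaces[label] = scope
    return scope

print(run_cell("get_rows")["rows"])  # [1, 2, 3]
try:
    run_cell("broken")
except NameError as err:
    print(err)  # name 'make_rows' is not defined
```

Running "get_rows" pulls in "utility" automatically, while "broken" fails because nothing leaks between scopes.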

nickurak avatar Nov 22 '19 22:11 nickurak

@nickurak , I can see other use cases for that, but the use case you've described could be solved by establishing cell dependencies and splitting code into different cells accordingly. That would be a more generic approach as well, since it could apply to other languages.

Your example would be something like:

  • Label cell 1 as "utility"
  • Label cell 2 as "get_pf"
  • Add "depends on 'utility'" to cell "get_pf"
  • Add "depends on 'get_pf'" to cell 3 (which plots something)

If you need a function (but not another) that is defined in a given cell, simply split it into two cells and add the dependency only to the one you need.
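
A minimal sketch of the steps listed above, assuming the labels and dependencies live in cell metadata under two made-up keys, `label` and `depends_on` (neither is part of any Jupyter specification). It works on plain dictionaries, so it is independent of the kernel language, and it reports unknown labels instead of failing silently.

```python
from graphlib import TopologicalSorter

# Hypothetical metadata keys: "label" and "depends_on".
cells = [
    {"metadata": {"label": "utility"}, "source": "..."},
    {"metadata": {"label": "get_pf", "depends_on": ["utility"]}, "source": "..."},
    {"metadata": {"depends_on": ["get_pf"]}, "source": "..."},  # cell 3, unlabelled
]

def run_order(cells):
    """Return cell indices in an order that satisfies every 'depends_on'."""
    by_label = {c["metadata"]["label"]: i
                for i, c in enumerate(cells) if "label" in c["metadata"]}
    graph = {}
    for i, cell in enumerate(cells):
        needs = cell["metadata"].get("depends_on", [])
        missing = [label for label in needs if label not in by_label]
        if missing:
            raise KeyError(f"cell {i} depends on unknown label(s): {missing}")
        graph[i] = {by_label[label] for label in needs}
    return list(TopologicalSorter(graph).static_order())

print(run_order(cells))  # [0, 1, 2]
```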

pedrovgp avatar Nov 24 '19 21:11 pedrovgp

I have worked on a (quick and dirty) visual proposal of how to use cell dependencies to facilitate storytelling and organize notebook flows. It probably makes more sense in the JupyterLab project, but anyway, this is what I envision: https://docs.google.com/presentation/d/1nWAjvuCZb4MEu9SiTy-QWfMWBThpDpZFnuKNp1S_fHs/edit?usp=sharing

Any comments are appreciated.

pedrovgp avatar Nov 25 '19 21:11 pedrovgp

If you need a function (but not another) that is defined in a given cell, simply split it into two cells and add the dependency only to the one you need.

A question, which I see as a prerequisite for this discussion: is there already in any Jupyter plugin a standard, or at least popular, way to uniquely identify cells?

toobaz avatar Nov 25 '19 21:11 toobaz

Seems like it is going to be a part of JupyterLab Core [https://github.com/jupyterlab/jupyterlab-celltags]

pedrovgp avatar Nov 26 '19 14:11 pedrovgp

A question, which I see as a prerequisite for this discussion: is there already in any Jupyter plugin a standard, or at least popular, way to uniquely identify cells?

Yes. In the Jupyter official notebook format, a cell can have an optional unique name in its metadata: https://nbformat.readthedocs.io/en/latest/format_description.html#cell-metadata

jasongrout avatar Nov 26 '19 16:11 jasongrout

Yes. In the Jupyter official notebook format, a cell can have an optional unique name in its metadata: https://nbformat.readthedocs.io/en/latest/format_description.html#cell-metadata

Cool! And is this already exposed somewhere?

toobaz avatar Nov 26 '19 16:11 toobaz

Cool! And is this already exposed somewhere?

It's exposed everywhere, in the sense that any library or frontend that can write to cell metadata can write this key. Jupyter notebook and JupyterLab, for example, expose an interface for writing to the cell metadata.
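
For example, a script can set the key directly on the notebook JSON. This is a minimal sketch using only the standard library. The hand-written notebook dictionary and the `name_cell` helper are assumptions for illustration, and the uniqueness check is done by the helper itself, since as far as I understand the format's schema does not enforce it.

```python
import json

# A minimal notebook document written out by hand (nbformat 4 layout).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": "x = 1"},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": "print(x)"},
    ],
}

def name_cell(nb, index, name):
    """Set metadata['name'] on one cell, refusing names used by other cells."""
    taken = {c["metadata"].get("name")
             for i, c in enumerate(nb["cells"]) if i != index}
    if name in taken:
        raise ValueError(f"cell name {name!r} is already in use")
    nb["cells"][index]["metadata"]["name"] = name

name_cell(notebook, 0, "utility")
text = json.dumps(notebook, indent=1)  # what would be written to the .ipynb file
print(json.loads(text)["cells"][0]["metadata"])  # {'name': 'utility'}
```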

jasongrout avatar Nov 26 '19 16:11 jasongrout

(To be clear, as with any metadata, it is optional and up to the writer to set this value. It is not set by default in JupyterLab, though it may be set in the notebook by default to some sort of UUID).

jasongrout avatar Nov 26 '19 16:11 jasongrout

It's exposed everywhere, in the sense that any library or frontend that can write to cell metadata can write this key.

Yes, sorry, my question was misleading. I should have asked: is there already some UI for allowing the user to see/change this?

toobaz avatar Nov 26 '19 20:11 toobaz

Yes (though it's just a JSON editor). In JupyterLab, it's the wrench icon in the left sidebar. In classic notebook, it's View > Cell Toolbar > Edit Metadata.

jasongrout avatar Nov 26 '19 21:11 jasongrout

In case that has not been posted already, please see also https://github.com/dataflownb and https://github.com/stitchfix/nodebook

Carreau avatar Nov 26 '19 21:11 Carreau

Both of those got talks at JupyterCon in 2018, so they should be somewhere on YouTube.

Carreau avatar Nov 26 '19 21:11 Carreau

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/dag-based-notebooks/11173/2

meeseeksmachine avatar Oct 08 '21 13:10 meeseeksmachine

https://observablehq.com/ uses a DAG and I would love to see a JupyterLab extension providing similar features:

https://observablehq.com/@observablehq/how-observable-runs

Edit

Moved overview of projects to jupyterlab: https://discourse.jupyter.org/t/dag-based-notebooks/11173/4

stefaneidelloth avatar Oct 13 '21 12:10 stefaneidelloth

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/how-to-get-output-model-for-a-given-cell-in-a-jupyterlab-extension/11342/1

meeseeksmachine avatar Oct 21 '21 12:10 meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/dag-based-notebooks/11173/4

meeseeksmachine avatar Oct 22 '21 08:10 meeseeksmachine

Also see https://marimo.io/ .

jondo avatar Oct 14 '24 12:10 jondo

It's surprising that no one mentioned https://github.com/ipyflow/ipyflow.

krassowski avatar Oct 14 '24 13:10 krassowski