Pluto.jl icon indicating copy to clipboard operation
Pluto.jl copied to clipboard

Remove `Cell order` section in `.jl` file

Open PatrickHaecker opened this issue 7 months ago • 14 comments

I currently need to merge a lot of branches containing a Pluto notebook and it's a huge pain wasting several hours.

One of the advertising features is "Pluto notebook files are designed to work well with version management.".

I would not subscribe to this statement. Yes, they do work with version management, but not particularly well at the moment.

There are two reasons:

  1. The order of the cells seems to change a lot over time. When merging two versions of a notebook which are 6 months apart, more a less the whole file shows up as one large diff although one version only has a handful of cells added/changed and the rest is some seemingly random resorting done by Pluto over time. But it is nearly impossible with the usual tools to identify this "real" diff.
  2. Whenever you change something during merge, you need to be really cautious to always keep the cell itself in sync with the reference to the cell at the bottom of the document in the Cell order section.

It would be so much simpler if the cells were stored in the .jl file in the "real" order of how they appear in the document. Then, the cell order section could be completely removed if my understanding is correct. The dependency between the cells can be calculated when reevaluating the document. Whether a cell is collapsed or expanded can be stored in the cell itself.

PatrickHaecker avatar Apr 09 '25 08:04 PatrickHaecker

It would be so much simpler if the cells were stored in the .jl file in the "real" order of how they appear in the document.

If that would be the case, most Pluto notebooks would not be executable as a plain Julia script. IMO, it is one of Pluto's main features that you can execute them without knowing they are in fact Pluto notebooks.

I do agree, however, that Pluto tends to reorder cells more than necessary. If I want to git add such modifications, I like to modify the .jl file manually (using vim) to modify the "cells" in-place if I know that its position in the script does not need to change (i.e., the cell's dependencies did not change) and commit. Pluto never complained for me, and if Pluto did insist on changing the cell order in the script (open the notebook using Pluto, run (optional?), and save), I add another commit that only changes the cell order but not the cell modifications.

The reason behind Pluto reordering the cells may lie in the fact that there is usually more than one topological order of cell dependency graph. It would be nice if Pluto were able to select the one with the least deviation from the one last recorded by git. However, figuring out the "correct" reference order is far from trivial. A first shot could be some script pluto-reorder new.jl old.jl, that could eventually become part of some PlutoTools.jl.

jonas-schulze avatar May 27 '25 08:05 jonas-schulze

Maybe an option Pluto.run(store_cell_order=false) to disable storing the cell order for your session, and instead use the order in the file. It is not needed for reproducibility, only for running the file locally without Pluto (which is a pretty niche feature...)

fonsp avatar May 27 '25 08:05 fonsp

Both ideas sound worth exploring to me.

Pluto.run(store_cell_order=false) is (hopefully) logically simple, but is effectively a new data storage format which involves some variant handling. Being an option both formats would need to be supported in the future.

The stable ordering might be a larger change, but has the potential to be the "just works" solution for the first reason. I think we would not need to look at git, we would "just" need to make sure that whenever we save a Pluto file, it's saved with as little changes as possible compared to what has been loaded. So instead of calculating a new topological order from scratch, the smallest possible delta to the previous topological order would need to be determined. Alternatively, a total order should be defined, which has this property, that small code changes are guaranteed to result in small file changes. The second reason would remain, but this might be fixed with some guarantees, that unused references are deleted and non-existing references are created.

PatrickHaecker avatar May 27 '25 11:05 PatrickHaecker

As a tie breaker when creating a topological order, one could use the cell having the alphabetically smallest ~~hash~~ UUID. Assuming that the cell ~~hashes~~ UUIDs do not change (e.g., the user does not move the contents of one cell into a newly created one and deletes the old cell), this should keep changes to the topological order quite small, as long as the previous version/order has been created with the same tie breaker.

Edit: replaced all mentions of "hash" by "UUID".

jonas-schulze avatar May 27 '25 12:05 jonas-schulze

We already use visual cell order as a tie-breaker. That's probably best when you do stateful things like creating a file, then reading it. Currently, the run-all execution order from Pluto matches the saved order, I think that makes sense.

fonsp avatar May 27 '25 13:05 fonsp

As a tie breaker when creating a topological order, one could use the cell having the alphabetically smallest hash.

The problem with this approach is that you get a huge diff if you rename a variable which occurs in multiple cells.

PatrickHaecker avatar May 27 '25 13:05 PatrickHaecker

About a new file format: I see it more as a strict subset of the current file format. I think you can already remove the whole Cell order section in the .jl file and Pluto will load the notebook, with saved order as visual order. Not sure though, it might require the "Cell order" string to parse correctly. That logic is here:

https://github.com/fonsp/Pluto.jl/blob/44d733408f927b589a11599a71988dceaed82c7b/src/notebook/saving%20and%20loading.jl#L303

So it's just about a new way to write notebook files, but they already read in this format.

fonsp avatar May 27 '25 13:05 fonsp

We already use visual cell order as a tie-breaker. That's probably best when you do stateful things like creating a file, then reading it. Currently, the run-all execution order from Pluto matches the saved order, I think that makes sense.

Did this change during the last year? I ask because I don't think that I did too much reordering in the document and still got these huge diffs. Ok, it might be, that the dependencies changed due to the changes of types and variables, so there was no tie to begin with or at least no tie in quite a number of cases.

PatrickHaecker avatar May 27 '25 13:05 PatrickHaecker

Not sure though, it might require the "Cell order" string to parse correctly.

In order to use this in production, this would need to be a guaranteed property, however, so that future versions of Pluto will support reading and writing this strict subset. Other than that, this would solve my use case.

PatrickHaecker avatar May 27 '25 14:05 PatrickHaecker

As a tie breaker when creating a topological order, one could use the cell having the alphabetically smallest hash.

The problem with this approach is that you get a huge diff if you rename a variable which occurs in multiple cells.

The cell hash does not depend on the cell's content, IIRC. It should be created when the user presses the "+" button. By "cell hash", I refer to the identifier starting with bb8 in the comment above a cell's content:

# ╔═╡ bb8dedb4-3b05-11f0-1f9c-ad7714469b6e
x = 42

If I change that cell in the notebook and hit save, cat file.jl still yields the same hash:

# ╔═╡ bb8dedb4-3b05-11f0-1f9c-ad7714469b6e
y = 21

jonas-schulze avatar May 27 '25 14:05 jonas-schulze

I don't think that I did too much reordering in the document and still got these huge diffs. Ok, it might be, that the dependencies changed due to the changes of types and variables, so there was no tie to begin with or at least no tie in quite a number of cases.

I also did not reorder most of the cells that got moved anyway. But you may be right that the cells may not have been involved in a tie in the first place. 🤔

jonas-schulze avatar May 27 '25 14:05 jonas-schulze

By "cell hash", I refer to the identifier starting with bb8 in the comment above a cell's content

Ah, thanks for the clarification. I think strictly speaking this is not a hash, but a UUID. But the important thing is that this should indeed mostly work (besides some corner cases where the user copies some content into a new cell and deletes the old one).

PatrickHaecker avatar May 27 '25 15:05 PatrickHaecker

Let's start by making sure that Pluto can parse notebook files without a Cell Order, and add a test for that. @PatrickHaecker do you want to take a shot at this?

While working on this, I recommend commenting out all tests in runtests.jl that you don't need. (Or find a way to make https://github.com/theogf/TestPicker.jl work with Pluto 😁)

fonsp avatar Jun 12 '25 09:06 fonsp

Ok, sounds like a plan! However, I probably won't find the time to get familiar with the code base and implement it during the next weeks. So if someone finds time earlier, please go ahead. Otherwise, I hope that I can tackle it when I find the time.

PatrickHaecker avatar Jun 12 '25 11:06 PatrickHaecker

I really want this feature, but I do worry that Pluto.run(store_cell_order=false) is not a correct approach. This setting would affect the whole Pluto browser application, meaning that the setting applies to all notebooks you open. I think it should act on a per-notebook level, preferably explicit to the user, and visible within the running window. I think there are two actions involved then:

  1. Reordering the cells so they appear (visually) in an executable order, preferably not deviating from the current visual order if not necessary
  2. Removing the cell-order from the bottom and saving the file.

I suspect that it is inevitable that if you continue to edit the file, the execution order and visual order will deviate again, but as long as you can bring them back in line, this is OK.

For the purpose of the OP, there might have to be some function to be called from outside of Pluto to do this action automatically on a notebook (or on.(notebooks) ;-) man I love Julia)

KeithWM avatar Jun 17 '25 09:06 KeithWM

The first step should be to implement the two actions you have listed, @KeithWM. In the second step we need to see what the user interface should be. I could also imagine, that the user can set the mode in the document itself and that there are probably multiple ways to activate this mode, but I propose to have this discussion only when we have the implementation.

PatrickHaecker avatar Jun 17 '25 12:06 PatrickHaecker

find a way to make https://github.com/theogf/TestPicker.jl work with Pluto

Should happen at some point: https://github.com/theogf/TestPicker.jl/issues/54

ederag avatar Jun 27 '25 11:06 ederag

Note: I made #3279 but got stuck 🙈

fonsp avatar Jun 27 '25 13:06 fonsp

I would like to factor out the parts from that PR that make the notebook parsing more stable though! That's useful to already release

fonsp avatar Jun 27 '25 13:06 fonsp

Factored out in #3285

fonsp avatar Jul 01 '25 12:07 fonsp