pyimagej Integrate ij1-ij2 synchronization to image conversion

When we wrap numpy arrays to ImagePlus instances we end up with a virtual ImagePlus that does not automatically reflect changes to its pixel values.

We have the synchronize_ij1_to_ij2 helper method to work around this limitation but it's currently not called as part of our conversion methods.

In from_java we should have a check that if our data is a virtual ImagePlus, we synchronize it. This check should be defensive and work even if IJ1 is not present on the classpath.

Dec 07 '21 16:12 hinerm

from @ctrueden

@hinerm Talking more with @elevans, we think that the synchronize_ij1_to_ij2 call (which I'd support renaming to just sync or similar) needs to go at the end of run_script and/or run_macro. Putting it only in from_java does not solve the problem of the wrapped numpy array being out of sync after executing Java code on an ImagePlus, because from_java will not even be called afterward in the typical case.

Dec 07 '21 16:12 hinerm

see also #74

Dec 07 '21 16:12 hinerm

May I ask why this synchronization has to happen in pyimagej instead of imagej-legacy? Aren't we facing similar situations when converting/synchronizing between ImagePlus and Dataset even in a standalone Fiji, i.e. imagej-legacy? See also https://github.com/imagej/imagej-legacy/issues/231.

Dec 08 '21 08:12 imagejan

@imagejan Ideally, yes. But I am not sure exactly when we should trigger this synchronization.

To summarize the architecture for interested readers:

You create a numpy array arr in Python.
You wrap that numpy image to a Dataset on the Java side by calling ij.py.to_java(arr), which under the hood is wrapping a net.imglib2.python.ReferenceGuardingRandomAccessibleInterval, so that accesses to the pixels in Java are directly backed by the Python-side memory.
The ImgLib2/ImageJ2 image object gets wrapped to an ImagePlus—specifically one of the net.imglib2.img.display.imagej.ImageJVirtualStack* classes—by imglib2-ij.

So now we have an original ImageJ ImagePlus that can be used by original-ImageJ-style plugins. However, the virtual ImagePlus pulls pixels on demand whenever the current slice is changed, into a copy of that plane. The core design of ImagePlus is that the current plane is always accessible as a Java primitive array in memory, and plugins are free to change that array however they want. And when they are done changing it, there is no requirement to do any callback to notify that image pixels have changed. In ImageJ2, we historically worked around this by resyncing in response to updateAndDraw() calls, but in some cases this was too slow, and I cannot recall for the current codebase when exactly we do such syncing.

As you might expect, many core plugins e.g. "Subtract Background" and "Gaussian Blur" will operate on the current slice's working pixels array. But exactly when should the resync back to the backing ImgLib2 image happen? The imglib2-ij library handles it by writing out the changed pixels whenever setPixels is called, which happens under the hood when setSlice(int) is called.

But what if a plugin doesn't ever call setSlice on the ImagePlus? Should we, after every plugin execution, use this imp.getStack().setPixels(imp.getProcessor().getPixels(), imp.getCurrentSlice()) hack? It might create performance problems.

The workaround I talked with @elevans about yesterday was to call synchronize_ij1_to_ij2 on any ImagePlus output parameter from ij.py.run_script calls. But what about ij.py.run_plugin? Which ImagePlus objects are potentially getting touched by random plugins can't be known, in general. We can only do a best effort with heuristics.

Maybe these heuristics could achieve sufficient performance by hooking into a bunch of common ImagePlus accessor methods, and if any of those are called, mark the image as maybe dirty. Then we can more aggressively invoke syncs after every plugin invocation, but limit the setPixels calls only to those images that are "maybe dirty" and clear those flags.

This problem really needs a person to fully focus on it for one week or more. And unfortunately no one is going to do that before the end of this year.

Dec 08 '21 18:12 ctrueden

One other (probably horrible) idea I had was to make some kind of LegacyAwareRAI or similar thing on the ImgLib2 side, that is always linked to exactly one corresponding ImagePlus as well as one corresponding backing RAI. And then when you ask pixels from this thing, it would always prefer to give them to you from the backing array of the ImagePlus object's current slice, and only fall back to the RAI for pixels outside the current slice. Then if we take care to always use this RAI everywhere instead of the "raw" backing RAI, you wouldn't have any sync issues on the ImgLib2 side.

But this idea would still not solve the Python case, because for Python, we really do need the pixel changes to be synced back out to the numpy array. We could of course use this LegacyAwareRAI from Python—we could even duck type it to be as array-like as possible—but ultimately using it would be a lot slower than using a raw contiguous numpy array, and it won't always meet requirements on the Python side.

Dec 08 '21 18:12 ctrueden

Relatedly: I think this issue is misnamed, because calling synchronize_ij1_to_ij2 in from_java also does not solve the case where someone:

Creates a numpy array
Wraps it to ImagePlus
Operates on that ImagePlus
Proceeds to use the numpy array assuming it is changed.

The above four-step workflow is what most of the relevant issues here are doing. And from_java is not part of that workflow.

The simplest thing is probably to rename synchronize_ij1_to_ij2 to sync and encourage people to call that whenever it seems like their data didn't get updated.

Dec 08 '21 18:12 ctrueden

The simplest thing is probably to rename synchronize_ij1_to_ij2 to sync and encourage people to call that whenever it seems like their data didn't get updated.

Agreed, we should make this change.

Thinking more about this since our last meeting, I'm starting to like the idea of an auto_sync = True mode where sync is called after ij.py.run_script and ij.py.run_macrocomplete their tasks. If the performance hit is too large, then our users can decide to disable that and manually sync the data at the end of their pipeline (or where appropriate). How expensive is the sync call in the grand scheme of things?

Dec 08 '21 19:12 elevans

@elevans verify that #74 is fixed when completing this issue

Feb 08 '22 15:02 hinerm

The synchronization function was renamed to sync_image, and is explained in the docs.

Jun 23 '23 16:06 ctrueden

pyimagej pyimagej copied to clipboard

Integrate ij1-ij2 synchronization to image conversion

pyimagej
pyimagej copied to clipboard