pyimagej
pyimagej copied to clipboard
Integrate ij1-ij2 synchronization to image conversion
When we wrap numpy arrays to ImagePlus
instances we end up with a virtual ImagePlus
that does not automatically reflect changes to its pixel values.
We have the synchronize_ij1_to_ij2
helper method to work around this limitation but it's currently not called as part of our conversion methods.
In from_java
we should have a check that if our data is a virtual ImagePlus
, we synchronize it. This check should be defensive and work even if IJ1 is not present on the classpath.
from @ctrueden
@hinerm Talking more with @elevans, we think that the synchronize_ij1_to_ij2 call (which I'd support renaming to just sync or similar) needs to go at the end of run_script and/or run_macro. Putting it only in from_java does not solve the problem of the wrapped numpy array being out of sync after executing Java code on an ImagePlus, because from_java will not even be called afterward in the typical case.
see also #74
May I ask why this synchronization has to happen in pyimagej
instead of imagej-legacy
? Aren't we facing similar situations when converting/synchronizing between ImagePlus
and Dataset
even in a standalone Fiji, i.e. imagej-legacy
?
See also https://github.com/imagej/imagej-legacy/issues/231.
@imagejan Ideally, yes. But I am not sure exactly when we should trigger this synchronization.
To summarize the architecture for interested readers:
- You create a numpy array
arr
in Python. - You wrap that numpy image to a
Dataset
on the Java side by callingij.py.to_java(arr)
, which under the hood is wrapping anet.imglib2.python.ReferenceGuardingRandomAccessibleInterval
, so that accesses to the pixels in Java are directly backed by the Python-side memory. - The ImgLib2/ImageJ2 image object gets wrapped to an
ImagePlus
—specifically one of thenet.imglib2.img.display.imagej.ImageJVirtualStack*
classes—by imglib2-ij.
So now we have an original ImageJ ImagePlus
that can be used by original-ImageJ-style plugins. However, the virtual ImagePlus
pulls pixels on demand whenever the current slice is changed, into a copy of that plane. The core design of ImagePlus
is that the current plane is always accessible as a Java primitive array in memory, and plugins are free to change that array however they want. And when they are done changing it, there is no requirement to do any callback to notify that image pixels have changed. In ImageJ2, we historically worked around this by resyncing in response to updateAndDraw()
calls, but in some cases this was too slow, and I cannot recall for the current codebase when exactly we do such syncing.
As you might expect, many core plugins e.g. "Subtract Background" and "Gaussian Blur" will operate on the current slice's working pixels array. But exactly when should the resync back to the backing ImgLib2 image happen? The imglib2-ij library handles it by writing out the changed pixels whenever setPixels
is called, which happens under the hood when setSlice(int)
is called.
But what if a plugin doesn't ever call setSlice
on the ImagePlus
? Should we, after every plugin execution, use this
imp.getStack().setPixels(imp.getProcessor().getPixels(), imp.getCurrentSlice())
hack? It might create performance problems.
The workaround I talked with @elevans about yesterday was to call synchronize_ij1_to_ij2
on any ImagePlus
output parameter from ij.py.run_script
calls. But what about ij.py.run_plugin
? Which ImagePlus
objects are potentially getting touched by random plugins can't be known, in general. We can only do a best effort with heuristics.
Maybe these heuristics could achieve sufficient performance by hooking into a bunch of common ImagePlus
accessor methods, and if any of those are called, mark the image as maybe dirty. Then we can more aggressively invoke syncs after every plugin invocation, but limit the setPixels
calls only to those images that are "maybe dirty" and clear those flags.
This problem really needs a person to fully focus on it for one week or more. And unfortunately no one is going to do that before the end of this year.
One other (probably horrible) idea I had was to make some kind of LegacyAwareRAI
or similar thing on the ImgLib2 side, that is always linked to exactly one corresponding ImagePlus
as well as one corresponding backing RAI. And then when you ask pixels from this thing, it would always prefer to give them to you from the backing array of the ImagePlus
object's current slice, and only fall back to the RAI for pixels outside the current slice. Then if we take care to always use this RAI everywhere instead of the "raw" backing RAI, you wouldn't have any sync issues on the ImgLib2 side.
But this idea would still not solve the Python case, because for Python, we really do need the pixel changes to be synced back out to the numpy array. We could of course use this LegacyAwareRAI
from Python—we could even duck type it to be as array-like as possible—but ultimately using it would be a lot slower than using a raw contiguous numpy array, and it won't always meet requirements on the Python side.
Relatedly: I think this issue is misnamed, because calling synchronize_ij1_to_ij2
in from_java
also does not solve the case where someone:
- Creates a numpy array
- Wraps it to
ImagePlus
- Operates on that
ImagePlus
- Proceeds to use the numpy array assuming it is changed.
The above four-step workflow is what most of the relevant issues here are doing. And from_java
is not part of that workflow.
The simplest thing is probably to rename synchronize_ij1_to_ij2
to sync
and encourage people to call that whenever it seems like their data didn't get updated.
The simplest thing is probably to rename
synchronize_ij1_to_ij2
tosync
and encourage people to call that whenever it seems like their data didn't get updated.
Agreed, we should make this change.
Thinking more about this since our last meeting, I'm starting to like the idea of an auto_sync = True
mode where sync
is called after ij.py.run_script
and ij.py.run_macro
complete their tasks. If the performance hit is too large, then our users can decide to disable that and manually sync the data at the end of their pipeline (or where appropriate). How expensive is the sync
call in the grand scheme of things?
@elevans verify that #74 is fixed when completing this issue
The synchronization function was renamed to sync_image
, and is explained in the docs.