Store images in the *.archimate file as Base64 bytes
If we use images in canvases we have an ArchiveManager that takes care of saving the images as binary files and then zipping the model.archimate file and the image files into a zip archive file. This method was written 9 years ago because RAM was low and to keep the file portable.
An alternative method is to store the image data as Base64 encoded bytes as a set of Archi "features" in the *.archimate file.
The branch image-store has the code to do this. It works, is quite fast, and is not too memory intensive.
The feature image data are stored in the root model node so they can be re-referenced multiple times, saving on space and memory.
Do we want to do this? What are the advantages/disadvantages?
If we go this way we would need to do:
- Convert previous version archive files to the new format when loading
- Adapt coArchi to handling images in a different way
I really like this.
In fact, several times I wished I had a way in jArchi to update or add some images. My use-case was then to create a canvas containing an image, and a plantuml code as the image description. My script would then call plantuml, get the image and set it in the canvas.
I think that having the image data in a "feature" would make it easier to get/set.
The only reasons I did it this way 9 years ago were:
- To save on memory for 32-bit OS
- Speed
- To keep images and the model file together
(1) is not really valid now with 64-bit memory addresses and bigger RAM. (2) is no longer valid since we improved loading speeds. (3) is still valid but is also fulfilled in this new format.
I like it too and it seems to work well. The back end code is way more simple now as well (no need for storing bytes and doing tricks when saving)
I'll go further with this and explore these issues:
- Convert previous version archive files to the new format when loading
- Adapt coArchi to handling images in a different way
Here's a really cool thing as well. The image bytes are stored like this:
<feature name="imageBytes_e2d5641007451cb8bed2a5f74c70c115279cbd5e"
value="iVBORw0KGgoAAAANSUhEUgAAA..."/>
That string added on to "imageBytes_" (name) is a SHA-1 hash of the image bytes (value). So we can avoid duplicate image bytes being added because we know from the name that it is already added.
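To illustrate the naming scheme described above, here is a minimal sketch in Python (not Archi's actual Java code). The `add_image` helper and the plain dict standing in for the model's features are hypothetical; only the "imageBytes_" prefix, the SHA-1 hash, and the Base64 value come from the description.

```python
import base64
import hashlib


def add_image(features: dict[str, str], image_bytes: bytes) -> str:
    """Store image bytes as a Base64 value keyed by their SHA-1 hash.

    `features` is a plain dict standing in for the root model's features
    (an assumption made for illustration only).
    """
    name = "imageBytes_" + hashlib.sha1(image_bytes).hexdigest()
    # The name is derived from the content, so identical bytes map to
    # the same name and are stored only once.
    if name not in features:
        features[name] = base64.b64encode(image_bytes).decode("ascii")
    return name


features: dict[str, str] = {}
png_like = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16  # fake image data
first = add_image(features, png_like)
second = add_image(features, png_like)  # duplicate, not stored again
```

Because the key is computed from the bytes, adding the same image twice returns the same name and leaves a single entry.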
Tested with several large images that took a total of 83 MB in file size. Loading and saving are quite fast. I think as long as people don't put their entire photo album on a canvas it should work well.
More testing:
A 360 MB *.archimate file containing several large images takes about 2 seconds to load and about 4 seconds to save. Not bad, as this is an extreme case.
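For a sense of what Base64 does to file size: every 3 bytes of image data become 4 characters of text, so the encoded form is about a third larger than the raw bytes. A small Python check, using random bytes as a stand-in for image data:

```python
import base64
import os

raw = os.urandom(3_000_000)       # stand-in for 3,000,000 bytes of image data
encoded = base64.b64encode(raw)   # 4 output characters per 3 input bytes
ratio = len(encoded) / len(raw)   # roughly 1.33
```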
I've done it. Conversion from the zip format to the new format when loading a model is also implemented. It seems to work well.
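The conversion could look roughly like the following Python sketch. This is not the actual Archi code: the `convert_archive` helper is hypothetical, and the entry names ("model.archimate" for the model, an "images/" prefix for image files) are assumptions based on the wording in this thread. For simplicity it keeps the archive entry name as the key rather than deriving a new one.

```python
import base64
import io
import zipfile


def convert_archive(archive: bytes) -> tuple[bytes, dict[str, str]]:
    """Return the model XML bytes and a {entry name: Base64 data} map."""
    features: dict[str, str] = {}
    model_xml = b""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for entry in zf.namelist():
            data = zf.read(entry)
            if entry == "model.archimate":  # assumed model entry name
                model_xml = data
            elif entry.startswith("images/"):  # assumed image folder
                features[entry] = base64.b64encode(data).decode("ascii")
    return model_xml, features


# Build a tiny in-memory archive to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("model.archimate", "<model/>")
    zf.writestr("images/1234", b"\x89PNG fake")

model_xml, features = convert_archive(buffer.getvalue())
```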
Now the next concern is converting coArchi's image handling. I've got a feeling that won't be so easy. @jbsarrodie Do you think there will be backward-compatibility issues?
Thinking out loud:
- Current code should be updated to be failsafe (it would be good to have a version of coArchi which works with both versions of Archi, the one with historic image handling and the one with the new approach)
- coArchi is already able to store features, so it should be able to store images...
- but there might be some side effect on Git with such big strings...
- so maybe we should extract those features into self-contained files under the images folder when exporting, and add them back to the model element as features when loading
but there might be some side effect on Git with such big strings...
Doesn't seem to cause any problems. Git regards it as a binary file because of the long lines.
I've committed a first go at this in branch image-store in the coArchi git repo (https://github.com/archimatetool/archi-modelrepository-plugin/commits/image-store).
This is what I've done so far:
- GraficoModelExporter does not save images to the images folder - an easy win.
- GraficoModelImporter#loadImages() will check the images folder and, if there are images present, convert them to the new format
The only issue is merge - one has to choose "Theirs" if the model file has the images as features.
It would be nice to store in grafico format as "features" but maybe for backward-compat we should try and do this:
so maybe we should extract those features in self contained files under images folder when exporting, and add them back to model element as feature when loading
@jbsarrodie This is turning out to be extremely difficult. I can't find a way to do it that's backwards-compatible for coArchi.
There's a new branch in coArchi called image-store2.
With this method, the images are saved in the old way in the "images" folder and the model features with image references are removed when exporting to grafico.
The problem is that with the new ArchiveManager we have to store image paths with a new key name like "imageBytes_123456789" and the old way was a path like "images/1234".
At this point I think my head is confused as to what's going on. ;-)
I think I've come up with a strategy that works with coArchi.
- Use the "images/" prefix on image path names and file suffix. For example:
<feature name="images/e2d5641007451cb8bed2a5f74c70c115279cbd5e.png"
value="iVBORw0KGgoAAAANSUhEUgAAA..."/>
- When exporting to grafico format in coArchi, remove any features that start with "images/" from the root model. Save image data to image files in the "images" sub-folder.
- Importing from grafico stays the same
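The export step of this strategy can be sketched as follows, in Python rather than coArchi's Java. The `export_images` helper, the simplified `<model>` XML, and the temporary output folder are all hypothetical; only the "images/" name prefix and the Base64 value come from the strategy above.

```python
import base64
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Simplified stand-in for a model file (the real format has more structure).
MODEL = """<model>
  <feature name="images/abc123.png" value="iVBORw0KGgo="/>
  <feature name="other" value="kept"/>
</model>"""


def export_images(root: ET.Element, target: Path) -> list[str]:
    """Remove "images/" features from the root and write them as files."""
    written = []
    # Iterate over a copy, since features are removed during the loop.
    for feature in list(root.findall("feature")):
        name = feature.get("name", "")
        if name.startswith("images/"):
            path = target / name
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(base64.b64decode(feature.get("value", "")))
            root.remove(feature)
            written.append(name)
    return written


root = ET.fromstring(MODEL)
out_dir = Path(tempfile.mkdtemp())
written = export_images(root, out_dir)
```

Since the feature name already is the relative file path, the exported folder has the same layout as the old "images" folder, which is what lets the import side stay unchanged.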
There's a new branch in coArchi called image-store2.
Main branch in coArchi is now image-store.
I've tested the coArchi changes and it all works in Archi 4.6 and Archi 4.7.
More testing and it seems to work nicely. The code base is simpler and clearer.
However - a model file saved in this new format will open in Archi 4.6 but the images will not show. I've changed the ModelVersion to "4.7" to warn users when opening a model. If they do open it, the features containing the image data will be preserved; it's just that the Archi 4.6 user won't see them.
I really like this.
In fact, several times I wished I had a way in jArchi to update or add some images. My use-case was then to create a canvas containing an image, and a plantuml code as the image description. My script would then call plantuml, get the image and set it in the canvas.
I think that having the image data in a "feature" would make it easier to get/set.
@jbsarrodie would you mind sharing those scripts? I also have a case where Archi and plantuml need to be mixed for proper documentation.
Now that it works we should decide if it is actually a good thing.
There is a school of thought (and this included me when I first wrote the original code) that such large data blobs should not be stored in an XML file but referenced as an external file (as we do now) - see https://stackoverflow.com/questions/5232445/storing-image-in-xml
But now I'm not sure whether it's a good or bad thing.
On the other hand, HTML pages have embedded base64 image data...
But this is not a good practice as some benchmarks show that it is 5 times slower to load resources in base64 data-uri.
Now that it works we should decide if it is actually a good thing.
Good question. As we often say, if it ain't broken, don't fix it, so do we really need this? The questions are:
- Does this solve bugs or limit potential new bugs ?
- Does this improve performance ?
- Does this simplify other new features ? E.g. does this simplify extending jArchi API to add, update, remove images and use them in canvas, or is it similar ?
- Does this impact users ? At least a bit IMHO, as this means upgrading the model version number, which is very annoying with coArchi: people who simply open a model (or switch to an old branch) will potentially change the version and thus will be asked to commit their changes while they think there are none.
Depending on your answers to the first questions, maybe I would suggest keeping this for a later version which will contain other changes in model structure. Maybe for Archi 5.0, in which Properties would be named Attributes, thus some major changes to jArchi (and maybe coArchi too)
Does this solve bugs or limit potential new bugs ?
In a way, yes. Loading images is cleaner and less prone to error. Internally, the code is a great improvement.
Does this improve performance ?
I think it's about the same in terms of speed. But internally, saving and loading is simpler - just an xml file. As we have it now we have to create a temp file, a zip file stream, and write that and image bytes to files. It works OK in both cases.
Does this simplify other new features?
I've not thought about this in terms of jArchi, so I don't know. BTW - canvas support in jArchi is quite a big thing to implement, so not done yet.
Does this impact users ?
Model version number yes. Old archive zip file format is automatically converted.
I guess it boils down to archive format vs. single xml file format.
Crazy idea: wouldn't it be possible to leverage EMF and see the model as a resource set which contains a primary resource (the model) which references the features from other resources in the same resource set (one resource per image)? So we get the best of both worlds: each image has its own file (easier to manage for git), but we still use Archi's features to store them and attach them to the model.
Of course this means some more work on coArchi, but this would be a very generic way to handle binary attachments (each attachment is base64 encoded but sits in its own resource file).
BTW, having a kind of generic notion of "attachment" in Archi could make sense for non-image data too. I once imagined storing some jArchi scripts inside features, and then having a "local" jArchi script that would simply load those features' scripts. This would make it easy for people to share scripts with their models.
Crazy idea...
Would need to investigate this. Good idea, though. ;-)