WireViz Embed images in SVG output

[Update 2020-11-16]

This PR has evolved from simply fixing how relative image paths are resolved, to embedding any images in the SVG file by default, eliminating the original issue.

Previously, paths to images were resolved relatively to the specified output file instead of the input file.

This produces errors when calling WireViz with the -o flag, creating output files in a different location from the input.

This PR [partially, see comment below] fixes the problem.

Additionally, it eliminates the need for passing a gv_dir variable to the Image class, resolving paths beforehand and passing the absolute path to the Image class.

Oct 24 '20 19:10 17o2

It seems GraphViz only links to the image in the SVG output. This might be problematic if the output is stored in a different directory from the input, or moved afterwards. Further investigation is required.

Oct 24 '20 20:10 17o2

@kvid I'd be interesting in your input on this topic. The solution would be complete if it wasn't for the fact that the SVG, and by extension, the HTML output, depend on having a working link to the original image file.

The .gv file is also affected, but this should be less of a problem for the average user, since this is just an intermediate file.

It seems that GraphViz does not offer a way to embed images?

Oct 24 '20 20:10 17o2

Here are some new discoveries that might fix the above mentioned problem with embedded images.

HTML <IMG> tags allow embedding base64-encoded images within the SRC attribute, avoiding the need to link to external files: example 1; example 2 (encoded by me using base64-image.de)

GraphViz does not seem to support this. A quick test (using example 1 above) reveals that

attribs['image']['src'] = 'data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=='

produces an error:

graphviz.backend.CalledProcessError: Command '['dot', '-Tpng', '-O', 'ex08']' returned non-zero exit status 1. [stderr: b'Warning: No such file or directory while opening data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==\nError: No or improper image file="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="\nin label of node Key\nWarning: No such file or directory while opening data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==\nError: No or improper image file="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="\nin label of node W1\n']

Inserting the data manually into an SVG file's <image xlink:href=""> attribute works! 🎉 Here is ex08.svg with both xlink:href attributes containing the original images in base64 encoding. Note that the image renders correctly, even though the gist does not include any bitmap files.
This would fix the problem for SVG, and by extension, HTML output. GraphViz output can't be fixed, but I see this as a minor problem.
More about data URIs

@kvid do you think it's a reasonable approach to convert and inject the images into the output SVG file in this way? If so, this PR should only fix the relative path issue, and I'll begin another one to work on the encoding.

Nov 01 '20 15:11 17o2

I'm a bit busy these days, and are not able to contribute as often as I want.

Here are some new discoveries that might fix the above mentioned problem with embedded images.

HTML <IMG> tags allow embedding base64-encoded images within the SRC attribute, avoiding the need to link to external files: example 1; example 2 (encoded by me using base64-image.de)

That looks useful, but I don't think it's much in use. I've never seen it in an HTML page.

GraphViz does not seem to support this. A quick test (using example 1 above) reveals that
attribs['image']['src'] = 'data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=='
produces an error:
graphviz.backend.CalledProcessError: Command '['dot', '-Tpng', '-O', 'ex08']' returned non-zero exit status 1. [stderr: b'Warning: No such file or directory while opening data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==\nError: No or improper image file="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="\nin label of node Key\nWarning: No such file or directory while opening data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==\nError: No or improper image file="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="\nin label of node W1\n']
Inserting the data manually into an SVG file's <image xlink:href=""> attribute works! 🎉 Here is ex08.svg with both xlink:href attributes containing the original images in base64 encoding. Note that the image renders correctly, even though the gist does not include any bitmap files.

This would fix the problem for SVG, and by extension, HTML output. GraphViz output can't be fixed, but I see this as a minor problem.

More about data URIs
@kvid do you think it's a reasonable approach to convert and inject the images into the output SVG file in this way? If so, this PR should only fix the relative path issue, and I'll begin another one to work on the encoding.

Nov 06 '20 01:11 kvid

I'm a bit busy these days, and are not able to contribute as often as I want.

Don't worry about it, I know the situation.

You don't seem to have an e-mail address associated with your GitHub account (besides [email protected], which I assume won't work)... is there a way for me to send you a direct message? Please let me know via [email protected].. thanks!

Nov 09 '20 17:11 17o2

The different approaches each have their advantages and disadvantages:

When specifying an output location different from the input, all image files need to have absolute paths or optionally a modified relative path (if possible) or optionally a copy of the image files can be made at the output location. Converting to absolute paths will not work if the output location is not the final destination. I guess the user should have all 3 options.
Some users might find it useful to allow a data URI as image source input, but it must be converted to a file for Graphviz to find it, and as in the previous point, absolute path or relative path must be selected.
Some users might find it useful to generate a data URI in any image source of the SVG output, but I guess it should be optional per image independently of the options above (and maybe a harness option to be used when not specified for each image). The clean way is to make this a new output (e.g. .data.svg) to avoid renaming or caching the Graphviz generated .svg before overwriting the same output name with the modified version containing data URIs.

Nov 14 '20 22:11 kvid

Ideally, the output files should be self-contained, independent of any paths to other files. Anything else just causes more trouble, and the added effort of specifying behaviors (as per the first bullet point of your post) is not really worth it IMHO.

[Edit] The closest analogy I can think of is including schematic symbols and footprints in an EAGLE project. Once a library component has been included in a project, the project files can be shared without having to also share the libraries, since everything has been embedded within the schematic/PCB file itself. I find this immensely useful and would therefore like to replicate this behavior.

Therefore, my final goal is to:

allow user to specify paths to images inside the YAML, relative (to the YAML file) or absolute
- embedding data URIs in the YAML was the idea behind #188 but I don't see it as a priority
resolve relative image paths to absolute ones when parsing the YAML (this PR)
generate BOM, PNG and SVG output files
modify SVG output, embedding the images in the <image xlink:href=""> tags as described above using base64 encoding.
- I don't really see the need for keeping the SVG with the reference to an external file, except for debugging purposes.
generate HTML output, containing the modified SVG

As you can see, I'm trying to follow the KISS principle instead of providing too much configurability... I'd rather implement this straightforward (implementation-wise), but hopefully foolproof (from the user's perspective) method, and eventuall add more output options later.

Nov 14 '20 22:11 17o2

Embedding images in the output SVG works :tada:

It has been tested with PNG, JPEG and animated GIFs. (To see the working animation, click the Raw button in the Gist, download the file, open in browser.)

Nov 15 '20 11:11 17o2

@kvid, this PR is ready for review. Feel free to mark my comments as resolved if you agree, and let me know if you don't. I am open to discussing your ideas regarding finer user control over how to handle images (embed or not, copy to output directory, ...) as a separate PR, but I would first like to have embedded images as default since it leaves no open questions (see my comment above). Thanks!

Nov 16 '20 19:11 17o2

Thanks for your suggestions. Don't worry about not responding earlier.. there is no obligation here. I've included most of your suggestions and commented on the ones that need discussion/clarification.

Dec 06 '20 12:12 17o2

A major drawback with resolving all image paths to the absolute path, is that the output file contents will depend on the absolute path that normally differ between developers, and thereby lead to false diffs when doing regression testing.

The way I see it, this would only affect .gv files.

Is it worth thinking about a solution in code to output a version of the .gv file with relative paths (for the diff), and one with absolutes?
Is it possible to somehow ignore certain patterns (i.e. only the src attribute of an <img> tag) while doing the diff?
- This could be a task to include in #63 ...

Once #60 is finally implemented, .gv will likely not be used by most people outside of WireViz development.

Jan 01 '21 14:01 17o2

A major drawback with resolving all image paths to the absolute path, is that the output file contents will depend on the absolute path that normally differ between developers, and thereby lead to false diffs when doing regression testing.

The way I see it, this would only affect .gv files.

I agree.

Is it worth thinking about a solution in code to output a version of the .gv file with relative paths (for the diff), and one with absolutes?

This is probably the easiest to implement.

Is it possible to somehow ignore certain patterns (i.e. only the src attribute of an <img> tag) while doing the diff?

This could be a task to include in #63 ...

This would probably need to involve temporary renaming, create modified version, doing diff, and rename back for each .gv file, and something similar for a custom git add function to create the comparing base. It seems more complicated than your first suggestion.

Once #60 is finally implemented, .gv will likely not be used by most people outside of WireViz development.

I don't think it is used much by most people outside of WireViz development today either...

Jan 03 '21 06:01 kvid

What would be the best way of implementing separate .gv files? Some thoughts:

Image paths are resolved in wireviz.py -> parse(), since this is the context where the base path is known.
- I propose reading the image.src attribute here, and placing the absolute path in a new image.src_abs attribute.
The content for the.gv file is generated in Harness.py -> create_graph(), which uses wv_gv_html.py -> html_image()
- How about a flag for create_graph() to choose whether to use image.src or image.src_abs when calling html_image()?
The actual .gv file is output via the Harness.py -> output() function.
- It feels kind of wasteful to call create_graph() twice just to get the two file variants.
- One option is to use the fmt parameter to decide which file to generate. gv_rel and gv_abs could be two of the allowed options and trigger the respective output, a file named *.rel.gv or *.abs.gv. This way, while developing, both files can be generated, while no double effort is wasted during normal use.

What do you think?

Jan 15 '21 16:01 17o2

I'm sorry about this late answer. I agree with the suggested way of implementing separate .gv files. It seems like a reasonable approach.

A similar approach could also be used to enable an optional .html output with an ordinary image tag instead of embedding the .svg file. Then it will be possible to do regression testing on that file too.

Edit: Is it possible to use the same string for fmt as in the output file extension? (Skipping the leading dot in fmt is OK.)

Jan 30 '21 11:01 kvid

A major drawback with resolving all image paths to the absolute path, is that the output file contents will depend on the absolute path that normally differ between developers, and thereby lead to false diffs when doing regression testing.

The way I see it, this would only affect .gv files.

I agree.

I am working on a refactoring of the WireViz CLI in #244 which will finally close #60 and will stop outputting .gv files by default. Since you agree that .gv is rarely needed outside development, would you be OK if I implement your other suggestions, ignore the fact that absolute paths will be visible if .gv output is enabled, and we get this PR ready? I am sure we can then create a follow-up PR to elegantly deal with the case where .gv is generated.

I am using smart_file_resolve() to determine relative image paths. The current strategy does not consider whether the relative path is defined in the main YAML, or the prepended YAML. In both cases, it searches relative to the main YAML's directory first, and tries relative to the prepended YAML only if unsuccessful. I realize it is not the absolute cleanest solution, but it is simple, deterministic, and should work in most cases... and might even have some advantages.

Oct 02 '21 13:10 17o2

Rebased on top of latest to profit from smart_file_resolve(). Old branch with full commit history is backed up as backup/bugfix/image-src_before-latest-rebase

Oct 16 '21 17:10 17o2