Support digitizing plots produced by GNUPlot
Plots like the following one, produced by gnuplot, should be digitizable with this tool:
https://commons.wikimedia.org/wiki/File:Ruby_transmittance.svg
Expected: The raw data for the graph
Actual:
$ svgdigitizer digitize Ruby_transmittance.svg
Parent of <path> is not a <g>. Ignoring this path and its siblings.
Ignoring unlabeled <path> and its siblings.
Ignoring unlabeled <path> and its siblings.
Parent of <path> is not a <g>. Ignoring this path and its siblings.
Ignoring unlabeled <path> and its siblings.
Ignoring unlabeled <path> and its siblings.
Parent of <path> is not a <g>. Ignoring this path and its siblings.
Ignoring unlabeled <path> and its siblings.
Ignoring unlabeled <path> and its siblings.
Parent of <path> is not a <g>. Ignoring this path and its siblings.
Ignoring unlabeled <path> and its siblings.
Ignoring unlabeled <path> and its siblings.
Traceback (most recent call last):
File ".venv/bin/svgdigitizer", line 8, in <module>
sys.exit(cli())
~~~^^
File ".venv/lib/python3.13/site-packages/click/core.py", line 1442, in __call__
return self.main(*args, **kwargs)
~~~~~~~~~^^^^^^^^^^^^^^^^^
File ".venv/lib/python3.13/site-packages/click/core.py", line 1363, in main
rv = self.invoke(ctx)
File ".venv/lib/python3.13/site-packages/click/core.py", line 1830, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
File ".venv/lib/python3.13/site-packages/click/core.py", line 1226, in invoke
return ctx.invoke(self.callback, **ctx.params)
~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/lib/python3.13/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
File ".venv/lib/python3.13/site-packages/svgdigitizer/entrypoint.py", line 232, in digitize
svg_plot.df.to_csv(_outfile(svg, suffix=".csv", outdir=outdir), index=False)
^^^^^^^^^^^
File "/usr/lib/python3.13/functools.py", line 1026, in __get__
val = self.func(instance)
File ".venv/lib/python3.13/site-packages/svgdigitizer/svgplot.py", line 2099, in df
points = LabeledPath.path_points(self.curve)
^^^^^^^^^^
File "/usr/lib/python3.13/functools.py", line 1026, in __get__
val = self.func(instance)
File ".venv/lib/python3.13/site-packages/svgdigitizer/svgplot.py", line 1474, in curve
raise SVGAnnotationError("No paths labeled 'curve:' found.")
svgdigitizer.exceptions.SVGAnnotationError: No paths labeled 'curve:' found.
This is an interesting example, since the data is basically already stored or traced in an SVG. However, our tool does not know which curves in the SVG contain the actual data, nor can it automatically infer the axis coordinates. These still have to be specified in the SVG.
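To see which elements the warnings in the log refer to, it can help to list every <path> together with its parent element. The following is a minimal sketch using only the Python standard library. It is not part of svgdigitizer; the function name list_paths and the miniature SVG are made up for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def list_paths(svg_text):
    """Return (path id, parent tag, parent id) for every <path> in the SVG."""
    root = ET.fromstring(svg_text)
    rows = []
    # ElementTree has no parent pointers, so visit each element as a parent.
    for parent in root.iter():
        for child in parent:
            if child.tag == SVG_NS + "path":
                parent_tag = parent.tag.replace(SVG_NS, "")
                rows.append((child.get("id"), parent_tag, parent.get("id")))
    return rows


# Hypothetical miniature SVG, not the Wikimedia Commons file.
EXAMPLE = """<svg xmlns="http://www.w3.org/2000/svg">
  <path id="frame" d="M 0 0 L 10 0"/>
  <g id="plot_group">
    <path id="trace" d="M 0 10 L 10 0"/>
  </g>
</svg>"""

for row in list_paths(EXAMPLE):
    print(row)
```

Paths whose parent tag is not g correspond to the first kind of warning in the log; the others still need a text label in their group before the tool can use them.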
Here is the example you provided, with the annotations added (mind that the curve and axis labels are groups of elements, as described in our documentation):
You can digitize it as follows, which yields a datapackage including all metadata.
svgdigitizer figure Ruby_transmittance_annotated.svg --sampling-interval 1 --skewed
Would it be possible to add options to specify the IDs of the elements to use for axis and actual data?
Also, trying this again with your SVG now gives the messages "No text with scan rate found in the SVG or provided metadata." and "No text with figure containing a label such as figure: 1a found in the SVG." The axis values also seem off: the wavelength in the produced CSV ranges from 202.09 to 0.01. Being able to specify the expected ranges on the CLI would therefore likely help to account for this (possibly missed?) transformation.
Regarding your first point, I don't know what you mean. Can you elaborate on that?
The SVG that I provided did indeed contain an error: I assigned the wrong value to the second x-axis reference point. The updated version below includes the fix.
The tool checks for a label that includes the scan rate, but a missing label has no direct effect on the digitization process. If you know how the data was recorded, you can specify that value in the SVG as text. I added an arbitrary rate to the example, rate: 50 nm / s (the unit must correspond to that on the x-axis). If a rate is provided, a time axis is constructed and added to the CSV.
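To illustrate what a time axis derived from a scan rate means, here is a minimal sketch. It assumes a constant rate and that the first sample is at t = 0. It is not svgdigitizer's implementation; the function name time_axis and the sample values are hypothetical.

```python
def time_axis(x_values, rate):
    """Elapsed time per sample, assuming a constant scan rate along x.

    x_values: x-axis values in scan order (e.g. wavelength in nm).
    rate: scan rate in x-units per second (e.g. nm / s), must be positive.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not x_values:
        return []
    times = [0.0]
    for previous, current in zip(x_values, x_values[1:]):
        # Time to sweep from one sample to the next at the given rate.
        times.append(times[-1] + abs(current - previous) / rate)
    return times


# Hypothetical samples, swept at the arbitrary 50 nm / s from the example label.
wavelengths = [200.0, 250.0, 300.0, 400.0]
print(time_axis(wavelengths, 50.0))  # [0.0, 1.0, 2.0, 4.0]
```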
The figure: figure_name label allows additional information on the figure source to be provided. That can be useful when retracing data from published figures, e.g., figure: 4a. The information is then stored in the JSON. I added a text label to the updated SVG.
You can retrace the splines in the SVG more finely by decreasing the sampling interval. Mind, however, that this will never reflect the rate at which the original data was recorded.
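The effect of a finer sampling step can be sketched as follows. This is a simplification, not the tool's method: it resamples a polyline by linear interpolation rather than evaluating the Bézier splines of an SVG path, and it assumes the points are sorted by strictly increasing x. The function name resample is hypothetical.

```python
def resample(points, interval):
    """Linearly interpolate a polyline at fixed steps along x.

    points: list of (x, y) tuples with strictly increasing x.
    interval: step along x between output samples, must be positive.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if len(points) < 2:
        raise ValueError("need at least two points")
    x_start, x_end = points[0][0], points[-1][0]
    # Small tolerance so that an exact multiple of the step is not lost.
    steps = int((x_end - x_start) / interval + 1e-9)
    out = []
    i = 0
    for k in range(steps + 1):
        x = x_start + k * interval
        # Advance to the segment that contains x.
        while i < len(points) - 2 and points[i + 1][0] < x:
            i += 1
        (x0, y0), (x1, y1) = points[i], points[i + 1]
        out.append((x, y0 + (y1 - y0) * (x - x0) / (x1 - x0)))
    return out


curve = [(0.0, 0.0), (2.0, 4.0), (4.0, 0.0)]
print(len(resample(curve, 1.0)))  # 5 samples
print(len(resample(curve, 0.5)))  # 9 samples
```

Halving the step roughly doubles the number of samples, but every sample still lies on the traced curve, so no information about the original measurement is gained.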
The tracing should be fine now. The accuracy ultimately depends on how precisely you position the x- and y-axis handles, which can never be done perfectly.
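Why the handle positions and their assigned values matter so much can be seen from a plain two-point linear calibration. The sketch below is an illustration under that assumption only. It ignores skewed axes and is not the tool's code; make_axis_map and all positions and values are hypothetical.

```python
def make_axis_map(pos1, value1, pos2, value2):
    """Return a function mapping an SVG coordinate to a data value.

    The mapping is the straight line through the two reference points
    (pos1, value1) and (pos2, value2).
    """
    if pos1 == pos2:
        raise ValueError("reference positions must differ")
    slope = (value2 - value1) / (pos2 - pos1)

    def to_value(pos):
        return value1 + slope * (pos - pos1)

    return to_value


# Hypothetical handles: SVG x = 100 is 200 nm, SVG x = 500 is 1600 nm.
to_wavelength = make_axis_map(100.0, 200.0, 500.0, 1600.0)
print(to_wavelength(300.0))  # 900.0

# The same handles with a mistyped second value distort every sample.
wrong = make_axis_map(100.0, 200.0, 500.0, 160.0)
print(wrong(300.0))  # 180.0
```

Because the mapping is a straight line, it is also defined for positions beyond the second reference point, for example to_wavelength(600.0) in this sketch.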
What I meant by the first point in my previous reply is the following:
Let's assume you take a look at the SVG and notice that the element drawing the x-axis line has id="foo" and the element drawing the y-axis has id="bar". Furthermore, the element with id="quo" contains the data you care about. An option like:
svgdigitizer figure data.svg --x-axis=foo,0,1600 --y-axis=bar,0,100 --data=quo
would be nice to have and a bit more intuitive to work with (after all, Inkscape, for example, shows you the IDs of elements, and the logical extents can usually be read from the graph).
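The value format of such an option could be parsed along these lines. This is only a sketch of the proposal: neither the options nor the names AxisSpec and parse_axis_spec exist in svgdigitizer.

```python
from typing import NamedTuple


class AxisSpec(NamedTuple):
    """Hypothetical container for a proposed "id,lower,upper" axis option."""

    element_id: str
    lower: float
    upper: float


def parse_axis_spec(value):
    """Parse a string such as "foo,0,1600" into an AxisSpec."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 'id,lower,upper', got {value!r}")
    element_id, lower, upper = parts
    if not element_id:
        raise ValueError("element id must not be empty")
    # float() raises ValueError itself if a bound is not numeric.
    return AxisSpec(element_id, float(lower), float(upper))


print(parse_axis_spec("foo,0,1600"))
print(parse_axis_spec("bar,0,100"))
```

A parser like this could reject malformed values before the SVG is even opened; looking up the element by ID and deriving the pixel extent of the axis would be separate steps.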
I referenced the above idea in a new issue.
Furthermore, I assume that the original issue with the SVG from Wikimedia Commons is resolved?
Yes. While the extracted data still seems to have strange axis values, those can be fixed after the fact with a bit of tweaking.
Thanks for the note. I tested the axis values again, and they are not too far off. However, there seems to be an issue when the second reference point on the x-axis is smaller than the upper limit of the curve: the nodes and segments beyond that value are stripped off.
Aside from the above issue, if your actual values seem off, it would be great to receive more details.
I'll take another shot at it later. AFAIR, the issue was that the X values were not reported within the 200…1600 nm range, but as 0.8..0.2 (or similar). I'll try to reproduce it and report the details of what exactly seemed off.