CV: Add the option to sample at a fixed interval, for example in 5 mV steps
Does it make sense to have the possibility to set a sampling parameter in the svg itself?
I am not sure. It does not seem to be plot-specific. Also, it is a parameter that you want to play around with, which I think is more convenient when it is not embedded in the svg file.
Supplying the sampling parameter would be optional, and it would only be used when nothing else is specified. We only want to sample as much as is needed for the data to resemble the curve in the paper reasonably well. The real sampling is not known, and an experienced curator would estimate it directly from the curve (sharp features etc.). The curator would set it either in the SVG, in the YAML, or not at all (in which case the default is used).
Reopened due to discussion.
Based on our discussion last week, I thought it does not really matter how finely we sample. Hence, even if a low sampling rate is enough for files without sharp features, it does no harm to use more samples. Overall, the file sizes will remain small.
Note that sharp features are typically preserved regardless of the sampling rate. It is possible to produce sharp features between the end points of a cubic Bézier segment:

But this is not what people do when they trace a curve in Inkscape. They will certainly put end points on the features (i.e., click there) and then move the other control points so that the trace stays close to the curve in its smooth parts. Since the end points are always included in the sample, the sampling rate makes no difference for features in practice.
After discussing with @linuxrider today, the apex of a sharp feature could indeed be lost when the points are collected equidistantly.
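To make the argument about end points concrete, here is a minimal sketch in plain Python. It does not use svgdigitizer's actual code; `cubic_bezier`, `sample_path`, and the control points are hypothetical, and sampling is done uniformly in the Bézier parameter `t`, which is an assumption made for simplicity. The apex is the shared end point of two segments, as it would be if the curator clicked on it.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier segment at parameter t in [0, 1]."""
    s = 1.0 - t
    return tuple(
        s**3 * a + 3 * s**2 * t * b + 3 * s * t**2 * c + t**3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def sample_path(segments, n):
    """Sample each segment at n + 1 evenly spaced parameters, end points included."""
    points = []
    for seg in segments:
        for i in range(n + 1):
            points.append(cubic_bezier(*seg, i / n))
    return points


# Hypothetical trace of a sharp peak (arbitrary units). The apex (0.5, 1.0)
# is an end point shared by both segments.
segments = [
    ((0.0, 0.0), (0.3, 0.0), (0.45, 0.2), (0.5, 1.0)),
    ((0.5, 1.0), (0.55, 0.2), (0.7, 0.0), (1.0, 0.0)),
]

# The sampled maximum equals the apex height for every sampling density,
# because t = 0 and t = 1 are always part of the sample.
peaks = {n: max(y for _, y in sample_path(segments, n)) for n in (1, 2, 5, 50)}
for n, peak in peaks.items():
    print(n, peak)
```

Even with `n = 1`, where only the end points are sampled, the maximum is still the apex. A peak that sits between two end points would not have this guarantee.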
Sorry, but what's an apex?
The pointy end of a sharp feature. :) If we sample around the maximum, it will look flatter than it actually is.
I think in practice this cannot happen. (See my comment above.) It seems I did not make myself understood there.
After all those years, I have the feeling that this discussion is really based on a misunderstanding.
To recap.
- The maxima of curves could be lost if they are not explicitly selected as endpoints in the SVG and the sampling rate is low.
- We assume that the users always select maxima, to mitigate this issue.
Maybe this point could be laid out more clearly in the documentation.
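The two points of the recap can be checked numerically. The sketch below is not svgdigitizer code: `interpolate`, `sample`, and the peak data are made up for illustration, and the trace is simplified to a polyline through the end points. It compares a pure 5 mV grid with the same grid after the traced end points are merged in.

```python
def interpolate(xs, ys, x):
    """Linearly interpolate the polyline (xs, ys) at position x."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x outside of traced range")


def sample(xs, ys, step, keep_end_points):
    """Sample on an equidistant grid, optionally merging in the traced end points."""
    count = int(round((xs[-1] - xs[0]) / step))
    grid = [xs[0] + i * step for i in range(count + 1)]
    if keep_end_points:
        grid = sorted(set(grid) | set(xs))
    return [(x, interpolate(xs, ys, x)) for x in grid]


# Hypothetical sharp peak: potential in mV, current in arbitrary units.
# The apex at 13 mV does not fall on the 5 mV grid.
potential = [0, 12, 13, 14, 30]
current = [0.0, 1.0, 10.0, 1.0, 0.0]

grid_only = max(y for _, y in sample(potential, current, 5, keep_end_points=False))
with_end_points = max(y for _, y in sample(potential, current, 5, keep_end_points=True))

print(grid_only)        # 0.9375, the apex is lost
print(with_end_points)  # 10.0, the apex is kept
```

With the grid alone, the peak collapses to less than a tenth of its height. Once the end points are merged in, the step size no longer affects the maximum.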
I am not sure to what extent this was related to the comment by @linuxrider above: https://github.com/echemdb/svgdigitizer/issues/76#issuecomment-932719535. I can only assume it meant that the user should have a choice of how sampling is done. At least currently, you can either sample or leave the data as it is, so everything should be covered. In both cases, end points are recorded independently of the sampling rate (see above).
I am closing this for now, again... :)