pgfplots
Implement violin charts
Violin charts are the only type of chart capable of displaying complete distribution data over several cross-sectional data points in a time series, e.g. the distribution of wages together with the mean wage and the first and third quartiles for each year.
A useful approach in terms of `pgfplotstable` tables:
- A primary data file references other data files
- The primary data file columns are:
  - `data`: the file (relative path `./foo.dat` accepted) containing all the plot data
  - `x`: the `x` axis label
  - `mean`: The mean, calculated if not given or `{}`
  - `q1`: First quartile, as `mean`.
  - `median`: Median, as `mean`.
  - `q3`: Third quartile, as `mean`.
  - `std`: Standard deviation.
- The `data` field references a file containing points
  - if one column `value`, all single points
  - if two columns `count` and `value`, each point is duplicated `count` times
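To make the intended semantics of the two file layouts concrete, here is a minimal Python sketch of how a reader of these files might behave. It is not pgfplots code and not an existing API: the function names `expand_points` and `fill_summary` are hypothetical, and the quartile method (inclusive), the use of the sample standard deviation, and the choice to treat `std` with the same "compute if missing" rule as `mean` are all assumptions that the proposal does not specify.

```python
import statistics


def expand_points(rows):
    """Flatten rows of a per-x data file into a list of values.

    A row is either (value,) or (count, value); a two-column row is
    duplicated `count` times, as described in the proposal.
    """
    points = []
    for row in rows:
        if len(row) == 1:
            points.append(float(row[0]))
        elif len(row) == 2:
            count, value = int(row[0]), float(row[1])
            points.extend([value] * count)
        else:
            raise ValueError(f"expected 1 or 2 columns, got {len(row)}")
    return points


def fill_summary(given, points):
    """Return mean, q1, median, q3 and std, computing whatever is missing.

    A column counts as missing if it is absent, empty, or the literal "{}".
    Assumptions: inclusive quartiles, sample standard deviation, and the
    same fallback rule for `std` as for the other columns.
    """
    q1, median, q3 = statistics.quantiles(points, n=4, method="inclusive")
    computed = {
        "mean": statistics.fmean(points),
        "q1": q1,
        "median": median,
        "q3": q3,
        "std": statistics.stdev(points),
    }
    summary = {}
    for key, fallback in computed.items():
        value = given.get(key)
        missing = value is None or value in ("", "{}")
        summary[key] = fallback if missing else float(value)
    return summary
```

For example, `expand_points([("2", "1.0"), ("1", "4.0")])` gives the same points as a one-column file listing `1.0`, `1.0`, `4.0`.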
Note that `mean`, `median`, `q1`, and `q3` can be given explicitly because, while `pgfplots` may be able to compute them for small data sets, with millions of data points you probably want to precompute a few hundred or so representative data points, plus the mean, median, first and third quartiles, and standard deviation, and plot a decent-resolution chart from those.
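One way to produce such representative points outside of LaTeX is to bin the raw values and write them in the two-column `count`/`value` layout. The sketch below is only an illustration under stated assumptions: equal-width binning is one possible reduction (the proposal does not prescribe one), the names `bin_points` and `to_dat` are hypothetical, and the output assumes a space-separated file with a header row.

```python
def bin_points(values, bins=200):
    """Reduce values to at most `bins` (count, value) rows.

    Uses equal-width bins; `value` is the bin centre and empty bins
    are dropped. This is an assumed reduction, not part of the proposal.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: a single row carries everything.
        return [(len(values), lo)]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum falls into the last bin, not one past it.
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    return [(c, lo + (i + 0.5) * width) for i, c in enumerate(counts) if c]


def to_dat(rows):
    """Format (count, value) rows as a space-separated table with a header."""
    lines = ["count value"]
    lines += [f"{count} {value}" for count, value in rows]
    return "\n".join(lines) + "\n"
```

The summary columns are best computed from the full data set before binning, since statistics taken from bin centres are only approximations.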
Plots usually represent mean and standard deviation; median and inter-quartile range should also be an option. That part is just a type of box-and-whiskers plotting.
Violin plots in pgfplots would be wonderful. Why? Because they are so much more expressive than boxplots:
(taken from https://www.autodesk.com/research/publications/same-stats-different-graphs)
@michaeldorner How did you make this plot? I need to make violin plots on pgfplots.
I attached the source. But it's not PGF since there is no support yet for them.
@michaeldorner Yes, I had seen that you wrote it in Python... I am afraid this approach is poorly integrated with LaTeX :disappointed:
@michaeldorner Now there is a solution! Check out my answer!
What amazing work! Thanks for sharing!!! ❤️