PGFPlots.jl

Missings

Open alfredjmduncan opened this issue 4 years ago • 6 comments

Currently, plot functions accept data with type AbstractMatrix{Real}. This means that the code throws a MethodError when passed data that allows for or includes missing values.
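A minimal reproduction of the failure, as a sketch only: `plot_stub` is a hypothetical stand-in for a plot function, and the `<:Real` element-type restriction is my assumption about how the real signatures are written.

```julia
# Hypothetical stand-in for a plot function; NOT the actual PGFPlots.jl API.
# Assumes the real methods restrict the element type to subtypes of Real.
plot_stub(data::AbstractMatrix{<:Real}) = size(data)

clean = [1.0 2.0 3.0; 4.0 5.0 6.0]
# Same shape, but the element type now allows missing values.
gappy = Union{Missing,Float64}[1.0 missing 3.0; 4.0 5.0 6.0]

plot_stub(clean)  # dispatches fine

# Union{Missing,Float64} is not a subtype of Real, so no method matches.
threw = try
    plot_stub(gappy)
    false
catch err
    err isa MethodError
end
```

Note that the error depends on the element type, not on whether a `missing` is actually present in the data.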

When plotting multiple time series with different frequencies / time spans, there can be quite a bit of messy wrangling required before passing the data through to the plotting functions in PGFPlots.

It would be much appreciated if the plot functions could accept data as AbstractMatrix{Union{Missing,Real}}, with PGFPlots dropping the missing values for each trace before plotting.
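Roughly what I have in mind for the per-trace drop, as a sketch rather than a proposed implementation. `drop_missing_points` is a hypothetical helper name, and I am assuming each trace arrives as a pair of equal-length x and y vectors.

```julia
# Hypothetical helper: keep only the points where both coordinates are present.
function drop_missing_points(x::AbstractVector, y::AbstractVector)
    length(x) == length(y) ||
        throw(DimensionMismatch("x and y must have equal length"))
    # A point is kept only if neither coordinate is missing.
    keep = .!(ismissing.(x) .| ismissing.(y))
    # collect(skipmissing(...)) also narrows the element type, e.g. to Float64.
    return collect(skipmissing(x[keep])), collect(skipmissing(y[keep]))
end

# Two series on a shared time axis, one observed less frequently.
t = [1, 2, 3, 4]
monthly = [1.0, 1.5, 2.0, 2.5]
quarterly = [1.0, missing, missing, 2.2]

x1, y1 = drop_missing_points(t, monthly)
x2, y2 = drop_missing_points(t, quarterly)
```

Each trace is filtered on its own, so the series do not need to share the same set of observed dates.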

alfredjmduncan, Jul 23 '20 14:07

Great idea! We'd welcome a PR.

mykelk, Jul 23 '20 16:07

There are a few design questions. I guess the two main options are:

  1. Pass the missings to PGFPlots as nan, which is a standard way to code missing values in PGFPlots. This would mean
  • updating the plotHelper functions in PGFPlots.jl, then
  • updating the accepted Real / Complex types to Union{Real,Missing} / Union{Complex,Missing} throughout.

(It would also be possible to pass the missings as empty strings, which is more appealing than nans in some ways. But this only works in PGFPlots if values are delimited with commas or semicolons, and in some plotHelper functions values are currently delimited with spaces.)

  2. Allow missings only when passing a DataFrame to PGFPlots, and filter the missings out of the supplied DataFrame columns before dispatching into the plotting functions. This would only require updating lines 45-51 of PGFPlots.jl.

(2) is a much smaller change, but drops some useful information from the resulting .tex output files. (1) would allow the user to set whether PGFPlots skips or jumps missing values, which is a useful feature in PGFPlots.
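To make option (1) concrete, here is a sketch of writing missings out as nan. The names `coord_string` and `table_rows` are hypothetical, and the space-delimited "x y" row layout is only an assumption about what a plotHelper function might emit.

```julia
# Hypothetical serializer, not the package's code: `missing` is written as nan,
# everything else uses its normal string form.
coord_string(v::Missing) = "nan"
coord_string(v::Real) = string(v)

# Build one space-delimited "x y" row per point.
function table_rows(x::AbstractVector, y::AbstractVector)
    length(x) == length(y) ||
        throw(DimensionMismatch("x and y must have equal length"))
    return join((coord_string(a) * " " * coord_string(b) for (a, b) in zip(x, y)), "\n")
end

rows = table_rows([1, 2, 3], [0.5, missing, 1.5])
```

With the nan rows kept in the .tex file, the user could then choose on the pgfplots side how they are handled; as far as I know this is what the `unbounded coords` key (`discard` or `jump`) is for.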

alfredjmduncan, Jul 24 '20 09:07

@tawheeler Do you have a preference?

mykelk, Jul 25 '20 05:07

Julia now has core support for missing values. It seems to make sense to support that directly in PGFPlots.jl as well.

PGFPlots.jl is a weird package in that we basically don't add type annotations to anything, and instead rely on each value's own type to dictate how it gets serialized to text when writing to a .tex file. The data itself is an exception to this. As @alfredjmduncan points out, the data fields are of type AbstractMatrix{Real}. I like the idea of moving to Union{Missing, Real}, and then using skipmissing in plotHelper.
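A rough sketch of what I mean, with made-up names (`MaybeReal` and `present_by_row` are hypothetical, and this is not the actual plotHelper code). One thing to watch: Julia's type parameters are invariant, so the signature needs `<:` for a concrete matrix such as `Matrix{Union{Missing,Float64}}` to dispatch.

```julia
const MaybeReal = Union{Missing, Real}

# Invariance: a concrete matrix is not a subtype of the exact parametric type...
exact_match = Matrix{Union{Missing,Float64}} <: AbstractMatrix{MaybeReal}
# ...but it does match once the parameter is an upper bound.
bounded_match = Matrix{Union{Missing,Float64}} <: AbstractMatrix{<:MaybeReal}
# Plain real-valued matrices keep working with the bounded form.
plain_match = Matrix{Float64} <: AbstractMatrix{<:MaybeReal}

# Hypothetical helper: the non-missing values of each row, via skipmissing.
function present_by_row(data::AbstractMatrix{<:MaybeReal})
    return [collect(skipmissing(row)) for row in eachrow(data)]
end

data = Union{Missing,Float64}[1.0 missing 3.0; 4.0 5.0 missing]
rows = present_by_row(data)
```

Applying skipmissing to each row independently loses the pairing between x and y values, so for coordinate data the skipping would probably need to happen point by point instead.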

tawheeler, Jul 25 '20 18:07

Sounds good @tawheeler

mykelk, Jul 25 '20 18:07

OK great! I'll have a go at the PR.

alfredjmduncan, Jul 27 '20 07:07