Determine length of column-oriented dataframe?

Open Fil opened this issue 3 years ago • 0 comments

Suppose the data is given as an object of columns:

df = {x: [1, 2, 3], y: [1, 2, 3]}

(This is how Quarto returns dataframes, and arquero does something similar.)

To use this in a mark we can call:

Plot.barX(data, {x: df.x, y: df.y, …})

But there is no good way to specify data:

  • if we specify it as {length: n} it will get materialized at some point, which is not optimal if the dataframe has millions of rows.
  • if we pass df.x as data, it is semantically incorrect
  • technically new Array(df.x.length) is fine, but it's a mental stretch
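To make the difference between the first and third options concrete, here is a minimal sketch in plain JavaScript. It does not call Plot; it assumes that "materializing" an array-like amounts to something like Array.from, which is an assumption about the internals rather than a statement of what Plot actually does.

```javascript
// Column-oriented dataframe, as in the example above.
const df = {x: [1, 2, 3], y: [1, 2, 3]};

// Option 1: an array-like that only carries a length.
const arrayLike = {length: df.x.length};

// Assumption: materialization is roughly Array.from(arrayLike).
// This allocates n real slots, each holding undefined.
const materialized = Array.from(arrayLike);

// Option 3: a sparse array with the right length and no elements at all.
const placeholder = new Array(df.x.length);

console.log(materialized.length, 0 in materialized); // slots exist
console.log(placeholder.length, 0 in placeholder);   // slots are holes
```

Both values report the same length, but only the materialized one has an own element for every row, which is the cost that matters for very large dataframes.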

I wonder if we could have either data = n (a number), which would be read as new Array(n), or a special symbol that would say "use the channels' length". Another useful possibility would be for "dataframe objects" to have some sort of length property.
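In the meantime, the length can be derived in user code. The sketch below is only an illustration: columnLength and placeholderData are hypothetical helper names, not part of Plot, Quarto, or arquero, and the sketch assumes every column is an array (or typed array) with a length property.

```javascript
// Hypothetical helper: number of rows in a column-oriented dataframe.
// Assumes each value of df is an array-like column with a `length`.
function columnLength(df) {
  const lengths = Object.values(df).map((column) => column.length);
  if (lengths.length === 0) return 0; // no columns, no rows
  const [n] = lengths;
  if (lengths.some((length) => length !== n)) {
    throw new RangeError("columns have different lengths");
  }
  return n;
}

// Hypothetical helper: a sparse array of the right length to pass as data.
function placeholderData(df) {
  return new Array(columnLength(df));
}

const df = {x: [1, 2, 3], y: [1, 2, 3]};
const data = placeholderData(df);
console.log(data.length);
```

The resulting data could then be passed as the first argument of the mark, with df.x and df.y supplied as channels as in the call above.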

cc: @allisonhorst ; discussion after reading https://allisonhorst.github.io/posts/2022-10-14-bird-attacks/

Fil, Oct 14 '22 20:10