DataFrames.jl

Metadata display

Open bkamins opened this issue 2 years ago • 6 comments

While I am implementing https://github.com/JuliaData/DataFrames.jl/pull/3055, let us discuss whether we want a DataFrames.jl-specific namespace for metadata keys. Apache Arrow reserves the ARROW: prefix for keys. Following it, we could say that, e.g., "DFJL:[key name]" is a special pattern for table and column metadata keys that are handled in a special way.

For example, "DFJL:caption" could be a table metadata key for the table caption, and "DFJL:name" could be a column metadata key for a verbose column name.

Then table display could take a kwarg controlling whether such metadata is printed. E.g. in text/plain it would not be printed by default (to save space), but in HTML it would be printed by default if it is available.

Questions:

  1. do we want it?
  2. does the general design idea make sense?
  3. do we need more special keys apart from these two to start with?

CC @ronisbr @nalimilan @pdeffebach

bkamins, Jun 12 '22 18:06

That's interesting, but I'd rather use generic terms without a prefix, like "caption" rather than "DFJL:caption". Ideally, one should be able to load e.g. R, Arrow or Stata files and get the metadata attached to the objects there. For verbose column names, the term used by R (https://github.com/JuliaData/RData.jl/pull/93) and Stata/SAS (https://github.com/junyuan-chen/ReadStatTables.jl/pull/6) is "label", so we should probably stick to that. I'm not sure there's a convention for captions.

nalimilan, Jun 13 '22 07:06

We can drop the prefix. I proposed it because this is how Apache Arrow defines namespaces.

bkamins, Jun 13 '22 08:06

As an additional idea: maybe table-level "rownames" metadata could be used to decide which columns should be displayed as row names? This is just a loose thought.

bkamins, Jun 19 '22 09:06

It's not clear that using metadata programmatically in packages (and in particular in DataFrames) is a good idea. Metadata isn't structured at all, so code using it would have to validate, every time it's used, that it contains the right type and refers to an existing column. We could use internal fields if we want to add this kind of feature.

nalimilan, Jun 20 '22 20:06

OK. So let us skip it for now.

bkamins, Jun 20 '22 20:06

> We could use internal fields if we want to add this kind of feature.

I was thinking about it. Also related is https://github.com/JuliaData/DataFrames.jl/issues/3110.

What I think would be useful is to store a dictionary from which PrettyTables.jl could take its configuration (overriding the standard parameterization). In particular, this would resolve https://github.com/JuliaData/DataFrames.jl/issues/3110 and allow support for https://github.com/ronisbr/PrettyTables.jl/issues/173 in the future.

We could use two approaches:

  • add a special field (which essentially would be metadata)
  • or, as proposed above, use a reserved table-level metadata key, e.g. "DF_DISPLAY", for it (the user could additionally set its style to :note or :none, depending on whether they want this display setting to propagate).

What benefit do you see in having an additional field over metadata (in both cases, the stored entry would need to be validated before display anyway)? The benefits of using metadata for it are:

  1. we will already have an API for metadata, whereas for a custom field we would need to add new functions;
  2. @ronisbr will be able to handle such metadata in PrettyTables.jl, so in the future other table types that start supporting metadata could potentially use the same display mechanism, making the solution more generic.

bkamins, Aug 09 '22 09:08

I am closing it. DataFrames.jl will not define custom metadata behavior. A separate package (tentatively TableMetadataTools.jl) can add pretty-printing features that use table metadata to customize styling.

If anyone has a different idea for this, please comment and I will reopen.

bkamins, Sep 13 '22 14:09