DataAPI.jl icon indicating copy to clipboard operation
DataAPI.jl copied to clipboard

Suppor dimensional metadata

Open Tokazama opened this issue 1 year ago • 15 comments

The core metadata methods (metadata, metadata!, etc.) provide an intuitive method of communicating that a key-value pair provides some information about the instance of the object it is attached to. The column metadata methods similarly provide an intuitive method of communicating the a key-value pair provides some information about the column of a particular object. However, data with multiple dimensions (array, mesh, coordinates, etc.) often have metadata stored per dimension.

This PR provides support for metadata corresponding to specific dimensions. New method implemented here follow the name pattern of column metadata methods (colmetadata to dimmetadata).

Tokazama avatar Jul 07 '23 03:07 Tokazama

This seems fine to me? @bkamins any concerns?

quinnj avatar Jul 10 '23 08:07 quinnj

I think it is OK to add. I assume @Tokazama has some concrete use case (i.e. I would avoid adding functionalities that will not be used in practice). I have went through the code and left some small comments.

bkamins avatar Jul 10 '23 13:07 bkamins

I think it is OK to add. I assume @Tokazama has some concrete use case (i.e. I would avoid adding functionalities that will not be used in practice). I have went through the code and left some small comments.

Besides the use cases in the original post I have images that have metadata per dimension in the form of name, per dim machine acquisition information, and additional metadata along axes (similar to colmetadata).

Tokazama avatar Jul 11 '23 12:07 Tokazama

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (b131356) 96.36% compared to head (e1b08c5) 96.77%. Report is 1 commits behind head on main.

:exclamation: Current head e1b08c5 differs from pull request most recent head 6a0af43. Consider uploading reports for the commit 6a0af43 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #62      +/-   ##
==========================================
+ Coverage   96.36%   96.77%   +0.41%     
==========================================
  Files           1        1              
  Lines          55       62       +7     
==========================================
+ Hits           53       60       +7     
  Misses          2        2              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Jul 12 '23 12:07 codecov[bot]

@quinnj - all looks good. OK to merge? (I would make a release next)

bkamins avatar Jul 15 '23 07:07 bkamins

Anything needed on my end?

Tokazama avatar Jul 18 '23 14:07 Tokazama

Needs a second approval (which might require some changes, but most likely not) before being merged and released.

bkamins avatar Jul 18 '23 15:07 bkamins

@nalimilan - do you have an opinion on this PR?

bkamins avatar Aug 05 '23 21:08 bkamins

Why not write 1 instead of using a variable? That sounds clearer to me.

The column metadata tests use :col as the argument for which column is being accessed. I thought using the variable dim instead of just 1 would make it easier to quickly read and understand what exactly was being tested.

Tokazama avatar Aug 07 '23 01:08 Tokazama

Question on the intended use: is this PR is about storing information about a dimension itself (e.g. its name or unit), or about its elements (e.g. row names for dimension 1), or both? If the second option, I wonder how this interacts with the existence of column metadata, as columns are usually considered as dimension 2.

Also, how does this relate to AxisArrays, NamedArrays, NamedDims, etc.?

nalimilan avatar Aug 14 '23 14:08 nalimilan

I wonder how this interacts with the existence of column metadata, as columns are usually considered as dimension 2.

I assumed this is completely detached. I.e. colmetadata is for objects that implement Tables.jl interface. And this PR is for array-like objects. Note that array-like objects can be re-interpreted as tables in many different ways.

bkamins avatar Aug 14 '23 14:08 bkamins

The way to support metadata per element along a dimension with this would be a specific "indices" or "axis" key whose value can be used somewhat like a tables column metadata.

Tokazama avatar Aug 14 '23 15:08 Tokazama

I assumed this is completely detached. I.e. colmetadata is for objects that implement Tables.jl interface. And this PR is for array-like objects. Note that array-like objects can be re-interpreted as tables in many different ways.

@bkamins Sure. But e.g. for a DataFrame we could imagine implementing dimmetadata at some point given that we define ndims, axes, getindex... So I wanted to make sure we consider whether that would make sense.

The way to support metadata per element along a dimension with this would be a specific "indices" or "axis" key whose value can be used somewhat like a tables column metadata.

@Tokazama I guess that would work. Then you mean that the intended use case for this PR is both metadata about a dimension itself and metadata about its elements (rows, columns...), right?

nalimilan avatar Aug 14 '23 17:08 nalimilan

I assumed this is completely detached. I.e. colmetadata is for objects that implement Tables.jl interface. And this PR is for array-like objects. Note that array-like objects can be re-interpreted as tables in many different ways.

@bkamins Sure. But e.g. for a DataFrame we could imagine implementing dimmetadata at some point given that we define ndims, axes, getindex... So I wanted to make sure we consider whether that would make sense.

The way to support metadata per element along a dimension with this would be a specific "indices" or "axis" key whose value can be used somewhat like a tables column metadata.

@Tokazama I guess that would work. Then you mean that the intended use case for this PR is both metadata about a dimension itself and metadata about its elements (rows, columns...), right?

Exactly! I figured the alternative was having an additional metadata set of methods (e.g., axesmetadata) which seemed a bit over the top since one can still accomplish similar functionality with the dim methods here. The overlap with column metadata could also be seen as redundant but tables have consistently proven to be very unique since each column can be conceptualized as its own entity.

Tokazama avatar Aug 14 '23 23:08 Tokazama

@nalimilan, is this okay now?

Tokazama avatar Dec 21 '23 17:12 Tokazama