[Bug]: Accessing an indexed (ragged) column by attribute returns VectorData instead of VectorIndex
What happened?
For a ragged (indexed) column, the attribute syntax table.col_name currently returns the VectorData rather than the VectorIndex. It should return the same VectorIndex as table[col_name] and table.get(col_name). Having the three access methods disagree is confusing, because indexing the result then yields a single element instead of a full row. See also https://github.com/NeurodataWithoutBorders/pynwb/issues/1990
Steps to Reproduce
from hdmf.common import DynamicTable
dt = DynamicTable(name="test", description="desc")
dt.add_column(name="col1", description="desc", index=True)
dt.add_row(col1=[0, 1, 2])
print(dt["col1"]) # returns VectorIndex
print(dt.get("col1")) # returns VectorIndex
print(dt.col1) # returns VectorData
print(dt["col1"][0]) # returns [0, 1, 2]
print(dt.get("col1")[0]) # returns [0, 1, 2]
print(dt.col1[0]) # returns 0
Traceback
No response
Operating System
macOS
Python Executable
Conda
Python Version
3.12
Package Versions
No response
@oruebel, @stephprince, and I discussed this today. We agree that the current methods are inconsistent and should be addressed. The plan is to make a breaking change for HDMF 5.0 (not the one this week):
- Change the dot accessor to return the VectorIndex to be consistent with the other two methods of accessing the column.
- Remove the VectorIndex columns from being accessible through these three methods. It's confusing to have dt.col1 and dt.col1_index return the same thing. These methods should only return the high-level columns after ragged/index processing.
- Users still need an easy way to get the raw columns. Add a new attribute on the table that is a dictionary that maps the name of the column to the column, whether it is a VectorData or VectorIndex.
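To make the plan above concrete, here is a rough toy sketch in plain Python. It is not hdmf code: ToyVectorData, ToyVectorIndex, ToyTable, and the attribute name raw_columns are all hypothetical, chosen only to illustrate the three accessors agreeing on the high-level column while a separate dictionary exposes the raw columns. The actual name and design of the new attribute have not been decided.

```python
# Toy model only: these classes are NOT the hdmf implementation.

class ToyVectorData:
    """Flat storage for every value in a column."""

    def __init__(self, name):
        self.name = name
        self.data = []

    def __getitem__(self, i):
        return self.data[i]


class ToyVectorIndex:
    """End offsets into a target column; indexing returns one ragged row."""

    def __init__(self, name, target):
        self.name = name
        self.target = target
        self.data = []  # data[i] is the exclusive end offset of row i

    def __getitem__(self, i):
        # Sketch supports non-negative integer row numbers only.
        start = self.data[i - 1] if i > 0 else 0
        return self.target.data[start:self.data[i]]


class ToyTable:
    def __init__(self):
        # Hypothetical name for the proposed name -> raw column mapping.
        self.raw_columns = {}
        # High-level columns, i.e. after ragged/index processing.
        self._high_level = {}

    def add_column(self, name, index=False):
        col = ToyVectorData(name)
        self.raw_columns[name] = col
        if index:
            idx = ToyVectorIndex(name + "_index", col)
            self.raw_columns[idx.name] = idx
            col = idx  # the index is what users see under `name`
        self._high_level[name] = col

    def add_row(self, **values):
        for name, value in values.items():
            col = self._high_level[name]
            if isinstance(col, ToyVectorIndex):
                col.target.data.extend(value)
                col.data.append(len(col.target.data))
            else:
                col.data.append(value)

    def __getitem__(self, name):
        return self._high_level[name]

    def get(self, name, default=None):
        return self._high_level.get(name, default)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        high_level = self.__dict__.get("_high_level", {})
        if name in high_level:
            return high_level[name]
        raise AttributeError(name)


dt = ToyTable()
dt.add_column("col1", index=True)
dt.add_row(col1=[0, 1, 2])
dt.add_row(col1=[3])

# All three accessors return the same high-level (index) column.
same = dt.col1 is dt["col1"] is dt.get("col1")
print(same)
print(dt.col1[0])
# The raw columns stay reachable through the dictionary only.
print(sorted(dt.raw_columns))
```

In this sketch, dt.col1[0] gives the whole first row, dt["col1_index"] raises a KeyError because the index column is no longer a high-level name, and dt.raw_columns["col1"] still gives the flat data.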
Related to: https://github.com/hdmf-dev/hdmf-zarr/issues/141