hdmf [Bug]: Accessing an indexed (ragged) column by attribute returns VectorData instead of VectorIndex

What happened?

The attribute syntax table.col_name currently returns the VectorData instead of the VectorIndex for a ragged array. It should return the same VectorIndex as in table[col_name] and table.get(col_name). All three methods should return the same result. Otherwise this is confusing. See also https://github.com/NeurodataWithoutBorders/pynwb/issues/1990

Steps to Reproduce

from hdmf.common import DynamicTable

dt = DynamicTable(name="test", description="desc")
dt.add_column(name="col1", description="desc", index=True)
dt.add_row(col1=[0, 1, 2])

print(dt["col1"])  # returns VectorIndex
print(dt.get("col1"))  # returns VectorIndex
print(dt.col1)  # returns VectorData

print(dt["col1"][0])  # returns [0, 1, 2]
print(dt.get("col1")[0])  # returns [0, 1, 2]
print(dt.col1[0])  # returns 0

Traceback

No response

Operating System

macOS

Python Executable

Conda

Python Version

3.12

Package Versions

No response

Nov 14 '24 22:11 rly

@oruebel, @stephprince, and I discussed this today. We agree that the current methods are inconsistent and should be addressed. The plan is to make a breaking change for HDMF 5.0 (not the one this week):

Change the dot accessor to return the VectorIndex to be consistent with the other two methods of accessing the column.
Remove the VectorIndex columns from being accessible through these three methods. It's confusing to have dt.col1 and dt.col1_index return the same thing. These methods should only return the high-level columns after ragged/index processing.
Users still need an easy way to get the raw columns. Add a new attribute on the table that is a dictionary that maps the name of the column to the column, whether it is a VectorData or VectorIndex.

Nov 14 '24 23:11 rly

Related to: https://github.com/hdmf-dev/hdmf-zarr/issues/141

Feb 20 '25 22:02 rly