Interface name (`IDataFrame/IColumn`) vs. factory method (`DataFrame/Column`)
Current Status
In TorchArrow, the interface names are ta.IDataFrame/ta.IColumn while the factory methods are ta.DataFrame/ta.Column:
import torcharrow as ta
a = ta.Column([1, 2, 3])
assert isinstance(a, ta.IColumn)
assert isinstance(a, ta.velox_rt.numerical_column_cpu.NumericColumnCpu)
We also use IColumn/IDataFrame as the type hints for parameters, transformations, etc.
The caveat is that users may at first assume ta.DataFrame / ta.Column are class names, and only later discover that they have to use ta.IDataFrame / ta.IColumn for type hints and isinstance checks.
Proposed Change
We want to use DataFrame/Column/NumericalColumn/StringColumn/ListColumn... as the interface names, and ta.dataframe/ta.column as the factory methods:
import torcharrow as ta
a = ta.column([1, 2, 3])
assert isinstance(a, ta.Column)
This matches the PyArrow convention (pa.array as the factory method, pa.Array as the interface name):
import pyarrow as pa
a = pa.array([1, 2, 3])
assert isinstance(a, pa.Array)
assert isinstance(a, pa.IntegerArray)
PyTorch follows the same pattern, with torch.tensor as the factory method and torch.Tensor as the interface name:
import torch
a = torch.tensor([1, 2, 3])
assert isinstance(a, torch.Tensor)
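The convention shared by the examples above can be sketched in plain Python without any of the three libraries. This is only an illustration of the naming pattern, not TorchArrow's implementation: the classes, the `to_pylist` method, and the dispatch rules inside `column` are all hypothetical and chosen for brevity.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Sequence


class Column(ABC):
    """Interface name (capitalized): used for type hints and isinstance checks."""

    def __init__(self, values: Sequence[Any]) -> None:
        self._values = list(values)

    @abstractmethod
    def to_pylist(self) -> List[Any]:
        """Return the column contents as a Python list (hypothetical method)."""


class NumericalColumn(Column):
    """Hypothetical concrete column for ints and floats."""

    def to_pylist(self) -> List[Any]:
        return list(self._values)


class StringColumn(Column):
    """Hypothetical concrete column for strings."""

    def to_pylist(self) -> List[Any]:
        return list(self._values)


def column(values: Sequence[Any]) -> Column:
    """Factory method (lowercase): picks a concrete Column subclass from the data."""
    if len(values) == 0:
        raise ValueError("cannot infer a column type from empty data")
    if all(isinstance(v, str) for v in values):
        return StringColumn(values)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return NumericalColumn(values)
    raise TypeError("unsupported or mixed element types")


def total(col: Column) -> float:
    """Callers annotate with the interface name, never with a concrete class."""
    return sum(col.to_pylist())


a = column([1, 2, 3])
assert isinstance(a, Column)           # the interface check users expect
assert isinstance(a, NumericalColumn)  # the concrete type stays an implementation detail
```

With this split, the name a user sees first (`column`) is unambiguously a function, and the capitalized name (`Column`) is the one that works in both annotations and isinstance checks.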
Resolved by the following PRs:
- https://github.com/pytorch/torcharrow/pull/218
- https://github.com/pytorch/torcharrow/pull/221
- https://github.com/pytorch/torcharrow/pull/224
- https://github.com/pytorch/torcharrow/pull/226