
Interface name (`IDataFrame/IColumn`) vs. factory method (`DataFrame/Column`)

Open wenleix opened this issue 3 years ago • 0 comments

Current Status

In TorchArrow, the interface names are ta.IDataFrame/ta.IColumn while the factory methods are ta.DataFrame/ta.Column:

import torcharrow as ta
a = ta.Column([1, 2, 3])
assert isinstance(a, ta.IColumn)
assert isinstance(a, ta.velox_rt.numerical_column_cpu.NumericColumnCpu)

We also use IColumn/IDataFrame as the type hints for parameters, transformations, etc.

The caveat is that users may assume on first impression that ta.DataFrame / ta.Column are class names, and only later find that they have to use ta.IDataFrame / ta.IColumn.
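
A minimal sketch of why this is surprising, using stand-in names rather than TorchArrow itself. It assumes the capitalized factory is a plain function; `IColumn` and `Column` below are hypothetical and only mimic the naming, not the library's implementation:

```python
class IColumn:
    """Stand-in for the interface type (hypothetical, not TorchArrow's class)."""

    def __init__(self, values):
        self.values = list(values)


def Column(values):
    """Stand-in for a capitalized factory function that returns an IColumn."""
    return IColumn(values)


a = Column([1, 2, 3])

# The interface name works for type checks.
assert isinstance(a, IColumn)

# The capitalized factory looks like a class, but it is a function,
# so using it in isinstance() fails with a TypeError.
try:
    isinstance(a, Column)
    raised = False
except TypeError:
    raised = True

assert raised
```

Under that assumption, the name that reads like a class cannot be used where a class is expected, which is the mismatch described above.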

Proposed Change

We want to use DataFrame/Column/NumericalColumn/StringColumn/ListColumn... as the interface names, and ta.dataframe/ta.column as the factory methods:

import torcharrow as ta
a = ta.column([1, 2, 3])
assert isinstance(a, ta.Column)
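
The proposed convention can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the class bodies and the dispatch rule inside `column` are hypothetical and are not how TorchArrow constructs columns; only the naming pattern (capitalized interface types, lowercase factory) comes from the proposal.

```python
class Column:
    """Interface name: what users write in type hints and isinstance checks."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)


class NumericalColumn(Column):
    """Hypothetical concrete type for int/float data."""


class StringColumn(Column):
    """Hypothetical concrete type for str data."""


def column(values) -> Column:
    """Lowercase factory: inspects the data and picks a concrete Column type."""
    values = list(values)
    if not values:
        raise ValueError("this sketch cannot infer a type from empty input")
    if all(isinstance(v, str) for v in values):
        return StringColumn(values)
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return NumericalColumn(values)
    raise TypeError("unsupported element types in this sketch")


a = column([1, 2, 3])
assert isinstance(a, Column)           # check against the interface name
assert isinstance(a, NumericalColumn)  # or against the more specific type

s = column(["x", "y"])
assert isinstance(s, StringColumn)
```

With this split, the capitalized name is always a type and the lowercase name is always a callable that builds one, so there is no second name to discover later.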

This is similar to the PyArrow/PyTorch convention. PyArrow uses pa.array as the factory method and pa.Array as the interface name:

import pyarrow as pa
a = pa.array([1, 2, 3])
assert isinstance(a, pa.Array)
assert isinstance(a, pa.IntegerArray)

PyTorch likewise uses torch.tensor as the factory method and torch.Tensor as the interface name:

import torch
a = torch.tensor([1, 2, 3])
assert isinstance(a, torch.Tensor)

wenleix · Feb 08 '22 04:02

Resolved by the following PRs:

  • https://github.com/pytorch/torcharrow/pull/218
  • https://github.com/pytorch/torcharrow/pull/221
  • https://github.com/pytorch/torcharrow/pull/224
  • https://github.com/pytorch/torcharrow/pull/226

wenleix · Aug 27 '22 06:08