lance
lance copied to clipboard
Some methods take `**kwargs` but do not check for unused kwrags.
For example, sample is ok (the kwargs are passed through to take):
def sample(
self,
num_rows: int,
columns: Optional[Union[List[str], Dict[str, str]]] = None,
randomize_order: bool = True,
**kwargs,
) -> pa.Table:
total_num_rows = self.count_rows()
indices = random.sample(range(total_num_rows), num_rows)
if not randomize_order:
# Sort the indices in order to increase the locality and thus reduce
# the number of random reads.
indices = sorted(indices)
return self.take(indices, columns, **kwargs)
However, take is not ok (kwargs completely unused):
def take(
self,
indices: Union[List[int], pa.Array],
columns: Optional[Union[List[str], Dict[str, str]]] = None,
**kwargs,
) -> pa.Table:
columns_with_transform = None
if isinstance(columns, dict):
columns_with_transform = list(columns.items())
columns = None
return pa.Table.from_batches(
[self._ds.take(indices, columns, columns_with_transform)]
)
In addition, from a brief scan:
-
_take_rows- kwargs unused -
count_rows- kwargs unused -
to_batches- kwargs unused
This can cause users to get confused. They might think, for example, that these methods support the same arguments as scanner (e.g. filter) and when they specify the args they don't get any error, but the arg is silently ignored.