lance icon indicating copy to clipboard operation
lance copied to clipboard

Some methods take `**kwargs` but do not check for unused kwrags.

Open westonpace opened this issue 1 year ago • 0 comments

For example, sample is ok (the kwargs are passed through to take):

    def sample(
        self,
        num_rows: int,
        columns: Optional[Union[List[str], Dict[str, str]]] = None,
        randomize_order: bool = True,
        **kwargs,
    ) -> pa.Table:
        total_num_rows = self.count_rows()
        indices = random.sample(range(total_num_rows), num_rows)
        if not randomize_order:
            # Sort the indices in order to increase the locality and thus reduce
            # the number of random reads.
            indices = sorted(indices)
        return self.take(indices, columns, **kwargs)

However, take is not ok (kwargs completely unused):

    def take(
        self,
        indices: Union[List[int], pa.Array],
        columns: Optional[Union[List[str], Dict[str, str]]] = None,
        **kwargs,
    ) -> pa.Table:
        columns_with_transform = None
        if isinstance(columns, dict):
            columns_with_transform = list(columns.items())
            columns = None
        return pa.Table.from_batches(
            [self._ds.take(indices, columns, columns_with_transform)]
        )

In addition, from a brief scan:

  • _take_rows - kwargs unused
  • count_rows - kwargs unused
  • to_batches - kwargs unused

This can cause users to get confused. They might think, for example, that these methods support the same arguments as scanner (e.g. filter) and when they specify the args they don't get any error, but the arg is silently ignored.

westonpace avatar Aug 08 '24 18:08 westonpace