Kevin Tse

Results 85 comments of Kevin Tse

Hi, thanks for opening this issue. I am not sure if I understand your intended use case.

> By vectorizing, I mean grouping the X values (the strings) and the...

> I think we have discussed about it earlier about the expected behavior of the combination of `input_col` and `output_col`. Currently, when `input_col` is specified but `output_col` is not, we...

Sorry I accidentally edited your comment while quoting it.

Are we considering `PinMemory` and `Shuffle` (to turn on and off) as adapters as well? Or will those be strictly DataPipes?

I think this is certainly possible. There are a few paths we can go down:

1. Stick with the existing layout and inject the functional API names into the docstrings...

> I want to add another potential improvement for pyi gen. Currently, the type hint for return value of each functional API is either `IterDataPipe` or `MapDataPipe`. We could change...

@ejguan and I did some digging into this: `dill` 0.3.4 works with `pytest`, but not `python` (it raises `TypeError: cannot pickle '_abc._abc_data' object`). `dill` 0.3.5 doesn't work with `pytest` or...

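If there is a Parquet `bytes` object, we can do:

```python
import pyarrow
import pyarrow.parquet  # the submodule must be imported explicitly

reader = pyarrow.BufferReader(obj)  # wrap the raw Parquet bytes in a readable buffer
parquet_table = pyarrow.parquet.read_table(reader)
# Then convert to TorchArrow DataFrame or Pandas
```

Some other options are...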

Offline: Discussion:

* This buffer-less version is likely better, but we need a clearer error message.
* Let's support both syntaxes: if "target" is provided, then return only one...