Results 29 comments of Kamron Bhavnagri

I managed to fix my problems without any debug logs, but debug logs would've been useful. It automatically creates `.xdll`s when I import the `.dll`s. Are they just renamed versions?

> Hi @KamWithK: thank you for sharing the use case. Pretty interesting. Looping through reader would make your code run on the main PyTorch process. If...

Okay, so I finally managed to get it to work half-decently with Transformers (although not AllenNLP yet; I may need to rewrite Petastorm's data loader for it). In the...

> There is a longer term solution that might be better (but requires a significant effort), which is to start using `torch.utils.data.DataLoader` for parallelization instead of petastorm's custom...
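The suggestion above can be sketched roughly as follows: a map-style `Dataset` whose items `torch.utils.data.DataLoader` loads and batches, with `num_workers > 0` forking worker processes in place of petastorm's custom pool. The in-memory row list is a stand-in for reading from Parquet.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class RowDataset(Dataset):
    """Toy map-style dataset; in practice __getitem__ would read a
    row (or row group) from a Parquet file instead of a Python list."""
    def __init__(self, rows):
        self.rows = rows

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        return torch.tensor(self.rows[idx])

# DataLoader handles batching, shuffling, and (with num_workers > 0)
# worker-process parallelism -- the part petastorm reimplements itself.
loader = DataLoader(RowDataset([1, 2, 3, 4]), batch_size=2, num_workers=0)
```

Here `num_workers=0` keeps everything in the main process; raising it is where the parallelization trade-off discussed in the thread comes in.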

> Also, @KamWithK, can you please clarify: are you using `make_batch_reader` with `BatchDataLoader`? If so, what would you expect the datatype to be in the returned batch for input_ids,...

> @KamWithK, if I'm understanding correctly, you shouldn't need to tokenize "under-the-hood" in a way that requires modifications to `petastorm`. A `pandas.DataFrame` allows tensor-valued fields in the form of `numpy.ndarray`...
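To illustrate the tensor-valued-fields point above: a `pandas.DataFrame` column can hold `numpy.ndarray` objects, so a transform function (shaped like what petastorm's `TransformSpec` expects) can replace a text column with token-ID arrays without modifying petastorm itself. `fake_tokenize` is a placeholder for a real tokenizer such as one from HuggingFace Transformers.

```python
import numpy as np
import pandas as pd

def fake_tokenize(text):
    # Placeholder tokenizer: map each character to its code point
    # and return the IDs as a numpy array.
    return np.array([ord(c) for c in text], dtype=np.int64)

def add_input_ids(df):
    """Takes a pandas.DataFrame, returns one where `input_ids` is a
    tensor-valued (ndarray) column derived from the `text` column."""
    df = df.copy()
    df["input_ids"] = df["text"].map(fake_tokenize)
    return df

df = add_input_ids(pd.DataFrame({"text": ["ab", "c"]}))
```

Because each cell is an `ndarray`, downstream code can collate the arrays into padded batches without petastorm needing to know about tokenization at all.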

Also, I'm trying to read through `ArrowReaderWorker` right now, and it seems like there is quite a bit of back and forth between Pandas, NumPy, and PyArrow data ([example here](https://github.com/uber/petastorm/blob/ffaf6b6ef139a703d7dcc2aad2207ec2324b0741/petastorm/arrow_reader_worker.py#L59-L62)...

Okay, thanks @selitvin; #605 looks like a decent solution for now. I've been reading through the docs to try to understand how you handle multiprocessing. Do you just use Parquet...

Okay, I've created a PyTorch `IterableDataset` class which can handle multiple workers and just uses PyArrow. Right now I haven't added any column/row specifications, but from what I'm seeing it...
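A minimal sketch of that multi-worker pattern, assuming the standard `get_worker_info()` sharding trick: each `DataLoader` worker takes a disjoint slice of the file list. `read_fn` here is a stand-in for an actual PyArrow read (e.g. `pyarrow.parquet.read_table` followed by row iteration); it is not the class from the comment above.

```python
from torch.utils.data import IterableDataset, get_worker_info

class ParquetShardDataset(IterableDataset):
    """Shards a list of Parquet file paths across DataLoader workers."""
    def __init__(self, files, read_fn):
        self.files = files
        self.read_fn = read_fn  # path -> iterable of rows

    def __iter__(self):
        info = get_worker_info()
        # info is None in the main process (num_workers=0).
        wid, nworkers = (0, 1) if info is None else (info.id, info.num_workers)
        # Every worker takes each n-th file, so shards never overlap.
        for path in self.files[wid::nworkers]:
            yield from self.read_fn(path)
```

With `DataLoader(ds, num_workers=2)`, worker 0 reads files 0, 2, 4, ... and worker 1 reads files 1, 3, 5, ..., so no row is produced twice.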