texar-pytorch
Doc polish: "Data Loaders" --> "Datasets"
The section is titled "Data Loaders" https://texar-pytorch.readthedocs.io/en/latest/code/data.html#data-loaders
Would "Datasets" be better? Or does "Data Loaders" fit the PyTorch convention better?
@huzecong @AvinashBukkittu
I personally like Datasets as the section heading here. All the classes described under it are Datasets provided by texar. Our Data Iterators share similarities with the Data Loaders of PyTorch. Also, I see that we are missing the doc for SingleDatasetIterator. I don't know if this was intentional.
The doc of Args is missing for Batch https://texar-pytorch.readthedocs.io/en/latest/code/data.html#texar.torch.data.Batch
I like Dataset as well. I think the terms people use to describe data-related modules are pretty messy, so as long as we're being consistent it's fine. Let me reiterate our definitions:
- A data source is something that reads and returns raw data examples one by one. Typical data sources include Python lists and iterators (SequenceDataSource and IterDataSource), lines from text files (TextLineDataSource), and pickled objects from binary files (PickleDataSource).
- A dataset (or data loader) defines how data examples are preprocessed into a format suitable for the task, and how these processed examples can be batched. These are called *Data in our framework for compatibility with the TF version (although I kind of prefer names like MonoTextData to MonoTextDataset because it's shorter and nonetheless to the point). Note that a dataset does not perform any of the operations by itself.
- A data iterator executes the process and batch operations defined in a dataset. PyTorch calls this a "data loader".
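To make the three roles concrete, here is a toy sketch in plain Python. It does not use the Texar API: the names ToyListSource, ToyTextData, and ToyIterator, along with their methods, are made up for illustration, and the real classes almost certainly differ in their details.

```python
from typing import Dict, Iterable, Iterator, List


class ToyListSource:
    """Data source: returns raw examples one by one (hypothetical class)."""

    def __init__(self, items: List[str]) -> None:
        self._items = items

    def __iter__(self) -> Iterator[str]:
        return iter(self._items)


class ToyTextData:
    """Dataset: only defines how to process and batch; it runs nothing itself."""

    def __init__(self, source: Iterable[str], batch_size: int) -> None:
        self.source = source
        self.batch_size = batch_size

    def process(self, raw: str) -> List[str]:
        # Turn one raw example into task-ready form (here: lowercase tokens).
        return raw.lower().split()

    def collate(self, examples: List[List[str]]) -> Dict[str, list]:
        # Combine processed examples into a single batch.
        return {"tokens": examples, "lengths": [len(e) for e in examples]}


class ToyIterator:
    """Data iterator: executes the process and collate steps of a dataset."""

    def __init__(self, dataset: ToyTextData) -> None:
        self.dataset = dataset

    def __iter__(self) -> Iterator[Dict[str, list]]:
        buffer: List[List[str]] = []
        for raw in self.dataset.source:
            buffer.append(self.dataset.process(raw))
            if len(buffer) == self.dataset.batch_size:
                yield self.dataset.collate(buffer)
                buffer = []
        if buffer:
            # Emit the final, possibly smaller, batch.
            yield self.dataset.collate(buffer)


source = ToyListSource(["Hello world", "A B C", "Texar"])
dataset = ToyTextData(source, batch_size=2)
batches = list(ToyIterator(dataset))
```

Nothing is read or processed until the iterator is consumed; the dataset object only carries the definitions, which is the distinction drawn above.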
It is intentional that we don't include the doc for SingleDatasetIterator. Users are expected to only use the DataIterator interface.
Thanks for the clarification. Can these definitions be added somewhere in the doc?
We can probably have an "Overview" page for each set of modules, to give an overview and highlight key features. Like in TF: https://www.tensorflow.org/api_docs/python/tf/data
Sure. I'll get on it.