Proposed updates to documentation for `reader_name` argument of nvidia.dali.plugin.*.DALI*Iterator
The `reader_name` argument of `nvidia.dali.plugin.*.DALI*Iterator()` has been difficult for us to understand. I'd like to propose a rewording of the documentation, but want to check that what I'm proposing is actually correct, and ask some questions.
The current documentation is:
reader_name (str, default = None) - Name of the reader which will be queried to the shard size, number of shards and all other properties necessary to count properly the number of relevant and padded samples that iterator needs to deal with. It automatically sets last_batch_policy to PARTIAL when the FILL is used, and last_batch_padded accordingly to match the reader’s configuration
- I believe the final sentence is incorrect and should be deleted? We have experimented with setting `last_batch_policy` to `LastBatchPolicy.FILL` and it seems to work as described in the examples (i.e., it works like "FILL", rather than being overridden to "PARTIAL").
- Rather: I think the semantics of this option are that using `reader_name` is mutually exclusive with using the `last_batch_padded` argument, and instead, if the `reader_name` argument is set, the `last_batch_padded` argument is set to the value of the `pad_last_batch` option of the reader?
- The documentation says that the `reader_name` argument is the "name of the reader which will be queried" but doesn't explain what object will be queried to find the reader. Is it the list of pipelines given by the `pipelines` argument? What are the semantics if the reader is not found in any of the pipelines from the `pipelines` argument? What are the semantics if the reader is found in more than one of the pipelines from the `pipelines` argument?
- I can't find the documentation for how to set the name of the Reader in a Pipeline. I see that in the "MXNet with DALI - ResNet 50" example, the `fn.readers.mxnet` constructor apparently takes an argument `name`, but this argument isn't documented in the documentation for `fn.readers.mxnet()` (nor could I find this argument documented for any of the other readers), and the semantics of naming readers in Pipelines don't seem to be documented. The only place I can find the reader name mentioned in the documentation is in the documentation for the `reader_meta` argument to Pipeline.
- Finally, I also wonder if the `size` argument to the `DALI*Iterator()` is correctly documented. It says that if `size` is set to the default (-1), "The options last_batch_policy and last_batch_padded don’t work in such case." but then it also says setting `size=-1` is mutually exclusive with the `reader_name` argument.
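To make the `last_batch_policy` questions above concrete, here is a minimal standalone sketch of what I understand the three policies to mean for the batch lengths produced from one shard. It does not import DALI: the `LastBatchPolicy` enum below is a local stand-in that only reuses the member names of DALI's enum, and `batch_lengths` is a hypothetical helper for illustration, not a DALI function.

```python
from enum import Enum


class LastBatchPolicy(Enum):
    # Local stand-in; only the member names mirror DALI's LastBatchPolicy.
    FILL = 0
    DROP = 1
    PARTIAL = 2


def batch_lengths(shard_size, batch_size, policy):
    """Return the length of every batch produced from one shard (hypothetical helper)."""
    full_batches, remainder = divmod(shard_size, batch_size)
    lengths = [batch_size] * full_batches
    if remainder == 0:
        # The shard divides evenly, so the policy makes no difference.
        return lengths
    if policy is LastBatchPolicy.DROP:
        # The incomplete last batch is discarded.
        return lengths
    if policy is LastBatchPolicy.PARTIAL:
        # The last batch is trimmed to the samples that really exist.
        return lengths + [remainder]
    # FILL: the last batch is topped up to a full batch with extra samples.
    return lengths + [batch_size]


for policy in LastBatchPolicy:
    print(policy.name, batch_lengths(10, 4, policy))
```

If the documentation's final sentence were accurate, passing `FILL` together with `reader_name` would behave like the `PARTIAL` row here, which is not what we observed.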
For reference: the Sharding documentation page has been somewhat helpful in understanding the actual semantics of this argument, but where the `reader_name` argument is described, it should also be more explicit about how it derives `size` from `pipeline.reader_meta(reader_name)['epoch_size_padded']` (or however it is actually derived) and `last_batch_padded` from `pipeline.reader_meta(reader_name)['pad_last_batch']`.
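As a sketch of the derivation I am guessing at here (not confirmed against the DALI source), the kind of explanation I have in mind could look like the following. The dictionary is hand-written sample data shaped like the result of `pipeline.reader_meta(reader_name)`; the helper name `derive_iterator_settings` and the division by `number_of_shards` are my assumptions, not documented behavior.

```python
def derive_iterator_settings(meta):
    """Guess at how an iterator could derive its settings from reader metadata.

    `meta` is a dict shaped like the result of pipeline.reader_meta(reader_name).
    This is an assumption for illustration, not DALI's actual implementation.
    """
    # Assumption: each shard gets an equal slice of the padded epoch.
    size = meta["epoch_size_padded"] // meta["number_of_shards"]
    # Assumption: last_batch_padded simply follows the reader's pad_last_batch.
    last_batch_padded = bool(meta["pad_last_batch"])
    return size, last_batch_padded


# Hand-written sample metadata: 10 samples split across 4 shards, padded to 12.
sample_meta = {
    "epoch_size": 10,
    "epoch_size_padded": 12,
    "number_of_shards": 4,
    "shard_id": 0,
    "pad_last_batch": True,
    "stick_to_shard": False,
}

size, last_batch_padded = derive_iterator_settings(sample_meta)
print(size, last_batch_padded)
```

Spelling out something like this in the `reader_name` documentation would also make it clear why `size` and `last_batch_padded` should not be passed alongside it.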
Thanks for reporting the issue. Let me check this and get back to you. We will work something out to make this more understandable.
I couldn't agree more. You really speak to my heart. Several months have passed, and sadly the documentation is still the same as quoted here.
To be honest, I have a million doubts when I read the documentation. I have no idea what `size` means. Does it stand for batch size? And what is the epoch size in the documentation? What should I fill in for `reader_name`? What does `reader_name = "Readers"` mean? Where does the "readers" come from? I feel so helpless...
I also agree. While the "Getting started tutorial" helped a lot, I believe it would be much easier for new users to use DALI if an explanation or example of using `reader_name` were also given. As it is now, to understand how `reader_name` works I need to piece together information from scattered bits of documentation, which makes understanding very difficult. Adding an explanation of `reader_name` to https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/general/data_loading/numpy_reader.html might be helpful too!