Documentation of the buffer_size parameter in MaxTokenBucketizer is wrong
According to the documentation for MaxTokenBucketizer: buffer_size – This restricts how many tokens are taken from prior DataPipe to bucketize
However, in the code (bucketbatcher.py#L277), the unit of buffer_size is samples, not tokens.
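To make the difference concrete, here is a minimal sketch in plain Python. It is not the torchdata implementation; `take_by_samples` and `take_by_tokens` are hypothetical helpers written only to contrast the two readings of buffer_size, assuming each sample is a list of tokens.

```python
from typing import List, Sequence


def take_by_samples(samples: Sequence[List[str]], buffer_size: int) -> List[List[str]]:
    """Hypothetical helper: the buffer holds at most `buffer_size` samples,
    regardless of how many tokens each sample contains."""
    return list(samples[:buffer_size])


def take_by_tokens(samples: Sequence[List[str]], buffer_size: int) -> List[List[str]]:
    """Hypothetical helper: the buffer holds samples only while the total
    token count stays within `buffer_size`."""
    buffer: List[List[str]] = []
    total_tokens = 0
    for sample in samples:
        if total_tokens + len(sample) > buffer_size:
            break
        buffer.append(sample)
        total_tokens += len(sample)
    return buffer


# Four samples with 3, 2, 1, and 4 tokens respectively.
samples = [["a", "b", "c"], ["d", "e"], ["f"], ["g", "h", "i", "j"]]

by_samples = take_by_samples(samples, buffer_size=3)
by_tokens = take_by_tokens(samples, buffer_size=3)

print(len(by_samples), sum(len(s) for s in by_samples))
print(len(by_tokens), sum(len(s) for s in by_tokens))
```

With the same buffer_size, the sample-based reading buffers three samples while the token-based reading buffers only one, which is why the wording in the docstring matters.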
Thanks for reporting it. Feel free to open a PR to fix the inline doc.