learning-notes
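Why split TFRecords into multiple shards in TensorFlow?
The main reason is to avoid storing and reading a single very large file; splitting the data into several shards is more efficient.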
"When we say shards in data generator (t2t-datagen) it just means that we split large files into a number of smaller files. It's usually better to not have gigabyte-sized files, and reading from multiple files can be faster, that's why we do it. And yes, you can use 1 shard for 100k sentences, though I think having 10 is still fine too." (lukaszkaiser)
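To make the idea concrete, here is a minimal sketch of sharding in plain Python. It only illustrates the splitting step: records are distributed round-robin over a fixed number of smaller files. The helper names `shard_paths` and `write_sharded`, the one-record-per-line text format, and the `prefix-00000-of-00010` file naming are assumptions chosen for illustration, not t2t-datagen's actual code. In TensorFlow each shard would typically be written with `tf.io.TFRecordWriter` instead of a plain text file.

```python
import os
import tempfile


def shard_paths(prefix, num_shards):
    """Return shard file names such as 'train-00000-of-00010'.

    The naming pattern is an assumed convention, used here for illustration.
    """
    return [f"{prefix}-{i:05d}-of-{num_shards:05d}" for i in range(num_shards)]


def write_sharded(records, prefix, num_shards):
    """Distribute records round-robin over num_shards files, one record per line."""
    paths = shard_paths(prefix, num_shards)
    files = [open(p, "w", encoding="utf-8") for p in paths]
    try:
        for i, record in enumerate(records):
            # Record i goes to shard i mod num_shards, so shards stay balanced.
            files[i % num_shards].write(record + "\n")
    finally:
        for f in files:
            f.close()
    return paths


if __name__ == "__main__":
    sentences = [f"sentence {i}" for i in range(100)]
    with tempfile.TemporaryDirectory() as tmp:
        paths = write_sharded(sentences, os.path.join(tmp, "train"), num_shards=10)
        for p in paths[:2]:
            print(os.path.basename(p), os.path.getsize(p), "bytes")
```

With the data split this way, a reader can open several shards in parallel or hand different shards to different workers, which is where the speed-up mentioned in the quote comes from. Setting `num_shards=1` reproduces the single-file case.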
References: