
Performance issue in /paper-code/tensorflow_src/tools (by P3)

Open • DLPerf opened this issue 3 years ago • 1 comment

Hello! I've found a performance issue in preprocess_tfrecord.py: dataset.batch(batch_size, drop_remainder=drop_remainder) (line 179) should be called before dataset.map(map_func=_parse_dataset_item, num_parallel_calls=mt.cpu_count()) (line 173). Batching first lets the map function run once per batch instead of once per element, which reduces per-call overhead and could make your program more efficient.

Here is the TensorFlow documentation that supports this.

Besides, you need to check whether the function _parse_dataset_item passed to dataset.map(map_func=_parse_dataset_item, num_parallel_calls=mt.cpu_count()) is affected by this change, so that the modified code still works properly. For example, if _parse_dataset_item expects input with shape (x, y, z) before the fix, it will receive input with shape (batch_size, x, y, z) after the fix.
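To make the reordering concrete, here is a minimal framework-free sketch. It is not tf.data code and not the code from preprocess_tfrecord.py: plain Python lists stand in for tensors, and `parse_item`, `parse_batch`, and `batch` are hypothetical helpers invented for this illustration. It only shows that both orders give the same result, that the batched parser is called far fewer times, and that its input gains a leading batch dimension.

```python
# Framework-free illustration; plain Python lists stand in for tensors.
# All names below are hypothetical and do not come from the repository.
calls = {"per_item": 0, "per_batch": 0}


def parse_item(record):
    """Per-element parser: takes one record (a list of numbers)."""
    calls["per_item"] += 1
    return [2 * v for v in record]


def parse_batch(records):
    """Batched parser: takes a list of records (one extra leading dimension)."""
    calls["per_batch"] += 1
    return [[2 * v for v in record] for record in records]


def batch(items, batch_size, drop_remainder=False):
    """Group items into lists of batch_size; optionally drop a short last batch."""
    out = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    if drop_remainder and out and len(out[-1]) < batch_size:
        out.pop()
    return out


records = [[i, i + 1, i + 2] for i in range(10)]

# Original order: map every element, then batch the results.
map_then_batch = batch([parse_item(r) for r in records], batch_size=4)

# Proposed order: batch first, then map once per batch.
batch_then_map = [parse_batch(b) for b in batch(records, batch_size=4)]

print(map_then_batch == batch_then_map)
print(calls)
```

In a real tf.data pipeline the same caveat applies to the map function itself. For instance, if _parse_dataset_item relies on tf.io.parse_single_example (an assumption, since its body is not shown in this issue), the batched counterpart would be tf.io.parse_example, which accepts a batch of serialized examples.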

Looking forward to your reply. By the way, I would be glad to create a PR to fix it if you are too busy.

DLPerf, Aug 29 '21 11:08

Hello, I'm looking forward to your reply~

DLPerf, Nov 04 '21 09:11