Performance issues in /tf_crnn (by P3)

Open • DLPerf opened this issue 4 years ago • 1 comment

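Hello! I've found a performance issue in data_handler.py: batch() should be called before map(). This could make your program more efficient, because the function passed to map() is then invoked once per batch instead of once per element, which amortizes its per-call overhead. Here is the TensorFlow documentation that supports it.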

Detailed description is listed below:

dataset.batch(batch_size) (line 431) should be called before the following map() calls:

- dataset.map(map_fn) (line 313)
- dataset.map(_load_image, num_parallel_calls=num_parallel_calls) (line 418)
- dataset.map(_apply_slant, num_parallel_calls=num_parallel_calls) (line 422)
- dataset.map(_data_augment_fn, num_parallel_calls=num_parallel_calls) (line 424)
- dataset.map(_normalize_image, num_parallel_calls=num_parallel_calls) (line 425)
- dataset.map(_pad_image_or_resize, num_parallel_calls=num_parallel_calls) (line 426)
- dataset.map(_format_label_codes, num_parallel_calls=num_parallel_calls) (line 427)
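A toy, pure-Python model of the reordering may help show where the saving comes from. It is not the tf.data API: normalize_one, normalize_batch and batch are hypothetical stand-ins that count how often the mapped function runs. In tf.data terms, the change corresponds to rewriting dataset.map(fn).batch(batch_size) as dataset.batch(batch_size).map(batched_fn), where batched_fn operates on a whole batch.

```python
from typing import Dict, List

# Toy model only (NOT tf.data): counts how many times each mapped function is called.
call_count: Dict[str, int] = {"per_element": 0, "per_batch": 0}


def normalize_one(x: float) -> float:
    """Hypothetical per-element function: runs once per example (map before batch)."""
    call_count["per_element"] += 1
    return x / 255.0


def normalize_batch(xs: List[float]) -> List[float]:
    """Hypothetical batch-aware version: runs once per batch (batch before map)."""
    call_count["per_batch"] += 1
    return [x / 255.0 for x in xs]


def batch(items: List[float], batch_size: int) -> List[List[float]]:
    """Split items into consecutive batches, keeping a smaller final batch."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


data = [float(v) for v in range(10)]

# Original order: map(...) then batch(...)
map_then_batch = batch([normalize_one(x) for x in data], 4)

# Suggested order: batch(...) then map(...), with a batch-aware function
batch_then_map = [normalize_batch(b) for b in batch(data, 4)]

print(map_then_batch == batch_then_map)  # True: same values, same batching
print(call_count)                        # {'per_element': 10, 'per_batch': 3}
```

With 10 examples and a batch size of 4, the per-element function runs 10 times and the batched one runs 3 times, while the outputs match. The toy's batched function still loops internally, so it only illustrates the call count. The real speedup depends on the batched function being genuinely vectorized over the batch.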

Besides, you need to check whether each function passed to map() (e.g., _format_label_codes in dataset.map(_format_label_codes, num_parallel_calls=num_parallel_calls)) is affected by the change, so that the modified code still works properly. For example, if _format_label_codes takes input of shape (x, y, z) before the fix, it will receive input of shape (batch_size, x, y, z) after it.
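To make the shape change concrete, here is a pure-Python sketch that uses nested lists in place of tensors. reverse_axis, make_image, flip_width and flip_width_batched are hypothetical helpers, not functions from data_handler.py. A per-element function that hard-codes the width axis as axis 1 for input of shape (h, w, c) must use axis 2 once batch() runs first and its input becomes (batch_size, h, w, c).

```python
from typing import Any, List


def reverse_axis(t: Any, axis: int) -> Any:
    """Reverse a nested list along `axis` (0 = outermost), like [::-1] on that axis."""
    if axis == 0:
        return list(t[::-1])
    return [reverse_axis(sub, axis - 1) for sub in t]


def make_image(seed: int, h: int = 2, w: int = 3, c: int = 1) -> List:
    """Hypothetical tiny image of shape (h, w, c) filled with distinct ints."""
    return [[[seed * 100 + i * 10 + j for _ in range(c)] for j in range(w)]
            for i in range(h)]


def flip_width(image: List) -> List:
    """Per-element version: input shape (h, w, c), so width is axis 1."""
    return reverse_axis(image, axis=1)


def flip_width_batched(images: List) -> List:
    """Batched version: input shape (batch_size, h, w, c), so width is axis 2."""
    return reverse_axis(images, axis=2)


images = [make_image(s) for s in range(4)]      # shape (4, 2, 3, 1)

expected = [flip_width(img) for img in images]  # per-element result, one call per image
fixed = flip_width_batched(images)              # one call for the whole batch
unchanged = flip_width(images)                  # old function on a batch: flips height instead

print(fixed == expected)      # True
print(unchanged == expected)  # False
```

Applying the unmodified flip_width to a batch raises no error but silently reverses the wrong axis, which is the kind of breakage worth checking for in each mapped function. Counting axes from the end (width is axis -2 in both (h, w, c) and (batch_size, h, w, c)) is one way to write such functions so they work with or without a leading batch dimension.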

Looking forward to your reply. By the way, I would be glad to open a PR to fix it if you are too busy.

Aug 29 '21 12:08 DLPerf

Hello, I'm looking forward to your reply~

Nov 04 '21 09:11 DLPerf