Performance issue in /tf_crnn (by P3)
Hello! I've found a performance issue in data_handler.py: `batch()` should be called before `map()`, which can make the input pipeline more efficient. The TensorFlow documentation on input-pipeline performance supports this.
A detailed description follows:
`dataset.batch(batch_size)` (line 431) should be called before the following `map()` calls:

- `dataset.map(map_fn)` (line 313)
- `dataset.map(_load_image, num_parallel_calls=num_parallel_calls)` (line 418)
- `dataset.map(_apply_slant, num_parallel_calls=num_parallel_calls)` (line 422)
- `dataset.map(_data_augment_fn, num_parallel_calls=num_parallel_calls)` (line 424)
- `dataset.map(_normalize_image, num_parallel_calls=num_parallel_calls)` (line 425)
- `dataset.map(_pad_image_or_resize, num_parallel_calls=num_parallel_calls)` (line 426)
- `dataset.map(_format_label_codes, num_parallel_calls=num_parallel_calls)` (line 427)
In addition, you need to check whether each function passed to `map()` (e.g., `_format_label_codes` in `dataset.map(_format_label_codes, num_parallel_calls=num_parallel_calls)`) still works correctly after the change. Once `batch()` comes first, `map()` receives batched elements: a function that previously expected data of shape (x, y, z) will instead receive data of shape (batch_size, x, y, z).
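To illustrate, here is a minimal sketch of the reordering (not the project's actual code; the function body and tensor shapes are illustrative stand-ins for the ones in data_handler.py):

```python
import tensorflow as tf

def _normalize_image(image, label):
    # After the reorder, `image` arrives with shape (batch_size, h, w, c)
    # instead of (h, w, c). Elementwise ops like the one below broadcast
    # over the leading batch dimension, so this function needs no change;
    # functions that assume a rank-3 input would have to be adapted.
    return tf.cast(image, tf.float32) / 255.0, label

# Dummy data standing in for the real image/label tensors.
images = tf.zeros([8, 32, 100, 1], dtype=tf.uint8)
labels = tf.zeros([8], dtype=tf.int32)

dataset = tf.data.Dataset.from_tensor_slices((images, labels))
# Before: dataset.map(_normalize_image, ...).batch(batch_size)
# After:  batch first, then map once per batch (vectorized).
dataset = dataset.batch(4).map(
    _normalize_image, num_parallel_calls=tf.data.AUTOTUNE)

batch_images, batch_labels = next(iter(dataset))
print(batch_images.shape)  # (4, 32, 100, 1)
```

Batching first means each mapped function runs once per batch instead of once per element, which reduces per-call overhead.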
Looking forward to your reply. By the way, I'd be glad to open a PR with the fix if you are too busy.
Hello, I'm looking forward to your reply~