TextRecognitionDataGenerator
Generator is very slow
I am trying to generate images using generate-from-strings, since I have a list of strings to generate from. The issue is that it's very slow, producing only about one image per second.
That is extremely slow, can you post the command that you used?
Also, what hardware are you using?
Here's the script I am using to generate data. I am running on a very powerful cloud machine with 6 CPU cores and around 50 GB of RAM: https://github.com/Mohamed209/TextRecognitionDataGenerator/blob/receipts_ocr/generate_training_lines.py
I'll try and reproduce the issue on my side, I'll report back soon.
Okay so quickly:
- Add multiprocessing
- Properly time the different parts of the script, as not everything is the call to TRDG. You can run py-spy to see function calls and see what is taking time; a minimal timing sketch is shown below.
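For instance, a minimal timing sketch for splitting generation time from saving time, assuming a TRDG generator iterable named `generator` and a `save_lines` helper as in your script (both names are placeholders):

```python
import time

gen_time, save_time = 0.0, 0.0
it = iter(generator)
for _ in range(100):
    t0 = time.perf_counter()
    img, lbl = next(it)   # time spent inside TRDG rendering
    t1 = time.perf_counter()
    save_lines(img, lbl)  # time spent writing to disk
    t2 = time.perf_counter()
    gen_time += t1 - t0
    save_time += t2 - t1

print(f"TRDG: {gen_time:.2f}s, saving: {save_time:.2f}s for 100 images")
```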
As a workaround I used a parallel processing technique to boost generation. I gained more speed, but it's still not optimal: at this new rate, generating my dataset of around half a million images would take about 10 to 12 hours.
```python
from joblib import Parallel, delayed
from tqdm import tqdm

if __name__ == "__main__":
    print("started generating arabic lines :)")
    Parallel(n_jobs=-1)(delayed(save_lines)(img, lbl)
                        for img, lbl in tqdm(mixed_generator))
    print("started generating english lines :)")
    Parallel(n_jobs=-1)(delayed(save_lines)(img, lbl)
                        for img, lbl in tqdm(english_generator))
```
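Note that in this pattern the generator is still consumed sequentially in the parent process, so the image rendering itself is not parallelized; only `save_lines` runs in the workers. A sketch that parallelizes generation too, assuming a hypothetical `make_generator(strings)` factory that builds a TRDG generator over a subset of the strings:

```python
from joblib import Parallel, delayed

def generate_chunk(strings_chunk):
    # Each worker builds its own generator, so the image
    # rendering itself runs in parallel across processes.
    for img, lbl in make_generator(strings_chunk):
        save_lines(img, lbl)

n_jobs = 8  # assumption: tune to your core count
chunks = [strings[i::n_jobs] for i in range(n_jobs)]
Parallel(n_jobs=n_jobs)(delayed(generate_chunk)(c) for c in chunks)
```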
Your initial comment was removed/edited away. But if you really generate images with 40 words/image, a speed of 11-13 imgs/sec is not that bad.
I can try to see if there are low-hanging fruits in the code, but since the project does a lot of image manipulation, I don't know if I will get a big improvement.
@Belval I am generating around 40 characters per image, not 40 words, but that number is the worst case; I have text samples that are much shorter than 40. I feel that if the string to be generated is short, processing should be much faster, but in my case strings contain 10 to 20 characters on average and rendering is still slow. I will investigate this more in the next few days. Here is my new full script: https://github.com/Mohamed209/TextRecognitionDataGenerator/blob/receipts_ocr/generate_training_lines.py
I see. I never benchmarked each option, so maybe try removing one of these lines and measure the impact on processing time (a small timing harness is sketched below):

```python
distorsion_type=np.random.choice(distorsion_type),
skewing_angle=np.random.choice(skewing_angle),
blur=np.random.choice(blur),
```
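One quick way to measure that, as a sketch only: `build_generator` is a hypothetical factory that forwards keyword arguments to the TRDG generator, and the zero values assume TRDG's defaults for these options:

```python
import time

def time_option(**overrides):
    # Build a generator with one option neutralized and time 100 images.
    gen = build_generator(**overrides)
    start = time.perf_counter()
    for _ in zip(range(100), gen):
        pass
    return time.perf_counter() - start

print("baseline:      ", time_option())
print("no blur:       ", time_option(blur=0))
print("no distorsion: ", time_option(distorsion_type=0))
print("no skew:       ", time_option(skewing_angle=0))
```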
Ok I will try
One reason why the `generate()` function is slow is that it reloads the TF graph/session for each text sample! It can easily be rewritten as a class which initializes its own graph/session, loads the model once, and then only uses it for predictions. This can save some 1-2 s per invocation.
```python
import os
import pickle
import random as rnd

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from PIL import Image, ImageColor

# Helper functions from TRDG's handwritten text generator
# (module path assumed; they live next to the original generate()):
from trdg.handwritten_text_generator import (
    _crop_white_borders, _cumsum, _join_images,
    _sample_text, _split_strokes, download_model_weights,
)


class HandwrittenGenerator:
    def __init__(self):
        # Load the model weights once instead of on every generate() call.
        base_dir = download_model_weights()
        model_dir = os.path.join(base_dir, "handwritten_model")
        path = os.path.join(model_dir, "translation.pkl")
        with open(path, "rb") as file:
            self.translation = pickle.load(file)
        self.graph = tf.Graph()
        self.session = tf.compat.v1.Session(graph=self.graph)
        with self.graph.as_default(), self.session.as_default():
            saver = tf.compat.v1.train.import_meta_graph(
                os.path.join(model_dir, "model-29.meta")
            )
            saver.restore(self.session, os.path.join(model_dir, "model-29"))

    def generate(self, text, text_color="black"):
        with self.graph.as_default(), self.session.as_default():
            images = []
            colors = [ImageColor.getrgb(c) for c in text_color.split(",")]
            c1, c2 = colors[0], colors[-1]
            color = "#{:02x}{:02x}{:02x}".format(
                rnd.randint(min(c1[0], c2[0]), max(c1[0], c2[0])),
                rnd.randint(min(c1[1], c2[1]), max(c1[1], c2[1])),
                rnd.randint(min(c1[2], c2[2]), max(c1[2], c2[2])),
            )
            for word in text.split(" "):
                _, window_data, kappa_data, stroke_data, coords = _sample_text(
                    self.session, word, self.translation
                )
                strokes = np.array(stroke_data)
                strokes[:, :2] = np.cumsum(strokes[:, :2], axis=0)
                _, maxx = np.min(strokes[:, 0]), np.max(strokes[:, 0])
                miny, maxy = np.min(strokes[:, 1]), np.max(strokes[:, 1])
                fig, ax = plt.subplots(1, 1)
                fig.patch.set_visible(False)
                ax.axis("off")
                for stroke in _split_strokes(_cumsum(np.array(coords))):
                    plt.plot(stroke[:, 0], -stroke[:, 1], color=color)
                fig.patch.set_alpha(0)
                fig.patch.set_facecolor("none")
                canvas = plt.get_current_fig_manager().canvas
                canvas.draw()
                s, (width, height) = canvas.print_to_buffer()
                image = Image.frombytes("RGBA", (width, height), s)
                mask = Image.new("RGB", (width, height), (0, 0, 0))
                images.append(_crop_white_borders(image))
                plt.close()
            return _join_images(images), mask
```
Then call it as:

```python
# initialize once - takes 1-2 s
generator = HandwrittenGenerator()

for text in texts:
    # < 1 s or more per call, depending on text length
    img, mask = generator.generate(text, "black")
    # ...
```
As for running this in parallel, I'm afraid it would only help when the workload is IO-dominated (which it likely is); otherwise multiple TF sessions would compete for resources. Note that in Docker, TF detects the CPU core count of the host machine, not the container quota, which may result in too many threads competing for limited resources. This can be detected and set in the session config, as sketched below.
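A minimal sketch of capping TF's thread pools, e.g. when creating the session in `HandwrittenGenerator.__init__` above (the thread count here is an assumption; match it to your container's CPU quota):

```python
import tensorflow as tf

n_threads = 6  # assumption: set to the container's CPU quota
config = tf.compat.v1.ConfigProto(
    intra_op_parallelism_threads=n_threads,
    inter_op_parallelism_threads=n_threads,
)
graph = tf.Graph()
session = tf.compat.v1.Session(graph=graph, config=config)
```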
The other reason is that it calls TF's `session.run()` for each stroke in a loop. I'm not sure if this can be improved to run the whole prediction at once.
Another thing is that there's no batching. E.g., for many texts we could perform the steps in parallel, but the code would get more complex.