
Code for the paper "KISS: Keeping it Simple for Scene Text Recognition"



This repository contains the code you can use to train a model as described in our paper. It also explains how to get our pretrained model and how to evaluate a model.

Pretrained Model

You can find the pretrained model here. Download the zip file and put it into any directory. We will refer to this directory as <model_dir>.

Preparing to Use the Code

  • make sure you have at least Python 3.7 installed on your system
  • create a new virtual environment (or whatever you like to use)
  • install all requirements with pip install -r requirements.txt (if your PC does not have a CUDA-capable device, remove the package cupy from requirements.txt first).
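If you prefer to script the version check and the cupy removal, here is a minimal sketch of a hypothetical helper (it is not part of this repository, and the file name drop_cupy.py is only an example). It assumes requirements.txt lists one requirement per line and treats both cupy and prebuilt variants such as cupy-cuda101 as cupy packages.

```python
import re
import sys
from pathlib import Path


def is_cupy_requirement(line: str) -> bool:
    """True if the requirement line names cupy or a prebuilt wheel like cupy-cuda101."""
    # The distribution name is the leading run of letters, digits, '.', '_' and '-'.
    match = re.match(r"\s*([A-Za-z0-9._-]+)", line)
    if match is None:
        return False  # empty line or comment
    name = match.group(1).lower()
    return name == "cupy" or name.startswith("cupy-")


def drop_cupy(text: str) -> str:
    """Return the requirements text with every cupy line removed."""
    kept = [line for line in text.splitlines() if not is_cupy_requirement(line)]
    return "\n".join(kept) + "\n"


def main(argv: list) -> None:
    if sys.version_info < (3, 7):
        sys.exit("Python 3.7 or newer is required")
    if len(argv) != 2:
        return  # usage: python drop_cupy.py requirements.txt
    path = Path(argv[1])
    path.write_text(drop_cupy(path.read_text()))


if __name__ == "__main__":
    main(sys.argv)
```

Running python drop_cupy.py requirements.txt rewrites the file in place; afterwards, install the remaining requirements with pip as described above.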


If you want to train your model on the same datasets as we did, you first need to get the training data. Then you can get the training annotations we used from here.

Image Data

You can find the image data for each dataset using the following links:

  • MJSynth:
  • SynthText:
  • SynthAdd: Follow instructions from here

Once you've downloaded all the images, you can get the ground-truth (gt) files we prepared for the MJSynth and SynthAdd datasets here.

For the SynthText dataset, you'll have to create them yourself. You can do so by following these steps:

  1. Get the data and put it into a directory (let's assume we put the data into the directory /data/oxford).
  2. Run the script (you can find it in datasets/text_recognition) with the following command-line parameters: python /data/oxford/gt.mat /data/oxford_words.
  3. This crops all words from the original Oxford ground truth, based on their axis-aligned bounding boxes.
  4. Create a train and validation split with the script: python /data/oxford_words/gt.json.
  5. Run the script with the following command line: python json_to_npz /data/oxford_words/train.json ../../train_utils/char-map-bos.json. This creates a file called train.npz in the same directory as gt.json.
  6. Repeat the last step with the file validation.json.
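To illustrate what "crop all words based on their axis-aligned bounding box" means, here is a minimal sketch. It is not the repository's script. It assumes the SynthText layout in which each image's wordBB entry in gt.mat holds word quadrilaterals with shape 2×4×N (x/y coordinate, four corners, N words), collapsing to 2×4 for single-word images. The image is assumed to be already loaded as an (H, W, C) NumPy array.

```python
import numpy as np


def axis_aligned_boxes(word_bb: np.ndarray, width: int, height: int) -> np.ndarray:
    """Convert SynthText-style word quadrilaterals into clipped axis-aligned boxes.

    word_bb is assumed to have shape (2, 4, N): x/y coordinate, 4 corners, N words.
    A single word may be stored as shape (2, 4), so the missing axis is added.
    Returns an int array of shape (N, 4) with rows [x_min, y_min, x_max, y_max].
    """
    if word_bb.ndim == 2:
        word_bb = word_bb[:, :, np.newaxis]
    xs, ys = word_bb[0], word_bb[1]  # each of shape (4, N)
    # Round outwards so the box fully contains the quadrilateral, then clip to the image.
    x_min = np.floor(xs.min(axis=0)).clip(0, width)
    x_max = np.ceil(xs.max(axis=0)).clip(0, width)
    y_min = np.floor(ys.min(axis=0)).clip(0, height)
    y_max = np.ceil(ys.max(axis=0)).clip(0, height)
    return np.stack([x_min, y_min, x_max, y_max], axis=1).astype(int)


def crop_words(image: np.ndarray, word_bb: np.ndarray) -> list:
    """Crop every word from an (H, W, C) image; boxes that are empty after clipping are skipped."""
    height, width = image.shape[:2]
    crops = []
    for x0, y0, x1, y1 in axis_aligned_boxes(word_bb, width, height):
        if x1 > x0 and y1 > y0:
            crops.append(image[y0:y1, x0:x1])
    return crops
```

Using the axis-aligned box keeps cropping a simple array slice, at the cost of including some background around rotated words.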

Once you are done with this, you need to combine all npz files into one large npz file. You can use the script for this. Assuming you saved the MJSynth dataset and its npz file in /data/mjsynth, and the SynthAdd dataset and its npz file in /data/SynthAdd, run the script as follows: python /data/mjsynth/annotation_train.npz /data/oxford_words/train.npz /data/SynthAdd/gt.npz --destination /data/datasets_combined.npz.
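The internal layout of the npz files is not described here. The following sketch only illustrates the idea of combining them, under the assumption that every file stores the same set of keys and that arrays under a key differ only in their first (per-sample) dimension. The function name combine_npz is hypothetical. If the real files store image paths relative to their own directory, those paths would also need rewriting, which this sketch does not do.

```python
import numpy as np


def combine_npz(paths, destination):
    """Concatenate same-named arrays from several npz files along axis 0.

    Assumption: every input file has the same keys, and each array holds one
    entry per sample along its first dimension.
    """
    loaded = [np.load(path, allow_pickle=True) for path in paths]
    keys = set(loaded[0].files)
    for archive, path in zip(loaded, paths):
        if set(archive.files) != keys:
            raise ValueError(f"{path} has keys {sorted(archive.files)}, expected {sorted(keys)}")
    # Read every array into memory and join the samples of all datasets.
    combined = {key: np.concatenate([archive[key] for archive in loaded], axis=0) for key in keys}
    for archive in loaded:
        archive.close()
    np.savez_compressed(destination, **combined)
    return combined
```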

Since the datasets may contain words longer than N characters (we always set N to 23), we need to remove all such words. You can use the script for this, like so: python 23 /data/datasets_combined.npz --npz. Do the same with the file validation.npz you obtained from splitting the SynthText dataset.
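The filtering rule itself is simple: keep a sample only if its label has at most N characters. A minimal sketch, assuming the labels are available as a NumPy array of strings and that any other per-sample arrays (for example file names) must be filtered with the same mask; the function name is hypothetical.

```python
import numpy as np

MAX_LEN = 23  # N in the text above


def filter_long_words(texts: np.ndarray, *arrays: np.ndarray, max_len: int = MAX_LEN):
    """Keep only samples whose label has at most max_len characters.

    texts holds one label string per sample; any further arrays are filtered
    with the same boolean mask so that all samples stay aligned.
    """
    mask = np.array([len(str(t)) <= max_len for t in texts], dtype=bool)
    return (texts[mask],) + tuple(a[mask] for a in arrays)
```

Words of exactly N characters are kept, because only words longer than N have to go.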

If you want to follow our experiments with the balanced dataset, you can create a balanced dataset with the script. For example: python /data/datasets_combined_filtered_23.npz datasets_combined_balanced_23.npz -m 200000. If you omit the -m switch, the script shows you dataset statistics so that you can choose your own value.
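This README does not state which criterion the balancing uses. Purely for illustration, the sketch below assumes balancing over word length: it prints-ready statistics (a length histogram) and randomly keeps at most m samples per word length. Both the criterion and the function names are assumptions, not a description of the repository's script.

```python
import numpy as np


def length_histogram(texts) -> dict:
    """Map each word length to the number of samples with that length."""
    lengths, counts = np.unique([len(str(t)) for t in texts], return_counts=True)
    return dict(zip(lengths.tolist(), counts.tolist()))


def balance_by_length(texts: np.ndarray, max_per_length: int, seed: int = 0) -> np.ndarray:
    """Return sorted indices that keep at most max_per_length samples per word length.

    Assumption: "balanced" means balanced over word length (a guess for illustration).
    """
    rng = np.random.default_rng(seed)
    lengths = np.array([len(str(t)) for t in texts])
    keep = []
    for length in np.unique(lengths):
        indices = np.flatnonzero(lengths == length)
        if len(indices) > max_per_length:
            # Randomly subsample over-represented lengths without replacement.
            indices = rng.choice(indices, size=max_per_length, replace=False)
        keep.append(indices)
    return np.sort(np.concatenate(keep)) if keep else np.array([], dtype=int)
```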

Evaluation Data

In this section, we explain how you can get the evaluation data and annotations. For each dataset, you only need to do two steps:

  1. Clone the repository.
  2. Download the npz annotation file and place it in the directory you cloned the Git repository into.
| Dataset | Git Repo | NPZ-Link | Note |
| --- | --- | --- | --- |
| ICDAR2013 | | download | Rename the directory test to Challenge2_Test_Task3_Images |
| ICDAR2015 | | download | Rename the directory TestSet to ch4_test_word_images_gt |
| CUTE80 | | download | - |
| IIIT5K | | download | - |
| SVT | | download | Remove all subdirectories except test_crop, then rename test_crop to img |
| SVTP | | download | - |
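The directory changes from the notes column can also be scripted. A minimal sketch using only the standard library; the clone locations are up to you, and the helper names are hypothetical. As our own choice, hidden directories such as .git are left untouched when preparing SVT.

```python
import shutil
from pathlib import Path


def rename_dir(repo: Path, old: str, new: str) -> None:
    """Rename repo/old to repo/new (the ICDAR2013 and ICDAR2015 notes)."""
    (repo / old).rename(repo / new)


def prepare_svt(repo: Path) -> None:
    """Delete every subdirectory except test_crop, then rename test_crop to img.

    Hidden directories such as .git are skipped (our choice, not in the notes).
    """
    for child in list(repo.iterdir()):  # list() so deletion does not disturb iteration
        if child.is_dir() and child.name != "test_crop" and not child.name.startswith("."):
            shutil.rmtree(child)
    (repo / "test_crop").rename(repo / "img")
```

For example, rename_dir(Path("path/to/icdar2013"), "test", "Challenge2_Test_Task3_Images") and rename_dir(Path("path/to/icdar2015"), "TestSet", "ch4_test_word_images_gt"), where the paths stand for wherever you cloned each repository.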


Now you should be ready for training :tada:. You can use the script located in the root directory of this repo.

Before you can start training, you need to adapt the configuration in config.cfg. Set the values as follows:

  • train_file: Set this to the file /data/datasets_combined_filtered_23.npz
  • val_file: Set this to /data/oxford_words/validation.npz
  • keys in TEST_DATASETS: set each of these to the corresponding npz annotation file you downloaded and placed in the dataset directory in the Evaluation Data step.
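Before a long training run, it can help to check that every path in the config actually exists. The sketch below assumes config.cfg is an INI-style file readable by Python's configparser, that train_file and val_file sit in one section whose name you pass in (the real section name is not given here), and that TEST_DATASETS maps dataset keys to npz paths. The function name and the section name PATHS in the example are hypothetical.

```python
import configparser
from pathlib import Path


def missing_files(config_text: str, train_section: str) -> list:
    """List config entries whose npz path does not exist on disk.

    Assumptions: INI-style config, train_file and val_file in train_section,
    and a TEST_DATASETS section mapping dataset keys to npz paths.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    entries = [(f"{train_section}.{key}", parser[train_section][key])
               for key in ("train_file", "val_file")]
    if parser.has_section("TEST_DATASETS"):
        entries += [(f"TEST_DATASETS.{key}", value)
                    for key, value in parser["TEST_DATASETS"].items()]
    return [name for name, path in entries if not Path(path).is_file()]
```

For example, missing_files(Path("config.cfg").read_text(), "PATHS") returns an empty list when every referenced file is in place.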

You can now run the training with, e.g., python <name for the log> -g 0 -l tests --image-mode RGB --rdr 0.95. This starts the training and creates a new directory with log entries in logs/tests. Get some coffee and sleep, because the training will take some time!

You can inspect the training progress with TensorBoard. Just start TensorBoard in the root directory like so: tensorboard --logdir logs.


Once you've trained a model, or if you just downloaded the model we provide, you can run the evaluation script on it.

If you want to know how the model performs on all datasets, you can use the script. Let's assume you trained a model and logs/tests/train is the path to its log directory. You can then run the evaluation with this command: python config.cfg 0 -b 16 --snapshot-dir logs/tests/train. To also render the model's predictions for each evaluation image, change the command to: python config.cfg 0 -b 1 --snapshot-dir logs/tests/train --render. You will then find the results for each image in the directory logs/tests/train/eval_bboxes.
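Scene text recognition results are commonly reported as word accuracy: the fraction of images whose predicted string exactly matches the ground truth, often after lower-casing and dropping non-alphanumeric characters. The sketch below computes that metric; whether the evaluation script in this repository normalizes words in exactly this way is an assumption.

```python
import re


def normalize(word: str) -> str:
    """Lower-case and drop non-alphanumeric characters (a common STR protocol)."""
    return re.sub(r"[^0-9a-z]", "", word.lower())


def word_accuracy(predictions, ground_truths) -> float:
    """Fraction of samples whose normalized prediction equals the normalized label."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground truths must have the same length")
    if not predictions:
        return 0.0
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predictions, ground_truths))
    return correct / len(predictions)
```

For instance, word_accuracy(["Hello", "w0rld!", "cat"], ["hello", "W0RLD", "dog"]) counts the first two as correct and returns 2/3.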


Feel free to open an issue! You want to contribute? Just open a PR :smile:!


This code is licensed under GPLv3, see the file LICENSE for more information.


If you find this code useful, please cite our paper:

    title={KISS: Keeping It Simple for Scene Text Recognition},
    author={Christian Bartz and Joseph Bethge and Haojin Yang and Christoph Meinel},