Dataset Agnostic Word Segmentation
TensorFlow implementation of Towards A Dataset Agnostic Word Segmentation
Usage
Train
python main.py --name train_example --train-hmap --train-regression --iters 100000 --dataset iamdb --data_type train --gpu_id 0
Evaluation
python main.py --name train_example --eval-run --iters 500 --dataset iamdb --data_type val1 --gpu_id 0
Expected output
[Figure: Original Document (Bentham) | Heatmap (Bentham)]
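As a minimal sketch (not code from this repo), an overlay like the one above can be reproduced with OpenCV; the file names are placeholders for whatever the evaluation run writes to disk:

import cv2

page = cv2.imread('page.jpg')  # original document image (placeholder name)
heat = cv2.imread('heatmap.png', cv2.IMREAD_GRAYSCALE)  # predicted heatmap (placeholder name)
heat = cv2.resize(heat, (page.shape[1], page.shape[0]))  # match page size
colored = cv2.applyColorMap(heat, cv2.COLORMAP_JET)      # colorize the heatmap
overlay = cv2.addWeighted(page, 0.6, colored, 0.4, 0)    # blend page and heatmap
cv2.imwrite('overlay.png', overlay)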
Dependencies
- Python 2.7
- TensorFlow 1.3
- OpenCV
- CUDA
Also see requirements.txt
License
Code is released under the MIT License (refer to the LICENSE file for details).
Data
You can load several datasets automatically by passing the following command-line flags:
- dataset
- data_type
- data_type_prob
Possible values for the dataset flag are 'iamdb', 'iclef', 'botany', 'icdar' and 'from-pkl-folder'; see below for further details. Available data_type values are 'train', 'val1', 'val2' and 'test'. You can mix several data types by passing a string in which the types are separated by '#'. Multiple data types are sampled uniformly unless you pass a '#'-separated string of probabilities to 'data_type_prob' (NOTICE: the probability string must contain one value per data type).
For example:
python main.py ... --dataset iamdb --data_type #train#val1#val2 --data_type_prob #0.75#0.125#0.125
You can change this behavior by editing the code in input_stream.py; a rough sketch of the parsing logic follows.
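The snippet below is a minimal sketch, with assumed helper names rather than the repo's actual input_stream.py code, of how the '#'-separated data_type and data_type_prob strings could be parsed and sampled from:

from __future__ import print_function
import numpy as np

def parse_mix(data_type, data_type_prob=None):
    # '#train#val1#val2' -> ['train', 'val1', 'val2']
    types = [t for t in data_type.split('#') if t]
    if data_type_prob:
        probs = [float(p) for p in data_type_prob.split('#') if p]
        assert len(probs) == len(types), 'need one probability per data type'
    else:
        # default: sample all data types uniformly
        probs = [1.0 / len(types)] * len(types)
    return types, probs

types, probs = parse_mix('#train#val1#val2', '#0.75#0.125#0.125')
print(np.random.choice(types, p=probs))  # e.g. 'train'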
Assumed directory structure for the available datasets
Root
└── datasets
    ├── icdar
    ├── iclef
    ├── iamdb
    └── botany
You probably want to use symlinks rather than keeping the data in the repo directory:
cd datasets
ln -s /path/to/icdar icdar
ICDAR2013
- Available for download from here (you need the images and the word ground-truth data)
- Extract the rar files into the following directory structure:

  datasets/icdar
  ├── gt_words_test
  ├── gt_words_train
  ├── images_test
  └── images_train

- Command-line flags: pass 'icdar' as the value of the dataset flag; available values for data_type are {train, val1, test}
- Binary ground-truth data converted to bounding-box coordinates is available as a pkl from here
You can get dataset splits and cached NumPy arrays of bounding-box data from here
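As a hedged illustration of working with that pkl, the sketch below assumes a file name and structure (a dict mapping page names to lists of box coordinates) that are not documented here; adjust both to match what you actually download:

from __future__ import print_function
import pickle

# Hypothetical path and structure: adjust to the downloaded file.
with open('datasets/icdar/word_boxes.pkl', 'rb') as f:
    boxes = pickle.load(f)

# Assuming {page_name: [(x1, y1, x2, y2), ...]}, inspect one entry:
page = sorted(boxes.keys())[0]
print(page, len(boxes[page]), 'boxes')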
IAM Database
- Available for download from here (registration is required)
- Make sure you have the following directory structure:

  datasets/iamdb
  ├── ascii
  ├── forms
  ├── lines
  ├── sentences
  ├── words
  └── xml

- Command-line flags: pass 'iamdb' as the value of the dataset flag; available values for data_type are {train, val1, val2, test}
You can get dataset splits from here
Bentham dataset (iclef)
- Available for download from here
- Make sure you have the following directory structure:

  datasets/iclef
  ├── bboxs_train_for_query-by-example.txt
  ├── pages_devel_jpg
  ├── pages_devel_xml
  ├── pages_test_jpg
  ├── pages_test_xml
  ├── pages_train_jpg
  └── pages_train_xml

- Command-line flags: pass 'iclef' as the value of the dataset flag; available values for data_type are {train, val1, val2, test}
You can get dataset splits from here
Botany
- Available for download from here
- Make sure you have the following directory structure:

  datasets/botany
  ├── Botany_Train_III_PageImages
  ├── Botany_Train_II_PageImages
  ├── Botany_Train_I_PageImages
  ├── Botany_Test_PageImages
  └── xml
      ├── Botany_Train_III_WL.xml
      ├── Botany_Train_II_WL.xml
      ├── Botany_Train_I_WL.xml
      └── Botany_Test_GT_SegFree_QbS.xml