Question: I need help with training my custom dataset
Hello, and firstly thank you.
I am using MMOCR. I have created the "iiit5k" dataset using MMOCR's converter with the following command:
python tools/dataset_converters/prepare_dataset.py IIIT5K --task textrecog --overwrite-cfg
I have created cropped images using the TextSnake model, as shown in the following images.
My question is: "Can I train your model with these types of images as sentences?"
My MMOCR IIIT5K textrecog_train.json file is as follows:
{
"metainfo": {
"dataset_type": "TextRecogDataset",
"task_name": "textrecog"
},
"data_list": [
{
"instances": [
{
"text": "Z NO: 2808"
}
],
"img_path": "textrecog_imgs\\train\\00bc6b1efa4849a79bb2d29e1765f93b-ezsam_2.png"
},
{
"instances": [
{
"text": "EKÜ NO: 0001"
}
],
"img_path": "textrecog_imgs\\train\\00bc6b1efa4849a79bb2d29e1765f93b-ezsam_1.png"
}
]
}
Hi @kerberosargos, sorry for the late reply. Of course you can use your data for training. You need to prepare a dataset config like this: https://github.com/Mountchicken/Union14M/blob/main/mmocr-dev-1.x/configs/textrecog/base/datasets/union14m_train.py
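For context, the linked file mostly declares `OCRDataset` entries pointing at annotation files. Below is a minimal sketch of such a dataset config with placeholder variable names and paths (not the actual Union14M file); it mirrors the `iiit5k_textrecog_train`/`iiit5k_textrecog_test` entries that show up later in the dumped config:

```python
# Hypothetical configs/textrecog/_base_/datasets/my_dataset.py
# Variable names and paths are placeholders; adapt them to your data layout.
my_dataset_textrecog_data_root = 'data/my_dataset'

my_dataset_textrecog_train = dict(
    type='OCRDataset',
    data_root=my_dataset_textrecog_data_root,
    ann_file='textrecog_train.json',
    pipeline=None)

my_dataset_textrecog_test = dict(
    type='OCRDataset',
    data_root=my_dataset_textrecog_data_root,
    ann_file='textrecog_test.json',
    test_mode=True,
    pipeline=None)
```

The model config then imports this file in `_base_` and references the two dicts in its train/val lists.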
Thank you for your reply, but I cannot understand how to create the JSON structure. Is the JSON structure below correct?
The JSON structure you provided is correct.
{
"metainfo": {
"dataset_type": "TextRecogDataset",
"task_name": "textrecog"
},
"data_list": [
{
"instances": [
{
"text": "Z NO: 2808"
}
],
"img_path": "textrecog_imgs\\train\\00bc6b1efa4849a79bb2d29e1765f93b-ezsam_2.png"
},
{
"instances": [
{
"text": "EKÜ NO: 0001"
}
],
"img_path": "textrecog_imgs\\train\\00bc6b1efa4849a79bb2d29e1765f93b-ezsam_1.png"
}
]
}
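As a side note, this annotation layout is easy to generate programmatically. A minimal sketch (the function name is illustrative, not an MMOCR API) that builds the same structure from a list of (image path, label) pairs:

```python
import json

def build_textrecog_ann(samples, out_path=None):
    """Build an MMOCR 1.x text-recognition annotation dict.

    `samples` is a list of (img_path, text) pairs, with paths relative
    to the dataset's data_root. If `out_path` is given, the dict is
    also written to disk as UTF-8 JSON.
    """
    ann = {
        "metainfo": {
            "dataset_type": "TextRecogDataset",
            "task_name": "textrecog",
        },
        "data_list": [
            {"instances": [{"text": text}], "img_path": img_path}
            for img_path, text in samples
        ],
    }
    if out_path is not None:
        with open(out_path, "w", encoding="utf-8") as f:
            # ensure_ascii=False keeps characters such as Ü readable
            json.dump(ann, f, ensure_ascii=False, indent=2)
    return ann
```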
Thank you very much for your support.
Hello again. I have created two JSON files following the structure of the IIIT5K dataset JSON files, as shown below.
textrecog_train.json
{
"metainfo": {
"dataset_type": "TextRecogDataset",
"task_name": "textrecog"
},
"data_list": [
{
"instances": [
{
"text": "MF YAB 15005886"
}
],
"img_path": "textrecog_imgs/00bc6b1efa4849a79bb2d29e1765f93b-ezsam/0.png"
},
{
"instances": [
{
"text": "EKÜ NO:0001"
}
],
"img_path": "textrecog_imgs/00bc6b1efa4849a79bb2d29e1765f93b-ezsam/1.png"
},
................
textrecog_test.json
{
"metainfo": {
"dataset_type": "TextRecogDataset",
"task_name": "textrecog"
},
"data_list": [
{
"instances": [
{
"text": "Bu alışverişten kazancınız 3.1 TL."
}
],
"img_path": "textrecog_imgs/00bc6b1efa4849a79bb2d29e1765f93b-ezsam/4.png"
},
{
"instances": [
{
"text": "Güncel Puan:"
}
],
"img_path": "textrecog_imgs/00bc6b1efa4849a79bb2d29e1765f93b-ezsam/9.png"
},
.......................
Train data length: 1041, test data length: 260.
But I cannot understand how to train on those files with MMOCR for your model. Can you help me a bit more?
Thank you in advance for your support.
Hello again,
I have changed the code in "maerec_s_union14m.py" as follows:
_base_ = [
'_base_marec_vit_s.py',
'../_base_/datasets/union14m_train.py',
'../_base_/datasets/union14m_benchmark.py',
'../_base_/datasets/cute80.py',
'../_base_/datasets/iiit5k.py',
'../_base_/datasets/svt.py',
'../_base_/datasets/svtp.py',
'../_base_/datasets/icdar2013.py',
'../_base_/datasets/icdar2015.py',
'../_base_/default_runtime.py',
'../_base_/schedules/schedule_adamw_cos_10e.py',
]
train_list = [
_base_.iiit5k_textrecog_train
]
val_list = [
_base_.iiit5k_textrecog_test
]
test_list = [
_base_.union14m_benchmark_artistic,
_base_.union14m_benchmark_multi_oriented,
_base_.union14m_benchmark_contextless,
_base_.union14m_benchmark_curve,
_base_.union14m_benchmark_incomplete,
_base_.union14m_benchmark_incomplete_ori,
_base_.union14m_benchmark_multi_words,
_base_.union14m_benchmark_salient,
_base_.union14m_benchmark_general,
]
default_hooks = dict(logger=dict(type='LoggerHook', interval=50))
auto_scale_lr = dict(base_batch_size=128)
train_dataset = dict(
type='ConcatDataset', datasets=train_list, pipeline=_base_.train_pipeline)
val_dataset = dict(
type='ConcatDataset', datasets=val_list, pipeline=_base_.test_pipeline)
test_dataset = dict(
type='ConcatDataset', datasets=test_list, pipeline=_base_.test_pipeline)
train_dataloader = dict(
batch_size=128,
num_workers=12,
persistent_workers=True,
pin_memory=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=train_dataset)
val_dataloader = dict(
batch_size=128,
num_workers=4,
persistent_workers=True,
pin_memory=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=val_dataset)
test_dataloader = dict(
batch_size=128,
num_workers=4,
persistent_workers=True,
pin_memory=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=test_dataset)
val_evaluator = dict(
dataset_prefixes=['IIIT5K'])
test_evaluator = dict(dataset_prefixes=[
'artistic', 'multi-oriented', 'contextless', 'curve', 'incomplete',
'incomplete-ori', 'multi-words', 'salient', 'general'
])
After that, I put my JSON and image files into the following folder path.
The train command is:
python tools/train.py configs/textrecog/maerec/maerec_s_union14m.py
The command output is:
iiit5k_textrecog_test = dict(
ann_file='textrecog_test.json',
data_root='data/iiit5k',
pipeline=None,
test_mode=True,
type='OCRDataset')
iiit5k_textrecog_train = dict(
ann_file='textrecog_train.json',
data_root='data/iiit5k',
pipeline=None,
type='OCRDataset')
launcher = 'none'
load_from = None
log_level = 'INFO'
log_processor = dict(by_epoch=True, type='LogProcessor', window_size=10)
model = dict(
backbone=dict(
depth=12,
embed_dim=384,
img_size=(
32,
128,
),
mlp_ratio=4.0,
num_heads=6,
patch_size=(
4,
4,
),
pretrained=None,
qkv_bias=True,
type='VisionTransformer'),
data_preprocessor=dict(
mean=[
123.675,
116.28,
103.53,
],
std=[
58.395,
57.12,
57.375,
],
type='TextRecogDataPreprocessor'),
decoder=dict(
d_embedding=384,
d_inner=1536,
d_k=48,
d_model=384,
d_v=48,
dictionary=dict(
dict_file=
'C:/Projects/DotNet/Python/ocr/bentasocr/libs/mm/configs/textrecog/maerec/../../../dicts/turkish_digits_symbols_space.txt',
same_start_end=True,
type='Dictionary',
with_end=True,
with_padding=True,
with_start=True,
with_unknown=True),
max_seq_len=48,
module_loss=dict(
ignore_first_char=True, reduction='mean', type='CEModuleLoss'),
n_head=8,
n_layers=6,
postprocessor=dict(type='AttentionPostprocessor'),
type='MAERecDecoder'),
type='MAERec')
optim_wrapper = dict(
optimizer=dict(
betas=(
0.9,
0.999,
),
eps=1e-08,
lr=0.0004,
type='AdamW',
weight_decay=0.01),
type='OptimWrapper')
param_scheduler = [
dict(
T_max=10,
convert_to_iter_based=True,
eta_min=4e-06,
type='CosineAnnealingLR'),
]
randomness = dict(seed=None)
resume = False
svt_textrecog_data_root = 'data/svt'
svt_textrecog_test = dict(
ann_file='textrecog_test.json',
data_root='data/svt',
pipeline=None,
test_mode=True,
type='OCRDataset')
svt_textrecog_train = dict(
ann_file='textrecog_train.json',
data_root='data/svt',
pipeline=None,
type='OCRDataset')
svtp_textrecog_data_root = 'data/svtp'
svtp_textrecog_test = dict(
ann_file='textrecog_test.json',
data_root='data/svtp',
pipeline=None,
test_mode=True,
type='OCRDataset')
svtp_textrecog_train = dict(
ann_file='textrecog_train.json',
data_root='data/svtp',
pipeline=None,
type='OCRDataset')
test_cfg = dict(type='TestLoop')
test_dataloader = dict(
batch_size=128,
dataset=dict(
datasets=[
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/artistic/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/artistic'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/multi_oriented/annotation.json',
data_prefix=dict(
img_path=
'data/Union14M-L/Union14M-Benchmarks/multi_oriented'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/contextless/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/contextless'
),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/curve/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/curve'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/incomplete/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/incomplete'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/incomplete_ori/annotation.json',
data_prefix=dict(
img_path=
'data/Union14M-L/Union14M-Benchmarks/incomplete_ori'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/multi_words/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/multi_words'
),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/salient/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/salient'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/general/annotation.json',
data_prefix=dict(img_path='data/Union14M-L//'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
],
pipeline=[
dict(type='LoadImageFromFile'),
dict(scale=(
128,
32,
), type='Resize'),
dict(type='LoadOCRAnnotations', with_text=True),
dict(
meta_keys=(
'img_path',
'ori_shape',
'img_shape',
'valid_ratio',
),
type='PackTextRecogInputs'),
],
type='ConcatDataset'),
drop_last=False,
num_workers=4,
persistent_workers=True,
pin_memory=True,
sampler=dict(shuffle=False, type='DefaultSampler'))
test_dataset = dict(
datasets=[
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/artistic/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/artistic'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/multi_oriented/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/multi_oriented'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/contextless/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/contextless'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/curve/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/curve'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/incomplete/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/incomplete'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/incomplete_ori/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/incomplete_ori'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/multi_words/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/multi_words'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/salient/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/salient'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/general/annotation.json',
data_prefix=dict(img_path='data/Union14M-L//'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
],
pipeline=[
dict(type='LoadImageFromFile'),
dict(scale=(
128,
32,
), type='Resize'),
dict(type='LoadOCRAnnotations', with_text=True),
dict(
meta_keys=(
'img_path',
'ori_shape',
'img_shape',
'valid_ratio',
),
type='PackTextRecogInputs'),
],
type='ConcatDataset')
test_evaluator = dict(
dataset_prefixes=[
'artistic',
'multi-oriented',
'contextless',
'curve',
'incomplete',
'incomplete-ori',
'multi-words',
'salient',
'general',
],
metrics=[
dict(
mode=[
'exact',
'ignore_case',
'ignore_case_symbol',
],
type='WordMetric'),
dict(type='CharMetric'),
],
type='MultiDatasetsEvaluator')
test_list = [
dict(
ann_file='data/Union14M-L/Union14M-Benchmarks/artistic/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/artistic'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/multi_oriented/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/multi_oriented'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/contextless/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/contextless'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file='data/Union14M-L/Union14M-Benchmarks/curve/annotation.json',
data_prefix=dict(img_path='data/Union14M-L/Union14M-Benchmarks/curve'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/incomplete/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/incomplete'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/incomplete_ori/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/incomplete_ori'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/multi_words/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/multi_words'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file='data/Union14M-L/Union14M-Benchmarks/salient/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/salient'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
dict(
ann_file='data/Union14M-L/Union14M-Benchmarks/general/annotation.json',
data_prefix=dict(img_path='data/Union14M-L//'),
pipeline=None,
test_mode=True,
type='OCRDataset'),
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(scale=(
128,
32,
), type='Resize'),
dict(type='LoadOCRAnnotations', with_text=True),
dict(
meta_keys=(
'img_path',
'ori_shape',
'img_shape',
'valid_ratio',
),
type='PackTextRecogInputs'),
]
train_cfg = dict(max_epochs=10, type='EpochBasedTrainLoop', val_interval=1)
train_dataloader = dict(
batch_size=128,
dataset=dict(
datasets=[
dict(
ann_file='textrecog_train.json',
data_root='data/iiit5k',
pipeline=None,
type='OCRDataset'),
],
pipeline=[
dict(ignore_empty=True, min_size=0, type='LoadImageFromFile'),
dict(type='LoadOCRAnnotations', with_text=True),
dict(scale=(
128,
32,
), type='Resize'),
dict(
prob=0.5,
transforms=[
dict(
transforms=[
dict(max_angle=15, type='RandomRotate'),
dict(
degrees=15,
op='RandomAffine',
scale=(
0.5,
2.0,
),
shear=(
-45,
45,
),
translate=(
0.3,
0.3,
),
type='TorchVisionWrapper'),
dict(
distortion_scale=0.5,
op='RandomPerspective',
p=1,
type='TorchVisionWrapper'),
],
type='RandomChoice'),
],
type='RandomApply'),
dict(
prob=0.25,
transforms=[
dict(type='PyramidRescale'),
dict(
transforms=[
dict(
p=0.5, type='GaussNoise', var_limit=(
20,
20,
)),
dict(blur_limit=7, p=0.5, type='MotionBlur'),
],
type='mmdet.Albu'),
],
type='RandomApply'),
dict(
prob=0.25,
transforms=[
dict(
brightness=0.5,
contrast=0.5,
hue=0.1,
op='ColorJitter',
saturation=0.5,
type='TorchVisionWrapper'),
],
type='RandomApply'),
dict(
meta_keys=(
'img_path',
'ori_shape',
'img_shape',
'valid_ratio',
),
type='PackTextRecogInputs'),
],
type='ConcatDataset'),
num_workers=12,
persistent_workers=True,
pin_memory=True,
sampler=dict(shuffle=True, type='DefaultSampler'))
train_dataset = dict(
datasets=[
dict(
ann_file='textrecog_train.json',
data_root='data/iiit5k',
pipeline=None,
type='OCRDataset'),
],
pipeline=[
dict(ignore_empty=True, min_size=0, type='LoadImageFromFile'),
dict(type='LoadOCRAnnotations', with_text=True),
dict(scale=(
128,
32,
), type='Resize'),
dict(
prob=0.5,
transforms=[
dict(
transforms=[
dict(max_angle=15, type='RandomRotate'),
dict(
degrees=15,
op='RandomAffine',
scale=(
0.5,
2.0,
),
shear=(
-45,
45,
),
translate=(
0.3,
0.3,
),
type='TorchVisionWrapper'),
dict(
distortion_scale=0.5,
op='RandomPerspective',
p=1,
type='TorchVisionWrapper'),
],
type='RandomChoice'),
],
type='RandomApply'),
dict(
prob=0.25,
transforms=[
dict(type='PyramidRescale'),
dict(
transforms=[
dict(p=0.5, type='GaussNoise', var_limit=(
20,
20,
)),
dict(blur_limit=7, p=0.5, type='MotionBlur'),
],
type='mmdet.Albu'),
],
type='RandomApply'),
dict(
prob=0.25,
transforms=[
dict(
brightness=0.5,
contrast=0.5,
hue=0.1,
op='ColorJitter',
saturation=0.5,
type='TorchVisionWrapper'),
],
type='RandomApply'),
dict(
meta_keys=(
'img_path',
'ori_shape',
'img_shape',
'valid_ratio',
),
type='PackTextRecogInputs'),
],
type='ConcatDataset')
train_list = [
dict(
ann_file='textrecog_train.json',
data_root='data/iiit5k',
pipeline=None,
type='OCRDataset'),
]
train_pipeline = [
dict(ignore_empty=True, min_size=0, type='LoadImageFromFile'),
dict(type='LoadOCRAnnotations', with_text=True),
dict(scale=(
128,
32,
), type='Resize'),
dict(
prob=0.5,
transforms=[
dict(
transforms=[
dict(max_angle=15, type='RandomRotate'),
dict(
degrees=15,
op='RandomAffine',
scale=(
0.5,
2.0,
),
shear=(
-45,
45,
),
translate=(
0.3,
0.3,
),
type='TorchVisionWrapper'),
dict(
distortion_scale=0.5,
op='RandomPerspective',
p=1,
type='TorchVisionWrapper'),
],
type='RandomChoice'),
],
type='RandomApply'),
dict(
prob=0.25,
transforms=[
dict(type='PyramidRescale'),
dict(
transforms=[
dict(p=0.5, type='GaussNoise', var_limit=(
20,
20,
)),
dict(blur_limit=7, p=0.5, type='MotionBlur'),
],
type='mmdet.Albu'),
],
type='RandomApply'),
dict(
prob=0.25,
transforms=[
dict(
brightness=0.5,
contrast=0.5,
hue=0.1,
op='ColorJitter',
saturation=0.5,
type='TorchVisionWrapper'),
],
type='RandomApply'),
dict(
meta_keys=(
'img_path',
'ori_shape',
'img_shape',
'valid_ratio',
),
type='PackTextRecogInputs'),
]
tta_model = dict(type='EncoderDecoderRecognizerTTAModel')
tta_pipeline = [
dict(type='LoadImageFromFile'),
dict(
transforms=[
[
dict(
condition="results['img_shape'][1]<results['img_shape'][0]",
true_transforms=[
dict(
args=[
dict(cls='Rot90', k=0, keep_size=False),
],
type='ImgAugWrapper'),
],
type='ConditionApply'),
dict(
condition="results['img_shape'][1]<results['img_shape'][0]",
true_transforms=[
dict(
args=[
dict(cls='Rot90', k=1, keep_size=False),
],
type='ImgAugWrapper'),
],
type='ConditionApply'),
dict(
condition="results['img_shape'][1]<results['img_shape'][0]",
true_transforms=[
dict(
args=[
dict(cls='Rot90', k=3, keep_size=False),
],
type='ImgAugWrapper'),
],
type='ConditionApply'),
],
[
dict(scale=(
128,
32,
), type='Resize'),
],
[
dict(type='LoadOCRAnnotations', with_text=True),
],
[
dict(
meta_keys=(
'img_path',
'ori_shape',
'img_shape',
'valid_ratio',
),
type='PackTextRecogInputs'),
],
],
type='TestTimeAug'),
]
union14m_benchmark_artistic = dict(
ann_file='data/Union14M-L/Union14M-Benchmarks/artistic/annotation.json',
data_prefix=dict(img_path='data/Union14M-L/Union14M-Benchmarks/artistic'),
pipeline=None,
test_mode=True,
type='OCRDataset')
union14m_benchmark_contextless = dict(
ann_file='data/Union14M-L/Union14M-Benchmarks/contextless/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/contextless'),
pipeline=None,
test_mode=True,
type='OCRDataset')
union14m_benchmark_curve = dict(
ann_file='data/Union14M-L/Union14M-Benchmarks/curve/annotation.json',
data_prefix=dict(img_path='data/Union14M-L/Union14M-Benchmarks/curve'),
pipeline=None,
test_mode=True,
type='OCRDataset')
union14m_benchmark_general = dict(
ann_file='data/Union14M-L/Union14M-Benchmarks/general/annotation.json',
data_prefix=dict(img_path='data/Union14M-L//'),
pipeline=None,
test_mode=True,
type='OCRDataset')
union14m_benchmark_incomplete = dict(
ann_file='data/Union14M-L/Union14M-Benchmarks/incomplete/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/incomplete'),
pipeline=None,
test_mode=True,
type='OCRDataset')
union14m_benchmark_incomplete_ori = dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/incomplete_ori/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/incomplete_ori'),
pipeline=None,
test_mode=True,
type='OCRDataset')
union14m_benchmark_multi_oriented = dict(
ann_file=
'data/Union14M-L/Union14M-Benchmarks/multi_oriented/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/multi_oriented'),
pipeline=None,
test_mode=True,
type='OCRDataset')
union14m_benchmark_multi_words = dict(
ann_file='data/Union14M-L/Union14M-Benchmarks/multi_words/annotation.json',
data_prefix=dict(
img_path='data/Union14M-L/Union14M-Benchmarks/multi_words'),
pipeline=None,
test_mode=True,
type='OCRDataset')
union14m_benchmark_root = 'data/Union14M-L/Union14M-Benchmarks'
union14m_benchmark_salient = dict(
ann_file='data/Union14M-L/Union14M-Benchmarks/salient/annotation.json',
data_prefix=dict(img_path='data/Union14M-L/Union14M-Benchmarks/salient'),
pipeline=None,
test_mode=True,
type='OCRDataset')
union14m_challenging = dict(
ann_file='train_annos/mmocr1.0/train_challenging.json',
data_root='data/Union14M-L/',
pipeline=None,
test_mode=True,
type='OCRDataset')
union14m_data_root = 'data/Union14M-L/'
union14m_easy = dict(
ann_file='train_annos/mmocr1.0/train_easy.json',
data_root='data/Union14M-L/',
pipeline=None,
type='OCRDataset')
union14m_hard = dict(
ann_file='train_annos/mmocr1.0/train_hard.json',
data_root='data/Union14M-L/',
pipeline=None,
type='OCRDataset')
union14m_medium = dict(
ann_file='train_annos/mmocr1.0/train_medium.json',
data_root='data/Union14M-L/',
pipeline=None,
type='OCRDataset')
union14m_normal = dict(
ann_file='train_annos/mmocr1.0/train_normal.json',
data_root='data/Union14M-L/',
pipeline=None,
type='OCRDataset')
union14m_root = 'data/Union14M-L/'
union14m_val = dict(
ann_file='train_annos/mmocr1.0/val_annos.json',
data_root='data/Union14M-L/',
pipeline=None,
type='OCRDataset')
val_cfg = dict(type='ValLoop')
val_dataloader = dict(
batch_size=128,
dataset=dict(
datasets=[
dict(
ann_file='textrecog_test.json',
data_root='data/iiit5k',
pipeline=None,
test_mode=True,
type='OCRDataset'),
],
pipeline=[
dict(type='LoadImageFromFile'),
dict(scale=(
128,
32,
), type='Resize'),
dict(type='LoadOCRAnnotations', with_text=True),
dict(
meta_keys=(
'img_path',
'ori_shape',
'img_shape',
'valid_ratio',
),
type='PackTextRecogInputs'),
],
type='ConcatDataset'),
drop_last=False,
num_workers=4,
persistent_workers=True,
pin_memory=True,
sampler=dict(shuffle=False, type='DefaultSampler'))
val_dataset = dict(
datasets=[
dict(
ann_file='textrecog_test.json',
data_root='data/iiit5k',
pipeline=None,
test_mode=True,
type='OCRDataset'),
],
pipeline=[
dict(type='LoadImageFromFile'),
dict(scale=(
128,
32,
), type='Resize'),
dict(type='LoadOCRAnnotations', with_text=True),
dict(
meta_keys=(
'img_path',
'ori_shape',
'img_shape',
'valid_ratio',
),
type='PackTextRecogInputs'),
],
type='ConcatDataset')
val_evaluator = dict(
dataset_prefixes=[
'IIIT5K',
],
metrics=[
dict(
mode=[
'exact',
'ignore_case',
'ignore_case_symbol',
],
type='WordMetric'),
dict(type='CharMetric'),
],
type='MultiDatasetsEvaluator')
val_list = [
dict(
ann_file='textrecog_test.json',
data_root='data/iiit5k',
pipeline=None,
test_mode=True,
type='OCRDataset'),
]
vis_backends = [
dict(type='LocalVisBackend'),
]
visualizer = dict(
name='visualizer',
type='TextRecogLocalVisualizer',
vis_backends=[
dict(type='LocalVisBackend'),
])
work_dir = './work_dirs\\bentas_maerec_s_union14m'
04/09 12:18:51 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
04/09 12:18:51 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH ) RuntimeInfoHook
(BELOW_NORMAL) LoggerHook
--------------------
before_train:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(VERY_LOW ) CheckpointHook
--------------------
before_train_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) DistSamplerSeedHook
--------------------
before_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
--------------------
after_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
--------------------
after_train_epoch:
(NORMAL ) IterTimerHook
(NORMAL ) SyncBuffersHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
--------------------
before_val:
(VERY_HIGH ) RuntimeInfoHook
--------------------
before_val_epoch:
(NORMAL ) IterTimerHook
(NORMAL ) SyncBuffersHook
--------------------
before_val_iter:
(NORMAL ) IterTimerHook
--------------------
after_val_iter:
(NORMAL ) IterTimerHook
(NORMAL ) VisualizationHook
(BELOW_NORMAL) LoggerHook
--------------------
after_val_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
--------------------
after_val:
(VERY_HIGH ) RuntimeInfoHook
--------------------
after_train:
(VERY_HIGH ) RuntimeInfoHook
(VERY_LOW ) CheckpointHook
--------------------
before_test:
(VERY_HIGH ) RuntimeInfoHook
--------------------
before_test_epoch:
(NORMAL ) IterTimerHook
--------------------
before_test_iter:
(NORMAL ) IterTimerHook
--------------------
after_test_iter:
(NORMAL ) IterTimerHook
(NORMAL ) VisualizationHook
(BELOW_NORMAL) LoggerHook
--------------------
after_test_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
--------------------
after_test:
(VERY_HIGH ) RuntimeInfoHook
--------------------
after_run:
(BELOW_NORMAL) LoggerHook
--------------------
04/09 11:50:16 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
04/09 11:50:16 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
04/09 11:50:16 - mmengine - INFO - Checkpoints will be saved to C:\Projects\DotNet\Python\ocr\bentasocr\libs\mm\work_dirs\bentas_maerec_s_union14m.
04/09 11:54:20 - mmengine - INFO - Exp name: bentas_maerec_s_union14m_20240409_115001
04/09 11:54:20 - mmengine - INFO - Epoch(train) [1][9/9] lr: 3.9233e-04 eta: 0:36:34 time: 27.0981 data_time: 7.0543 loss: 3.8230 loss_ce: 3.8230
04/09 11:54:20 - mmengine - INFO - Saving checkpoint at 1 epochs
04/09 11:56:44 - mmengine - INFO - Epoch(val) [1][3/3] IIIT5K/recog/word_acc: 0.0000 IIIT5K/recog/word_acc_ignore_case: 0.0000 IIIT5K/recog/word_acc_ignore_case_symbol: 0.0000 IIIT5K/recog/char_recall: 0.1536 IIIT5K/recog/char_precision: 0.0373 data_time: 3.4783 time: 46.5298
After training, I tested the epoch_1.pth weights using MMOCR. The result is as follows :(
Thank you in advance for your help.
Hi @kerberosargos, sorry for the late reply. The training process looks fine to me, but the prediction result seems bad. I noticed that you only have 1041 images for training. That is far from enough to train a text recognizer; typically more than 10k images are needed.
You are welcome, and thank you for your support.
I would like to ask one more question: is training for only 1 epoch normal?
I will try to train with 10K images. Thank you again.
You can train longer, e.g. 30 epochs for 10K images.
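Training longer typically means raising the epoch budget in the config. A hedged sketch of such an override, based on the defaults visible in the dumped config (`max_epochs=10`, `T_max=10`); the exact placement depends on your config layout:

```python
# Hypothetical override in the model config: raise the epoch budget
# from the schedule's default 10 to 30, keeping per-epoch validation.
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=30, val_interval=1)

# The cosine annealing period should usually be kept in sync with the
# new epoch count, so the learning rate decays over the full run.
param_scheduler = [
    dict(
        type='CosineAnnealingLR',
        T_max=30,
        eta_min=4e-06,
        convert_to_iter_based=True),
]
```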
Hello again. I would like to ask one more question, to understand the training process clearly.
I have detected text in my own training images using the pretrained TextSnake model.
The resulting images are as follows.
The JSON file for those images is as follows:
{
"data_list": [
{
"instances": [
{
"text": "alBaraka"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/0.png"
},
{
"instances": [
{
"text": "BU BELGEYİ SAKLAYINIZ"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/1.png"
},
{
"instances": [
{
"text": "BU İŞLEM YURT İÇİ KARTLA YAPILMIŞTIR"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/2.png"
},
{
"instances": [
{
"text": "Uer.: 92.09.09"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/3.png"
},
{
"instances": [
{
"text": "VISA"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/4.png"
},
{
"instances": [
{
"text": "AID:A0000000031010"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/5.png"
},
{
"instances": [
{
"text": "REF NO: 4380536323"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/6.png"
},
{
"instances": [
{
"text": "GRUP NO: 034"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/7.png"
},
{
"instances": [
{
"text": "SN:0013 ONAY KODU: 578529"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/8.png"
},
{
"instances": [
{
"text": "KARŞILIĞI MAL/HİZM ALDIM"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/9.png"
},
{
"instances": [
{
"text": "))))"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/10.png"
},
{
"instances": [
{
"text": "VISA CONTACTLESS"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/11.png"
},
{
"instances": [
{
"text": "İŞLEM TEMASSIZ YAPILMIŞTIR"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/12.png"
},
{
"instances": [
{
"text": "200.00TL"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/13.png"
},
{
"instances": [
{
"text": "TUTAR"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/14.png"
},
{
"instances": [
{
"text": "**** **** **** 8900"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/15.png"
},
{
"instances": [
{
"text": "18:17:51 Q ONLINE"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/16.png"
},
{
"instances": [
{
"text": "SATIŞ"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/17.png"
},
{
"instances": [
{
"text": "MÜŞTERİ NÜSHASI"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/18.png"
},
{
"instances": [
{
"text": "00A433B2"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/19.png"
},
{
"instances": [
{
"text": "TERMİNAL NO"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/20.png"
},
{
"instances": [
{
"text": "2602185003"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/21.png"
},
{
"instances": [
{
"text": "İŞYERİ NO"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/22.png"
},
{
"instances": [
{
"text": "*200,00"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/23.png"
},
{
"instances": [
{
"text": "KREDİ KARTI"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/24.png"
},
{
"instances": [
{
"text": "TOPLAM"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/25.png"
},
{
"instances": [
{
"text": "*200,00"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/26.png"
},
{
"instances": [
{
"text": "TOPKDV"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/27.png"
},
{
"instances": [
{
"text": "*1,98"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/28.png"
},
{
"instances": [
{
"text": "KASAP"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/29.png"
},
{
"instances": [
{
"text": "%1"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/30.png"
},
{
"instances": [
{
"text": "*200,00"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/31.png"
},
{
"instances": [
{
"text": "SAAT: 18:17"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/32.png"
},
{
"instances": [
{
"text": "26-03-2024"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/33.png"
},
{
"instances": [
{
"text": "FİŞ NO: 14"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/34.png"
},
{
"instances": [
{
"text": "TEŞEKKÜRLER"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/35.png"
},
{
"instances": [
{
"text": ""
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/36.png"
},
{
"instances": [
{
"text": "TEPECİK VD: 7850693945"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/37.png"
},
{
"instances": [
{
"text": "NO: 24/B BAŞİSKELE/KOCAELİ"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/38.png"
},
{
"instances": [
{
"text": "SEYMEN MAH D-130 KARAYOLU CAD"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/39.png"
},
{
"instances": [
{
"text": "TURİZM NAK LTD ŞTİ"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/40.png"
},
{
"instances": [
{
"text": "SÜLEYMAN KESKİN RESTORAN İNŞ"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/41.png"
},
{
"instances": [
{
"text": "MODA KEBAP LAHMACUN"
}
],
"img_path": "7f2ecb4425a847e8b1c20d14e74b7a3f-ezsam/42.png"
}
]
}
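Before training on such a file, it can help to screen it for problem entries; note, for example, the empty label for 36.png above. A small sanity-check sketch (function and parameter names are illustrative, not part of MMOCR):

```python
import json
from pathlib import Path

def check_annotations(ann_path, img_root):
    """Report empty labels and missing image files in a textrecog JSON.

    `img_root` is the directory that the relative `img_path` entries
    resolve against (the dataset's data_root).
    """
    with open(ann_path, encoding="utf-8") as f:
        ann = json.load(f)
    problems = []
    for item in ann["data_list"]:
        text = item["instances"][0]["text"]
        if not text.strip():
            problems.append(f"empty label: {item['img_path']}")
        if not (Path(img_root) / item["img_path"]).is_file():
            problems.append(f"missing image: {item['img_path']}")
    return problems
```

Entries with empty labels are usually best removed before training, since they contribute no useful supervision to the recognizer.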
So my question is: are these images and this JSON file correct for training?
Thank you in advance for your support.