json.decoder.JSONDecodeError: Extra data: line 1 column 5 (char 4)

Open ChaoRuk opened this issue 2 years ago • 7 comments

Can someone help me please?

ChaoRuk avatar Oct 23 '22 11:10 ChaoRuk

Does your data format match the parser in the config? Please provide a short sample of your data annotation and the full config file.

Also, next time please follow the template to post your issue. It's designed to help everyone understand the situation thoroughly.

gaotongxiao avatar Oct 24 '22 02:10 gaotongxiao

Also these chapters may be helpful: https://mmocr.readthedocs.io/en/latest/tutorials/dataset_types.html https://mmocr.readthedocs.io/en/latest/tutorials/blank_recog.html

gaotongxiao avatar Oct 24 '22 02:10 gaotongxiao

I'm trying to set up my dataset following the SegOCR example at this link (https://github.com/open-mmlab/mmocr/blob/main/configs/base/recog_datasets/seg_toy_data.py). My dataset is laid out the same way as the one at this link (https://github.com/open-mmlab/mmocr/tree/main/tests/data/ocr_char_ann_toy_dataset):

    |_ images
    |  |_ train
    |  |  |_ 1.jpg
    |  |  |_ …
    |  |_ val
    |  |_ test
    |  |  |_ 1019.jpg
    |  |  |_ …
    |_ label_seg_test.txt
    |  |_ 1019.jpg ผข9104
    |  |_ …
    |_ label_seg_train.txt
    |  |_ {'file_name': '1.jpg', 'annotations': [{'char_text': '82-1279', 'char_box': [732.8115942028985, 1733.3333333333333, 993.6811594202898, 1727.5362318840578, 993.6811594202898, 1799.9999999999998, 728.463768115942, 1799.9999999999998]}], 'text': '82-1279'}
    |  |_ …

ChaoRuk avatar Oct 24 '22 04:10 ChaoRuk

@gaotongxiao This is my error:

RuntimeError: Caught JSONDecodeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/klabs/anaconda3/envs/car3-env/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/klabs/anaconda3/envs/car3-env/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/klabs/anaconda3/envs/car3-env/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/klabs/anaconda3/envs/car3-env/lib/python3.7/site-packages/torch/utils/data/dataset.py", line 416, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/home/klabs/workplace_internship/lp_recog/Ruk/mmocr-main/mmocr/datasets/base_dataset.py", line 141, in __getitem__
    return self.prepare_test_img(index)
  File "/home/klabs/workplace_internship/lp_recog/Ruk/mmocr-main/mmocr/datasets/base_dataset.py", line 113, in prepare_test_img
    return self.prepare_train_img(img_info)
  File "/home/klabs/workplace_internship/lp_recog/Ruk/mmocr-main/mmocr/datasets/ocr_seg_dataset.py", line 82, in prepare_train_img
    img_ann_info = self.data_infos[index]
  File "/home/klabs/workplace_internship/lp_recog/Ruk/mmocr-main/mmocr/datasets/utils/loader.py", line 65, in __getitem__
    return self.parser.get_item(self.ori_data_infos, index)
  File "/home/klabs/workplace_internship/lp_recog/Ruk/mmocr-main/mmocr/datasets/utils/parser.py", line 75, in get_item
    line_json_obj = json.loads(json_str)
  File "/home/klabs/anaconda3/envs/car3-env/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/klabs/anaconda3/envs/car3-env/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/klabs/anaconda3/envs/car3-env/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

ChaoRuk avatar Oct 24 '22 04:10 ChaoRuk

The tricky part here is that the training set and test set are in different formats. Therefore, you need to make sure the annotation parser matches the dataset format.

For example, the parser in

https://github.com/open-mmlab/mmocr/blob/67ebc6c876bd4bd79d122cd6d525edfc08f6e37d/configs/base/recog_datasets/seg_toy_data.py#L11-L12

corresponds to

https://github.com/open-mmlab/mmocr/blob/main/tests/data/ocr_char_ann_toy_dataset/instances_train.txt

And

https://github.com/open-mmlab/mmocr/blob/67ebc6c876bd4bd79d122cd6d525edfc08f6e37d/configs/base/recog_datasets/seg_toy_data.py#L24-L28

corresponds to

https://github.com/open-mmlab/mmocr/blob/main/tests/data/ocr_char_ann_toy_dataset/instances_test.txt
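
To make the pairing concrete, here is a minimal standard-library sketch of the two line formats. `parse_json_line` and `parse_str_line` are hypothetical stand-ins for a JSON-per-line parser and a separator-based parser; they are not MMOCR's own classes. Handing a plain `filename text` line, or a line with single-quoted keys, to the JSON parser should reproduce the two `JSONDecodeError` messages seen in this thread.

```python
import json


def parse_json_line(line):
    """Hypothetical stand-in for a parser that expects one JSON object per line."""
    return json.loads(line)


def parse_str_line(line, separator=" "):
    """Hypothetical stand-in for a parser that expects 'filename<sep>text'."""
    filename, text = line.strip().split(separator, 1)
    return {"filename": filename, "text": text}


# One sample line per format; the last one uses Python-style single quotes.
json_line = '{"file_name": "1.jpg", "annotations": [], "text": "82-1279"}'
str_line = "1019.jpg ผข9104"
python_repr_line = "{'file_name': '1.jpg', 'text': '82-1279'}"

# Matching parser and format: both lines load without error.
print(parse_json_line(json_line)["text"])
print(parse_str_line(str_line))

# Mismatched parser and format: collect the decoder errors.
errors = {}
for label, line in (("str_line", str_line), ("python_repr_line", python_repr_line)):
    try:
        parse_json_line(line)
    except json.JSONDecodeError as err:
        errors[label] = err
        print(f"{label}: {err}")
```

In the first failing case the decoder reads `1019` as a complete JSON number and then stops at the `.`, which is why the error position is character 4. In the second, single quotes are not valid JSON string delimiters, so decoding fails at character 1.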

gaotongxiao avatar Oct 24 '22 06:10 gaotongxiao

@gaotongxiao I'm sure that the annotation parser matches the dataset format. TT Now I'm trying to use the SAR model, with everything set up like the example at this link (https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/seg/seg_r31_1by16_fpnocr_toy_dataset.py). When I run this training cell:

`from mmocr.datasets import build_dataset
from mmocr.models import build_detector
from mmocr.apis import train_detector
import os.path as osp
import mmcv

datasets = [build_dataset(cfg.data.train)]

model = build_detector(
    cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
model.CLASSES = datasets[0].CLASSES

mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_detector(model, datasets, cfg, distributed=False, validate=True)`

I get this error at the last line:

prepare index 2502 with error Extra data: line 1 column 5 (char 4)
load index 2502 with error Extra data: line 1 column 5 (char 4)

Can you advise me on how to solve this error?

ChaoRuk avatar Oct 25 '22 06:10 ChaoRuk
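
One way to narrow down an error like the one reported for index 2502 is to scan the annotation file for lines that a JSON-per-line parser would reject. The sketch below uses only the standard library; `find_bad_json_lines` is a hypothetical helper, and it assumes the reported index corresponds to a line in the annotation file, which may not hold if the loader repeats or filters lines.

```python
import json


def find_bad_json_lines(lines):
    """Return (line_number, error_message) for each non-empty line that is not valid JSON."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            bad.append((number, str(err)))
    return bad


# Illustrative sample: one JSON line and one plain 'filename text' line.
sample = [
    '{"file_name": "1.jpg", "text": "82-1279"}',
    "1019.jpg ผข9104",
]
for number, message in find_bad_json_lines(sample):
    print(f"line {number}: {message}")
```

For a real file, the same function can be given an open file object, for example `find_bad_json_lines(open(path, encoding="utf-8"))`, where `path` is the annotation file named in the config.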

The dataset in this format (https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/seg/seg_r31_1by16_fpnocr_toy_dataset.py) only works for the SegOCR model. It is not applicable to other models such as SAR.
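
If the same images are to be reused with a recognition model such as SAR, the character-level annotation lines would need converting first. The following is a minimal sketch, assuming the target format is one `filename text` pair per line, like the test annotation shown earlier in this thread; whether a given SAR config expects exactly that should be checked against the MMOCR dataset documentation. `load_record` and `to_filename_text` are hypothetical helpers, not part of MMOCR.

```python
import ast
import json


def load_record(line):
    """Parse one annotation line as JSON, falling back to a Python literal."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # Lines written with str(dict) use single quotes, which JSON rejects.
        return ast.literal_eval(line)


def to_filename_text(line):
    """Reduce a character-level record to a 'filename text' line."""
    record = load_record(line)
    return f"{record['file_name']} {record['text']}"


seg_line = "{'file_name': '1.jpg', 'annotations': [], 'text': '82-1279'}"

# Plain 'filename text' line for a text-only recognition dataset.
print(to_filename_text(seg_line))

# The same record re-serialized as valid JSON (double quotes).
print(json.dumps(load_record(seg_line), ensure_ascii=False))
```

The `ast.literal_eval` fallback only covers lines that are valid Python literals; anything else still raises, which keeps malformed lines from passing silently.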

gaotongxiao avatar Oct 25 '22 08:10 gaotongxiao