Validation step error: re.error: missing ), unterminated subpattern at position

Open kurbobo opened this issue 1 year ago • 4 comments

Hello everyone! I'm trying to train the model on text with Cyrillic letters. I created a synthetic dataset with synthdog and started training. At the beginning of validation, the following error occurred:

You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high') which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type                      | Params
0 | model | VisionEncoderDecoderModel | 201 M

201 M     Trainable params
0         Non-trainable params
201 M     Total params
807.408   Total estimated model params size (MB)
Epoch 0:  20%|████████████████████████████▍ | 8031/40155 [26:52<1:47:29, 4.98it/s, v_num=4
 | 0/4985 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train_donut.py", line 322, in <module>
    trainer.fit(model_module)
  File "/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 520, in fit
    call._call_and_handle_interrupt(
  File "/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 559, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 935, in _run
    results = self._run_stage()
  File "/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 978, in _run_stage
    self.fit_loop.run()
  File "/.local/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 201, in run
    self.advance()
  File "/.local/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 354, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/.local/lib/python3.8/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 134, in run
    self.on_advance_end()
  File "/.local/lib/python3.8/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 248, in on_advance_end
    self.val_loop.run()
  File "/.local/lib/python3.8/site-packages/pytorch_lightning/loops/utilities.py", line 177, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/.local/lib/python3.8/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 115, in run
    self._evaluation_step(batch, batch_idx, dataloader_idx)
  File "/.local/lib/python3.8/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 375, in _evaluation_step
    output = call._call_strategy_hook(trainer, hook_name, *step_kwargs.values())
  File "/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 288, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/.local/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 378, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "train_donut.py", line 247, in validation_step
    pred = re.sub(r"(?:(?<=>) | (?=", "", answer, count=1)
  File "/usr/lib/python3.8/re.py", line 208, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/lib/python3.8/re.py", line 302, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.8/sre_parse.py", line 834, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.8/sre_parse.py", line 759, in _parse
    raise source.error("missing ), unterminated subpattern",
re.error: missing ), unterminated subpattern at position 12

Does anybody know how to fix it?

kurbobo avatar May 19 '23 09:05 kurbobo

I was training with this notebook: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Donut/CORD/Fine_tune_Donut_on_a_custom_dataset_(CORD)_with_PyTorch_Lightning.ipynb

kurbobo avatar May 19 '23 09:05 kurbobo

I have the same problem with a custom dataset.

white1107 avatar Jun 01 '23 06:06 white1107

I have the same problem on my custom dataset. Does anyone have any ideas?

t1tc01 avatar Jun 14 '23 08:06 t1tc01

@kurbobo I removed this line: pred = re.sub(r"(?:(?<=>) | (?=", "", answer, count=1). I'm training the model for information extraction, so I don't think I need it, and training works for me without it. I hope this helps.

t1tc01 avatar Jun 14 '23 09:06 t1tc01
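
For anyone landing here later: the error message points at index 12 of the pattern, which is where the lookahead group "(?=" opens and is never closed. The sketch below reproduces the failure with the pattern exactly as it appears in the traceback, then shows a possible repair. The completed pattern is an assumption, not something confirmed in this thread: it guesses that the lookahead was meant to match a closing field tag starting with `</s_` and that this part was lost when the line was copied. The helper name `strip_tag_padding` and the sample string are hypothetical.

```python
import re

# Pattern exactly as shown in the traceback. The lookahead "(?=" that opens
# at index 12 has no closing ")", so the regex cannot even be compiled.
broken = r"(?:(?<=>) | (?="

try:
    re.compile(broken)
    compile_error = None
except re.error as exc:
    compile_error = exc  # "missing ), unterminated subpattern at position 12"

# ASSUMPTION: the intended lookahead matches a closing tag such as "</s_name>".
# This completion is a guess; check it against the notebook you copied from.
fixed = r"(?:(?<=>) | (?=</s_))"


def strip_tag_padding(answer: str) -> str:
    """Remove the first space that follows '>' or precedes '</s_'.

    Hypothetical helper that mirrors the re.sub call in validation_step.
    """
    return re.sub(fixed, "", answer, count=1)


cleaned = strip_tag_padding("<s_name> John </s_name>")
```

Because `count=1` is kept from the original call, only the first padding space is removed. Deleting the line, as suggested above, also avoids the crash, at the cost of leaving that space in the predictions that are compared during validation.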