Hello everyone!
I've tried to train the model with Cyrillic letters.
I created a synthetic dataset with synthdog and started training.
At the beginning of validation the following error occurred:
You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high')
which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
  | Name  | Type                      | Params
------------------------------------------------
0 | model | VisionEncoderDecoderModel | 201 M
------------------------------------------------
201 M     Trainable params
0         Non-trainable params
201 M     Total params
807.408   Total estimated model params size (MB)
Epoch 0:  20%|████████████████████████████▍ | 8031/40155 [26:52<1:47:29, 4.98it/s, v_num=4]
| 0/4985 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train_donut.py", line 322, in
trainer.fit(model_module)
File "/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 520, in fit
call._call_and_handle_interrupt(
File "/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 559, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 935, in _run
results = self._run_stage()
File "/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 978, in _run_stage
self.fit_loop.run()
File "/.local/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 201, in run
self.advance()
File "/.local/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 354, in advance
self.epoch_loop.run(self._data_fetcher)
File "/.local/lib/python3.8/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 134, in run
self.on_advance_end()
File "/.local/lib/python3.8/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 248, in on_advance_end
self.val_loop.run()
File "/.local/lib/python3.8/site-packages/pytorch_lightning/loops/utilities.py", line 177, in _decorator
return loop_run(self, *args, **kwargs)
File "/.local/lib/python3.8/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 115, in run
self._evaluation_step(batch, batch_idx, dataloader_idx)
File "/.local/lib/python3.8/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 375, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_kwargs.values())
File "/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 288, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/.local/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 378, in validation_step
return self.model.validation_step(*args, **kwargs)
File "train_donut.py", line 247, in validation_step
pred = re.sub(r"(?:(?<=>) | (?=", "", answer, count=1)
File "/usr/lib/python3.8/re.py", line 208, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "/usr/lib/python3.8/re.py", line 302, in _compile
p = sre_compile.compile(pattern, flags)
File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
p = sre_parse.parse(p, flags)
File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
itemsappend(_parse(source, state, verbose, nested + 1,
File "/usr/lib/python3.8/sre_parse.py", line 834, in _parse
p = _parse_sub(source, state, sub_verbose, nested + 1)
File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
itemsappend(_parse(source, state, verbose, nested + 1,
File "/usr/lib/python3.8/sre_parse.py", line 759, in _parse
raise source.error("missing ), unterminated subpattern",
re.error: missing ), unterminated subpattern at position 12
Does anybody know how to fix it?
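By the way, I guess the Tensor Cores warning at the top of the log is unrelated to this error; following its suggestion would just mean adding something like the lines below before calling trainer.fit (it only trades float32 matmul precision for speed):

import torch

# Opt in to TF32 matmuls on the RTX 3090, as the warning suggests ("high" or "medium").
# This only affects matmul speed/precision and silences the warning; it has nothing to do with the re.error.
torch.set_float32_matmul_precision("high")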
I've tried to train with
https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Donut/CORD/Fine_tune_Donut_on_a_custom_dataset_(CORD)_with_PyTorch_Lightning.ipynb
and I have the same problem with a custom dataset.
I have the same problem on my custom dataset. Any ideas?
@kurbobo I removed this line: pred = re.sub(r"(?:(?<=>) | (?=", "", answer, count=1). I train the model for information extraction, so I think I don't need it, and it works for me. I hope this helps you.
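For reference, that regex looks truncated: in the Donut/CORD notebook the pattern ends with a lookahead for </s_, and since </s_...> looks like an HTML tag it is easy to lose when copying code from a rendered page, which would explain the "missing ), unterminated subpattern at position 12" error. If you want to keep the post-processing instead of deleting the line, here is a minimal sketch with the full pattern (the sample string is made up just to show the effect):

import re

# Made-up decoded sequence, only to illustrate what the regex does
pred = "<s_synthdog><s_text_sequence> пример текста </s_text_sequence></s_synthdog>"

# Full pattern as in the Donut/CORD notebook: strip the space after '>' and the space before '</s_'.
# The truncated pattern r"(?:(?<=>) | (?=" is what raises "missing ), unterminated subpattern".
pred = re.sub(r"(?:(?<=>) | (?=</s_))", "", pred)
print(pred)  # <s_synthdog><s_text_sequence>пример текста</s_text_sequence></s_synthdog>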