
Runtime error while processing page without textlines (cannot reshape tensor of [...])

Open stweil opened this issue 10 months ago • 10 comments

The image 516100238_0002.jpg can be processed with kraken --input 516100238_0002.jpg 516100238_0002.xml --alto segment -bl to create an ALTO file 516100238_0002.xml, but party fails with this ALTO file:

party -d cpu --threads 2 ocr -i 516100238_0002.xml 516100238_0002_ocr.xml
Downloading 10.5281/zenodo.14616981 ━━━━━━━━━━━━━   0% 0/0 bytes -:--:-- 0:00:00
Compiling model ✓
Files                                                     0% 0/1 -:--:-- 0:00:42
Processing 516100238_0002.xml ━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:-- 0:00:42
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/stweil/venv3.11_party/bin/party:8 in <module>                          │
│                                                                              │
│   5 from party.cli import cli                                                │
│   6 if __name__ == '__main__':                                               │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])     │
│ ❱ 8 │   sys.exit(cli())                                                      │
│   9                                                                          │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/click/core.py:1161  │
│ in __call__                                                                  │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/click/core.py:1082  │
│ in main                                                                      │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/click/core.py:1697  │
│ in invoke                                                                    │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/click/core.py:1443  │
│ in invoke                                                                    │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/click/core.py:788   │
│ in invoke                                                                    │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/click/decorators.py │
│ :33 in new_func                                                              │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/party/cli/pred.py:1 │
│ 93 in ocr                                                                    │
│                                                                              │
│   190 │   │   │   │   │   │   │   │   │   │    batch_size=batch_size)        │
│   191 │   │   │   │                                                          │
│   192 │   │   │   │   preds = []                                             │
│ ❱ 193 │   │   │   │   for pred in predictor:                                 │
│   194 │   │   │   │   │   logger.info(f'pred: {pred}')                       │
│   195 │   │   │   │   │   preds.append(pred.prediction)                      │
│   196 │   │   │   │   │   progress.update(rec_prog, advance=1)               │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/party/pred.py:152   │
│ in __next__                                                                  │
│                                                                              │
│   149 │   │   │   │   │   │   │    bounds.lines)                             │
│   150 │                                                                      │
│   151 │   def __next__(self):                                                │
│ ❱ 152 │   │   pred_str, line = next(self._pred)                              │
│   153 │   │   if self.prompt_mode == 'curves':                               │
│   154 │   │   │   return BaselineOCRRecord(prediction=pred_str,              │
│   155 │   │   │   │   │   │   │   │   │    cuts=tuple(),                     │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/torch/utils/_contex │
│ tlib.py:36 in generator_context                                              │
│                                                                              │
│    33 │   │   try:                                                           │
│    34 │   │   │   # Issuing `None` to a generator fires it up                │
│    35 │   │   │   with ctx_factory():                                        │
│ ❱  36 │   │   │   │   response = gen.send(None)                              │
│    37 │   │   │                                                              │
│    38 │   │   │   while True:                                                │
│    39 │   │   │   │   try:                                                   │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/party/fusion.py:631 │
│ in predict_string                                                            │
│                                                                              │
│   628 │   │                                                                  │
│   629 │   │   """                                                            │
│   630 │   │   tokenizer = OctetTokenizer()                                   │
│ ❱ 631 │   │   for preds in self.predict_tokens(encoder_input=encoder_input,  │
│   632 │   │   │   │   │   │   │   │   │   │    curves=curves,                │
│   633 │   │   │   │   │   │   │   │   │   │    boxes=boxes,                  │
│   634 │   │   │   │   │   │   │   │   │   │    eos_id=eos_id):               │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/torch/utils/_contex │
│ tlib.py:36 in generator_context                                              │
│                                                                              │
│    33 │   │   try:                                                           │
│    34 │   │   │   # Issuing `None` to a generator fires it up                │
│    35 │   │   │   with ctx_factory():                                        │
│ ❱  36 │   │   │   │   response = gen.send(None)                              │
│    37 │   │   │                                                              │
│    38 │   │   │   while True:                                                │
│    39 │   │   │   │   try:                                                   │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/party/fusion.py:558 │
│ in predict_tokens                                                            │
│                                                                              │
│   555 │   │   │   │   │   │   │   │     dtype=next(self.encoder.parameters() │
│   556 │   │   │                                                              │
│   557 │   │   │   # add line embeddings to encoder hidden states             │
│ ❱ 558 │   │   │   line_embeds = self.line_embedding(batch).unsqueeze(1).expa │
│   559 │   │   │   exp_encoder_hidden_states = encoder_hidden_states[:bsz, .. │
│   560 │   │   │                                                              │
│   561 │   │   │   # prefill step                                             │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/torch/nn/modules/mo │
│ dule.py:1736 in _wrapped_call_impl                                           │
│                                                                              │
│   1733 │   │   if self._compiled_call_impl is not None:                      │
│   1734 │   │   │   return self._compiled_call_impl(*args, **kwargs)  # type: │
│   1735 │   │   else:                                                         │
│ ❱ 1736 │   │   │   return self._call_impl(*args, **kwargs)                   │
│   1737 │                                                                     │
│   1738 │   # torchrec tests the code consistency with the following code     │
│   1739 │   # fmt: off                                                        │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/torch/nn/modules/mo │
│ dule.py:1747 in _call_impl                                                   │
│                                                                              │
│   1744 │   │   if not (self._backward_hooks or self._backward_pre_hooks or s │
│   1745 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hoo │
│   1746 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1747 │   │   │   return forward_call(*args, **kwargs)                      │
│   1748 │   │                                                                 │
│   1749 │   │   result = None                                                 │
│   1750 │   │   called_always_called_hooks = set()                            │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/party/modules/promp │
│ t.py:79 in forward                                                           │
│                                                                              │
│   76 │   │   embeddings = torch.empty((0, self.embed_dim),                   │
│   77 │   │   │   │   │   │   │   │    device=self.point_embeddings.weight.de │
│   78 │   │   if curves is not None:                                          │
│ ❱ 79 │   │   │   curve_embeddings = self._embed_curves(curves)               │
│   80 │   │   │   embeddings = torch.cat([embeddings, curve_embeddings])      │
│   81 │   │   if boxes is not None:                                           │
│   82 │   │   │   box_embeddings = self._embed_boxes(boxes)                   │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/party/modules/promp │
│ t.py:55 in _embed_curves                                                     │
│                                                                              │
│   52 │   def _embed_curves(self, curves: torch.FloatTensor):                 │
│   53 │   │   point_embedding = self._positional_embed(curves)                │
│   54 │   │   point_embedding += self.point_embeddings.weight[:4]             │
│ ❱ 55 │   │   return point_embedding.view(curves.shape[0], -1)                │
│   56 │                                                                       │
│   57 │   def _embed_boxes(self, boxes: torch.FloatTensor):                   │
│   58 │   │   box_embedding = self._positional_embed(boxes)                   │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/lightning/fabric/ut │
│ ilities/init.py:54 in __torch_function__                                     │
│                                                                              │
│    51 │   ) -> Any:                                                          │
│    52 │   │   kwargs = kwargs or {}                                          │
│    53 │   │   if not self.enabled:                                           │
│ ❱  54 │   │   │   return func(*args, **kwargs)                               │
│    55 │   │   if getattr(func, "__module__", None) == "torch.nn.init":       │
│    56 │   │   │   if "tensor" in kwargs:                                     │
│    57 │   │   │   │   return kwargs["tensor"]                                │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/torch/utils/_device │
│ .py:106 in __torch_function__                                                │
│                                                                              │
│   103 │   │   kwargs = kwargs or {}                                          │
│   104 │   │   if func in _device_constructors() and kwargs.get('device') is  │
│   105 │   │   │   kwargs['device'] = self.device                             │
│ ❱ 106 │   │   return func(*args, **kwargs)                                   │
│   107                                                                        │
│   108 # NB: This is directly called from C++ in torch/csrc/Device.cpp        │
│   109 def device_decorator(device, func):                                    │
│                                                                              │
│ /home/stweil/venv3.11_party/lib/python3.11/site-packages/torch/utils/_device │
│ .py:106 in __torch_function__                                                │
│                                                                              │
│   103 │   │   kwargs = kwargs or {}                                          │
│   104 │   │   if func in _device_constructors() and kwargs.get('device') is  │
│   105 │   │   │   kwargs['device'] = self.device                             │
│ ❱ 106 │   │   return func(*args, **kwargs)                                   │
│   107                                                                        │
│   108 # NB: This is directly called from C++ in torch/csrc/Device.cpp        │
│   109 def device_decorator(device, func):                                    │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the
unspecified dimension size -1 can be any value and is ambiguous
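
A minimal standalone reproduction of the reshape failure (the tensor shapes below are placeholders; only the zero-length first dimension matters):

import torch

# With no text lines on the page, zero curves reach the prompt encoder,
# so the per-point embedding has zero elements and view(0, -1) cannot
# infer the -1 dimension.
curves = torch.empty(0, 4, 2)              # zero baselines, 4 control points
point_embedding = torch.empty(0, 4, 256)   # placeholder embedding width
point_embedding.view(curves.shape[0], -1)  # raises the RuntimeError above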

stweil commented on Feb 26, 2025

The image contains text rotated by 90°, and the kraken segmentation fails to detect the rotated lines. Therefore the ALTO file contains a single text block without any text lines. Maybe party is not prepared to handle ALTO files without any lines.

stweil commented on Feb 26, 2025

The same error occurs with any empty page.

stweil commented on Feb 27, 2025

On 25/02/27 12:18PM, Stefan Weil wrote:

> The same error occurs with any empty page.

I probably forgot to set some abort condition for empty pages. I'll do that after the current development round.
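
Something along these lines should do as a guard (a sketch, not the actual party code; the page objects and their .lines attribute are assumptions):

import logging

logger = logging.getLogger(__name__)

def iter_pages_with_lines(pages):
    # Skip segmentation results without any text lines so the predictor
    # is never built over an empty batch of prompts.
    for page in pages:
        if not page.lines:
            logger.warning('%s contains no text lines, skipping',
                           getattr(page, 'imagename', '<unknown>'))
            continue
        yield page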

mittagessen commented on Feb 28, 2025

I receive the same error, but with a non-empty PageXML file. I use the party version from commit c7b7e41.

Here is the file:

<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15/pagecontent.xsd">
    <Metadata>
        <Creator>dhSegment</Creator>
        <Created>2025-03-21T16:54:28.549676</Created>
        <LastChange>2025-03-21T16:54:28.549676</LastChange>
    </Metadata>
    <Page imageFilename="example_page.jpg" imageWidth="6316" imageHeight="5053">
        <TextRegion id="r0" custom="">
            <Coords points="2977,32 2977,74 4704,74 4704,32"/>
            <TextLine id="l0_0" custom="">
                <Coords points="3240,32 3232,41 3075,41 3067,49 3059,41 3051,49 3042,41 3034,49 3026,41 3018,49 3009,41 3001,49 2977,49 3001,49 3009,57 3018,49 3026,57 3034,49 3042,57 3042,65 3042,57 3051,49 3059,57 3059,65 3067,57 3075,65 3256,65 3264,74 3898,74 3906,65 3947,65 3947,41 3889,41 3881,32 3289,32 3281,41 3273,32 3264,41 3256,32 3248,41"/>
                <TextEquiv>
                    <Unicode/>
                </TextEquiv>
            </TextLine>
            <TextLine id="l0_1" custom="">
                <Coords points="4342,41 4334,49 4268,49 4276,57 4284,49 4292,57 4301,49 4309,57 4317,49 4325,57 4334,57 4342,65 4350,57 4358,65 4366,57 4375,65 4556,65 4564,57 4572,65 4580,57 4588,65 4597,57 4605,57 4613,49 4621,57 4630,49 4638,57 4646,49 4654,57 4662,49 4671,57 4679,49 4687,57 4695,49 4704,49 4597,49 4588,41 4580,49 4572,41 4564,49 4556,41 4424,41 4416,49 4408,41 4399,49 4391,41 4383,49 4375,41 4366,49 4358,41 4350,49"/>
                <TextEquiv>
                    <Unicode/>
                </TextEquiv>
            </TextLine>
            <TextEquiv>
                <Unicode/>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="r1" custom="">
            <Coords points="970,649 970,715 4391,715 4391,649"/>
            <TextLine id="l1_0" custom="">
                <Coords points="1118,649 1110,657 1077,657 1069,666 1036,666 1027,674 1011,674 1003,682 995,682 978,699 970,699 978,699 986,707 995,707 1003,715 1307,715 1315,707 1332,707 1340,715 1348,715 1356,707 1554,707 1562,699 1603,699 1611,707 1620,707 1628,699 1768,699 1784,682 1792,682 1792,674 1784,674 1776,666 1620,666 1611,657 1603,666 1537,666 1529,657 1274,657 1266,649 1134,649 1126,657"/>
                <TextEquiv>
                    <Unicode/>
                </TextEquiv>
            </TextLine>
            <TextLine id="l1_1" custom="">
                <Coords points="3667,657 3659,666 3585,666 3577,674 3569,674 3560,666 3462,666 3454,674 3404,674 3396,666 3330,666 3322,674 3289,674 3281,682 3232,682 3223,674 3190,674 3182,682 3182,690 3190,699 3199,699 3207,707 3264,707 3273,699 3281,707 3437,707 3445,715 3454,715 3462,707 3486,707 3495,715 3544,715 3552,707 3634,707 3643,715 3865,715 3873,707 3996,707 4005,715 4021,715 4029,707 4227,707 4235,699 4243,699 4251,707 4260,699 4276,699 4284,707 4342,707 4350,715 4358,707 4383,707 4391,699 4391,674 4383,666 4120,666 4111,657 4103,666 3996,666 3988,657 3955,657 3947,666 3857,666 3848,657"/>
                <TextEquiv>
                    <Unicode/>
                </TextEquiv>
            </TextLine>
            <TextEquiv>
                <Unicode/>
            </TextEquiv>
        </TextRegion>
        ...
    </Page>
</PcGts>

Thank you in advance!

CrazyCrud commented on Mar 21, 2025

On 25/03/21 10:17AM, Constantin Lehenmeier wrote:

> I receive the same error, but with a non-empty PageXML file. I use the party version from commit c7b7e41.

The PageXML file doesn't have any baselines defined for its lines, so they just get skipped by the parser. Either add proper baselines if you want recognition with baseline prompts, or add dummy <Baseline> elements with at least two points and use the bounding box prompts instead. Something like:

<Baseline points="1,1 20,20"/>

should do the trick.
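
If you want to add them automatically, a small lxml script along these lines should work (a sketch, assuming the 2013-07-15 PAGE namespace from your file):

from lxml import etree

NS = {'pc': 'http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15'}

def add_dummy_baselines(path_in, path_out):
    tree = etree.parse(path_in)
    for line in tree.iter('{%s}TextLine' % NS['pc']):
        coords = line.find('pc:Coords', NS)
        if coords is not None and line.find('pc:Baseline', NS) is None:
            baseline = etree.Element('{%s}Baseline' % NS['pc'])
            baseline.set('points', '1,1 20,20')
            # the PAGE schema expects Baseline directly after Coords
            coords.addnext(baseline)
    tree.write(path_out, xml_declaration=True, encoding='utf-8')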

mittagessen commented on Mar 21, 2025

Many thanks for the quick feedback! I have subsequently added the baselines automatically and now the recognition works.

CrazyCrud commented on Mar 23, 2025

@mittagessen sorry to bring up the issue again, but I'm just unsure about the following question:
As I want to train a custom kraken model for detecting text lines and I can't annotate baselines (the tools I want to use and evaluate in my work, OCR4ALL and Aletheia Lite, don't allow this), could I use the prompt type --boxes, so party would then just expect (polygonal) bounding regions?

With regard to this question, a follow-up would be whether the default blla kraken model could be used as a base model to fine-tune on my text line data (without/suppressing baselines).

Thank you very much in advance!

Best regards Constantin

CrazyCrud commented on Aug 18, 2025

On 25/08/18 09:40AM, Constantin Lehenmeier wrote:

> @mittagessen sorry to bring up the issue again, but I'm just unsure about the following question: as I want to train a custom kraken model for detecting text lines and I can't annotate baselines (the tools I want to use and evaluate in my work, OCR4ALL and Aletheia Lite, don't allow this), could I use the prompt type --boxes, so party would then just expect (polygonal) bounding regions?

Yes, you can train/predict with --prompt-mode boxes/--boxes to use only bbox information computed from the bounding polygons. You'll still need to add the dummy baselines to the input, otherwise the kraken parser will still ignore the lines.
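
For prediction that would look something like this (the exact flag placement is an assumption based on the party ocr invocation at the top of this issue):

party ocr --boxes -i input.xml input_ocr.xml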

> With regard to this question, a follow-up would be whether the default blla kraken model could be used as a base model to fine-tune on my text line data (without/suppressing baselines).

I'm not sure I understand, but by default party is trained by randomly sampling bboxes and baselines; this can be adjusted with --prompt-mode, so you can train a bbox-only party model if you like. If you want to annotate baselines automatically you could use blla, but for many types of complex writing the out-of-the-box results might be a bit noisy.

On the other hand, party seems to be much more robust against faulty training data, so if you can auto-annotate roughly a batch size's worth of lines per page image with reasonably high accuracy, then it might be viable as well.

mittagessen commented on Aug 18, 2025

Thanks for the quick response!

> Yes, you can train/predict with --prompt-mode boxes/--boxes to use only bbox information computed from the bounding polygons. You'll still need to add the dummy baselines to the input, otherwise the kraken parser will still ignore the lines.

Got it! I was unsure whether the boxes prompt needs the dummy baselines, but now I know.

> I'm not sure I understand, but by default party is trained by randomly sampling bboxes and baselines; this can be adjusted with --prompt-mode, so you can train a bbox-only party model if you like. If you want to annotate baselines automatically you could use blla, but for many types of complex writing the out-of-the-box results might be a bit noisy. On the other hand, party seems to be much more robust against faulty training data, so if you can auto-annotate roughly a batch size's worth of lines per page image with reasonably high accuracy, then it might be viable as well.

I'm sorry, I didn't clearly describe my problem. It's actually more about kraken.

I trained a YOLO model to detect table columns, and I want to combine these results with kraken's text line detection within the respective table (as text lines reach into neighbouring cells etc., I don't want to detect text lines within columns, but within the whole table, and then assign them to the columns).
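
For the assignment step I'd try something simple like maximum horizontal overlap between line and column boxes (a sketch; the (x0, y0, x1, y1) box format is my own assumption, not a kraken interface):

def assign_lines_to_columns(lines, columns):
    # lines/columns: lists of (x0, y0, x1, y1) boxes; assign each line
    # to the column whose horizontal overlap with it is largest
    assignment = {}
    for i, (lx0, _, lx1, _) in enumerate(lines):
        overlaps = [max(0, min(lx1, cx1) - max(lx0, cx0))
                    for cx0, _, cx1, _ in columns]
        assignment[i] = max(range(len(columns)), key=overlaps.__getitem__)
    return assignment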

The kraken detection worked quite well on the whole page, but the results within tables are different, so I would need to train/fine-tune on my custom data. Therefore, I would annotate text lines within my table images. The question is whether I could use the kraken blla model as a base model and then train on my custom dataset (text line boxes, but without baselines), or would this be a problem/would it be better to use no base model at all?

Best regards Constantin

CrazyCrud commented on Aug 19, 2025

On 25/08/19 02:01AM, Constantin Lehenmeier wrote:

> The kraken detection worked quite well on the whole page, but the results within tables are different, so I would need to train/fine-tune on my custom data. Therefore, I would annotate text lines within my table images. The question is whether I could use the kraken blla model as a base model and then train on my custom dataset (text line boxes, but without baselines), or would it be better to use no base model at all?

Ah yes, fine-tuning the base model would be the way to go here.
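
Roughly something like this (the option names are from memory, so check ketos segtrain --help for your kraken version):

ketos segtrain -i blla.mlmodel -o table_lines training_data/*.xml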

mittagessen commented on Aug 19, 2025