NielsRogge

Results: 388 comments by NielsRogge

Hi, All examples regarding TAPEX can be found here: https://github.com/huggingface/transformers/tree/main/examples/research_projects/tapex

You can get the logits by passing `output_scores=True` (together with `return_dict_in_generate=True`) to the `generate` method.

They include the `scores`, which are the raw logits if you use greedy decoding.

Refer to https://github.com/huggingface/transformers/issues/15451#issue-1120232737

To get the predicted start and end position, you need to do an argmax on the last dimension, not the (default) first one: ``` predicted_start_position = torch.argmax(outputs.start_logits, -1) predicted_end_position =...
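As a minimal sketch of the argmax over the last dimension, the example below uses hand-written tensors in place of `outputs.start_logits` and `outputs.end_logits`. The values and shapes are made up for illustration, and treating the end logits the same way as the start logits is an assumption, since the snippet above is cut off. Note that `torch.argmax` called without `dim` returns a single index into the flattened tensor, which is not what you want for a batch.

```python
import torch

# Hypothetical logits for a batch of 2 sequences with 5 tokens each,
# standing in for outputs.start_logits and outputs.end_logits.
start_logits = torch.tensor([[0.1, 2.0, 0.3, 0.2, 0.1],
                             [1.5, 0.2, 0.1, 0.3, 0.4]])
end_logits = torch.tensor([[0.1, 0.2, 0.3, 3.0, 0.1],
                           [0.2, 0.1, 2.5, 0.3, 0.4]])

# Argmax over the last (sequence) dimension gives one token index per example.
predicted_start_position = torch.argmax(start_logits, dim=-1)
predicted_end_position = torch.argmax(end_logits, dim=-1)

# Without `dim`, argmax runs over the flattened tensor and returns one scalar.
flat_index = torch.argmax(start_logits)

print(predicted_start_position.tolist())  # one start index per example
print(predicted_end_position.tolist())    # one end index per example
```

With a real model, each pair of start and end indices would then be used to slice the corresponding input IDs and decode the answer span.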

Hi, I do have a notebook on fine-tuning LayoutLMv2 on CORD: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLMv2/CORD/Fine_tuning_LayoutLMv2ForTokenClassification_on_CORD.ipynb. As LayoutLMv3 is nearly identical to LayoutLMv2, you only need to make minor updates to the code.

cc'ing @mariosasko. Also having this issue. Got a similar issue with ["nielsr/funsd-image-feature"](https://huggingface.co/datasets/nielsr/funsd-image-feature), even though this worked fine in the past.

Yes I'm currently also not able to reproduce it. It's a weird issue, seems flaky. If I encounter it again, will report here.

Hi, That's a great question! I think that recognizing these segments can be seen as "layout analysis", see https://paperswithcode.com/task/document-layout-analysis. This is often framed as an object detection problem. Now, the...

Yes that's correct. Note that in the [StructuralLM paper](https://arxiv.org/abs/2105.11210), which introduced segment position embeddings, they just consider the bounding boxes that an OCR engine outputs as "cells" (= "segments").