Greg
I have implemented this in my project [marie-ai](https://github.com/gregbugaj/marie-ai), a work in progress. The code is modular, so you can extract it and use each piece independently. Here is a reference...
You can fix this by modifying `tokenize_and_align_labels` in `run_funsd.py` (tested on transformers `4.30.2`). We pass the words/boxes/word_labels directly to the tokenizer.

```
def tokenize_and_align_labels(examples, augmentation=False):
    images = examples["image"]
    words...
```
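For context, here is a minimal sketch of the idea using `LayoutLMv3Processor` from `transformers` with `apply_ocr=False`; the checkpoint name and dataset column names are assumptions, not taken from the original script:

```
from transformers import AutoProcessor

# Assumption: a LayoutLMv3 checkpoint with OCR disabled, so we supply our own
# words and boxes instead of letting the processor run Tesseract.
processor = AutoProcessor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=False)

def tokenize_and_align_labels(examples):
    # Assumed dataset columns; adjust to your dataset's field names.
    images = examples["image"]          # list of PIL images
    words = examples["tokens"]          # list of lists of words per image
    boxes = examples["bboxes"]          # normalized [x0, y0, x1, y1] boxes in the 0-1000 range
    word_labels = examples["ner_tags"]  # integer label ids, one per word

    # The processor tokenizes the words, aligns each box to its sub-word tokens,
    # and sets labels to -100 for special tokens and non-first sub-word pieces.
    encoding = processor(
        images,
        words,
        boxes=boxes,
        word_labels=word_labels,
        truncation=True,
        padding="max_length",
    )
    return encoding
```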
I am using it in my project here: [BoxProcessorUlimDit](https://github.com/marieai/marie-ai/blob/4e81e858aa8ead225af541dcc561453e2c701c77/marie/boxes/dit/ulim_dit_box_processor.py#L280)

Example:

```
from marie.boxes import BoxProcessorUlimDit
from marie.boxes.box_processor import PSMode

box = BoxProcessorUlimDit(
    models_dir="../../model_zoo/unilm/dit/text_detection",
    cuda=True,
)
(
    boxes,
    fragments,
    lines,
    _,
...
```
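If you then feed the detected boxes into a LayoutLM-style tokenizer (as in the snippet above), they need to be normalized to the 0-1000 range. A minimal sketch, assuming the detector returns pixel-space `[x, y, w, h]` boxes (an assumption about the return format, not confirmed by the code above):

```
def normalize_box(box, width, height):
    # Convert a pixel-space [x, y, w, h] box to the 0-1000 [x0, y0, x1, y1]
    # range expected by LayoutLM-style tokenizers.
    x, y, w, h = box
    return [
        int(1000 * x / width),
        int(1000 * y / height),
        int(1000 * (x + w) / width),
        int(1000 * (y + h) / height),
    ]

# Usage, given the PIL image the boxes were detected on:
# normalized = [normalize_box(b, image.width, image.height) for b in boxes]
```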
That would be great. I have started looking at [Advanced Literate Machinery](https://github.com/AlibabaResearch/AdvancedLiterateMachinery). I was not able to obtain the weights to test the model, but it does look very good.
I do have g++ installed in the container. I will check out the details on the task referenced.