AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

68 AdvancedLiterateMachinery issues

Will the implementation of the pre-training head be released? It includes 3 pre-training task heads and can load `geolayoutlm_large_pretrain.pt`, which contains the weights of `ptm_head.predictions` and `ptm_head.triple_geometric_head`.

I downloaded the D4LA model listed in the README and ran inference.py: ` python inference.py \ --image_root 'xxx' \ --grid_root 'xxx' \ --image_name 'budget_0000022278' \ --dataset D4LA \ --output_root output/ \ --config Configs/cascade/D4LA_VGT_cascade_PTM.yaml \ --opts MODEL.WEIGHTS model/D4LA_VGT_model.pth \ MODEL.WORDGRID.USE_PRETRAIN_WEIGHT False `...

Hi team, how can I use the RE pretrained model (FUNSD dataset) for inference on a new image, applying it directly to that image? I want to extract the output for RE pretrained...

Can you please provide the code used to process the pre-training dataset IIT CDIP 1.0? I am trying to retrain the weights for use with a new encoder. Any...

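I see that GeoLayoutLM requires word-level bboxes as model input for English. For the Chinese pre-trained model open-sourced on ModelScope, how are the Chinese bboxes constructed: at the word-segmentation level or at the character level?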

When running on a server, multiple cards do not work properly (I rented 3 cards, but only one works).

Hi everyone, I am trying to implement GeoLayoutLM, but I couldn't find the bbox normalization, that is, the mapping of real points to normalized coordinates (1000x1000) as in the LayoutLM family. In addition, should the...
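
For readers unfamiliar with the convention mentioned in this issue, here is a minimal sketch of LayoutLM-style bbox normalization, which rescales pixel coordinates to integers in the 0 to 1000 range. The function name `normalize_bbox` and its parameters are illustrative assumptions. Whether GeoLayoutLM expects exactly this preprocessing is the open question of the issue, not something this sketch settles.

```python
def normalize_bbox(bbox, width, height, scale=1000):
    """Map a pixel box (x0, y0, x1, y1) to integer coordinates in [0, scale].

    Assumption: the LayoutLM-family convention of a 1000x1000 grid;
    this is not taken from the GeoLayoutLM code base.
    """
    x0, y0, x1, y1 = bbox

    def clamp(value):
        # Keep coordinates inside the grid even if the box slightly overshoots the page.
        return max(0, min(scale, value))

    return [
        clamp(int(scale * x0 / width)),
        clamp(int(scale * y0 / height)),
        clamp(int(scale * x1 / width)),
        clamp(int(scale * y1 / height)),
    ]


# Example: a box on a 1000x500 pixel page.
print(normalize_bbox((50, 100, 200, 300), width=1000, height=500))
```

The x coordinates are divided by the page width and the y coordinates by the page height, so the result is independent of the original image resolution.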

Hello, I am new to computer vision. How can I adjust the batch_size of this code? Thank you.

I don't know what the IoU threshold in the F1 score of adjacency relationships is. Because the IoU threshold in cell detection is 0.5, I want to know if the...
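
As background for this question, the sketch below shows how an IoU threshold is typically applied when matching a predicted box to a ground-truth box. The names `iou` and `is_match` are hypothetical, and the 0.5 default only mirrors the cell-detection value quoted in the issue. This is not the repository's evaluation code and does not answer which threshold the adjacency-relationship F1 uses.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the overlap region, zero if the boxes are disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h

    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """Treat a prediction as matched when its IoU reaches the threshold (assumed 0.5)."""
    return iou(pred_box, gt_box) >= threshold
```

With a matching rule of this kind, precision, recall, and F1 are then computed over the matched predictions, so the chosen threshold directly affects the reported score.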

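Hello, DocMaster is a great experience. Can DocMaster be used through an API to run the same question locally over multiple images in a batch? For example, in this scenario: I want to know whose signature is on each image.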