MathOCR
What techniques are currently used for handwritten mathematical formula recognition?
Hi, I'd like to ask which techniques can be used to implement mathematical formula recognition.
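This project is clearly outdated. For recent progress, see my blog post at https://chungkwong.cc/crohme.html, as well as the reports of past CROHME competitions and the papers that cite them. In short, the mainstream approach today models the task as an image-to-text problem (an image to a sequence of LaTeX tokens) and solves it with an artificial neural network built on an encoder-decoder architecture. The encoder is usually a backbone network based on a CNN (especially DenseNet variants) or ViT, and the decoder is typically an autoregressive RNN (plus some attention mechanism) or a Transformer decoder. The key to higher accuracy is data augmentation.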
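To make the "image to LaTeX token sequence" framing a bit more concrete, here is a minimal sketch of how the target side might be prepared. The tokenization rule (one token per LaTeX command, escaped symbol, or single non-space character), the special tokens `<pad>`/`<sos>`/`<eos>`, and the function names are all assumptions for illustration, not something prescribed by CROHME or by any particular paper.

```python
import re

# Hypothetical rule: a LaTeX command (\frac), an escaped symbol (\{),
# or any single non-whitespace character counts as one token.
_TOKEN_RE = re.compile(r"\\[A-Za-z]+|\\.|\S")


def tokenize_latex(src):
    """Split a LaTeX string into a list of tokens, dropping whitespace."""
    return _TOKEN_RE.findall(src)


def build_vocab(expressions):
    """Map every token seen in the training labels to an integer id."""
    vocab = {"<pad>": 0, "<sos>": 1, "<eos>": 2}  # assumed special tokens
    for expr in expressions:
        for tok in tokenize_latex(expr):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(src, vocab):
    """Turn a LaTeX label into the id sequence a decoder would be trained on."""
    ids = [vocab[tok] for tok in tokenize_latex(src)]
    return [vocab["<sos>"]] + ids + [vocab["<eos>"]]
```

An autoregressive decoder would then be trained to predict each id in `encode(label, vocab)` given the image features and the ids before it.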
Thanks for the reply. I have read that post. Where can I find the related papers, and which technical approach does iFLYTEK use?
The paper for another formula recognition competition, the ICDAR 2023 Competition on Recognition of Multi-line Handwritten Mathematical Expressions, describes iFLYTEK's winning system as follows: “iFLYTEK-OCR team uses an encoder-decoder architecture that formulates HMER as an image-to-sequence translation problem. Specifically, the Conv2Former is employed as the image encoder, and a bi-directional trained Transformer decoder with Attention Refinement Module is utilized as the latex sequence decoder. A Beam Search Ensemble is proposed to ensemble the models trained with different sizes of characters. Specifically, at each decoding step, probability distributions produced by all member models are averaged by certain weights, and the top-k candidate characters to be output are decided by the averaged probability distribution. As for the data augmentation, blur, random, color jitter, scale, and TIA Transform are applied to improve the generalization ability of the model.” iFLYTEK has published quite a few papers on mathematical formula recognition over the past few years, and they are easy to find by searching. Papers on mathematical formula recognition generally cite the CROHME 2016 paper, so following those citations will lead you to the important papers in this field.
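The ensemble step described in that quote (average the member models' distributions with weights at each decoding step, then take the top-k) can be sketched roughly as below. This is only a toy illustration under my own assumptions, not iFLYTEK's implementation: each "model" is a hypothetical callable that maps a token-id prefix to a probability list over the vocabulary, and the names `ensemble_step` and `beam_search` are made up for this example.

```python
import math


def ensemble_step(distributions, weights, k):
    """Weighted-average the per-model distributions; return top-k (token_id, prob)."""
    total = sum(weights)
    vocab_size = len(distributions[0])
    avg = [
        sum(w * d[i] for w, d in zip(weights, distributions)) / total
        for i in range(vocab_size)
    ]
    ranked = sorted(range(vocab_size), key=lambda i: avg[i], reverse=True)
    return [(i, avg[i]) for i in ranked[:k]]


def beam_search(models, weights, beam_width, eos_id, max_len):
    """Beam search where every step scores tokens with the averaged distribution."""
    beams = [([], 0.0)]  # (token prefix, accumulated log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            # Hypothetical model interface: prefix -> list of probabilities.
            dists = [model(prefix) for model in models]
            for tok, p in ensemble_step(dists, weights, beam_width):
                if p > 0:
                    candidates.append((prefix + [tok], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_width]:
            # Hypotheses ending in EOS leave the beam and are kept as results.
            (finished if prefix[-1] == eos_id else beams).append((prefix, score))
        if not beams:
            break
    pool = finished or beams
    return max(pool, key=lambda c: c[1])[0]
```

In a real system each callable would wrap a trained decoder conditioned on the image features, and the weights would presumably be tuned on a validation set; the quote does not say how they were chosen.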
Got it, thank you so much!