TangentCFT

How to parse LaTeX in the dataset

Open WangPeiSyuan opened this issue 1 year ago • 0 comments

Hi, I found that some formulas in the dataset are written in LaTeX instead of MathML (e.g. wpmath0000012/Algebra.html). As a result, they can't be parsed into training data or included in the corpus used for retrieval. However, the retrieval result, res_tangent_cft, has a record for the formula Algebra:0. How does this happen? I tried to complete the TODO part in math_extractor.py for parsing LaTeX, but it still has bugs. Is there a complete version of that part? Thanks.
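
To make the problem easier to reproduce, here is a minimal sketch of how the LaTeX-only formulas in a file might be listed. It rests on an assumption that is not confirmed by the dataset documentation: that each formula sits in a `<math>` element with an `id` attribute, and that the unparsed ones contain plain LaTeX text with no nested MathML tags. The class name `FormulaScanner`, the sample markup, and the `Example:N` ids are made up for illustration and are not part of TangentCFT.

```python
from html.parser import HTMLParser


class FormulaScanner(HTMLParser):
    """Collect <math> elements and note whether each has nested tags.

    Hypothetical helper, not part of TangentCFT. Assumes well-formed markup.
    """

    def __init__(self):
        super().__init__()
        self.formulas = []    # list of (formula_id, has_mathml, text)
        self._depth = 0       # nesting depth inside the current <math>
        self._current = None  # [formula_id, has_mathml, text_parts]

    def handle_starttag(self, tag, attrs):
        if tag == "math" and self._depth == 0:
            # Entering a top-level formula element.
            self._current = [dict(attrs).get("id"), False, []]
            self._depth = 1
        elif self._depth > 0:
            # Any nested tag is treated as MathML markup.
            self._current[1] = True
            self._depth += 1

    def handle_endtag(self, tag):
        if self._depth == 0:
            return
        self._depth -= 1
        if self._depth == 0:
            # Leaving the top-level formula element: record it.
            fid, has_mathml, parts = self._current
            self.formulas.append((fid, has_mathml, "".join(parts).strip()))
            self._current = None

    def handle_data(self, data):
        if self._depth > 0:
            self._current[2].append(data)


# Made-up sample: one LaTeX-only formula, one MathML formula.
sample = (
    '<math id="Example:0">x^2 + y^2</math>'
    '<math id="Example:1"><mi>x</mi><mo>+</mo><mn>1</mn></math>'
)

scanner = FormulaScanner()
scanner.feed(sample)
latex_only = [fid for fid, has_mathml, _ in scanner.formulas if not has_mathml]
print(latex_only)
```

If the files store LaTeX somewhere else (for example in an attribute or an annotation element), the check in `handle_starttag` would need to change accordingly.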

WangPeiSyuan Mar 12 '23 07:03