TangentCFT
How to parse LaTeX in the dataset
Hi, I found that some formulas in the dataset are written in LaTeX instead of MathML (e.g. wpmath0000012/Algebra.html). As a result, they cannot be parsed into training data or included in the corpus used for retrieval. However, the retrieval result, res_tangent_cft, has recorded the formula Algebra:0. How does this happen? I tried to complete the TODO part in math_extractor.py for parsing LaTeX, but it still has bugs. Is there a complete version of that part? Thanks.
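For anyone trying to reproduce this, a rough way to check which formulas in a file are MathML and which are LaTeX-only might look like the sketch below. It is not the code from math_extractor.py. The function name `split_formulas` is hypothetical, and it assumes that MathML formulas appear as `<math>...</math>` elements while LaTeX-only formulas appear as `$...$` or `\(...\)` spans, which may not match how the dataset actually marks them.

```python
import re

# Assumption: MathML formulas are <math ...>...</math> elements.
MATHML_RE = re.compile(r"<math\b.*?</math>", re.DOTALL | re.IGNORECASE)
# Assumption: LaTeX-only formulas are delimited by $...$ or \(...\).
LATEX_RE = re.compile(r"\$(.+?)\$|\\\((.+?)\\\)", re.DOTALL)


def split_formulas(html):
    """Return (mathml_blocks, latex_strings) found in an HTML string."""
    mathml = MATHML_RE.findall(html)
    # Strip MathML first so LaTeX kept inside a <math> element
    # (for example in an annotation) is not counted a second time.
    remainder = MATHML_RE.sub("", html)
    # Each match is a 2-tuple; exactly one of the two groups is non-empty.
    latex = [dollar or paren for dollar, paren in LATEX_RE.findall(remainder)]
    return mathml, latex
```

Running this over a file such as wpmath0000012/Algebra.html (read into a string) would show whether its formulas fall into the second list, that is, whether they are skipped by a MathML-only extractor.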