EduNLP

[Feature] Optimize tokenization, including multi-modal problems, Parser and formula optimization

Open · KenelmQLH opened this issue 3 years ago · 0 comments

Description


  • Handle multi-modal problems
    • AST graph
    • Image
  • Handle noise when identifying $...$ formulas in the Parser (needs better rules)
  • Handle formula AST problems when identifying $AB=BC$ and $123$ (consider preprocessing)
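As one possible "better rule" for spotting $...$ formulas, the sketch below borrows Pandoc's convention for inline TeX math: the opening $ must be followed by a non-space character, the closing $ must be preceded by a non-space character and must not be followed by a digit. Escaped \$ and $$ display delimiters are skipped. This is an assumption for discussion, not EduNLP's current Parser behaviour; find_inline_formulas is a hypothetical helper name.

```python
import re

# Hypothetical stricter inline-math rule (Pandoc-style), not EduNLP's actual Parser.
INLINE_MATH = re.compile(
    r"(?<![\\$])"             # opening $ is not escaped and not part of $$
    r"\$"
    r"(?=[^\s$])"             # a non-space, non-$ character right after the opening $
    r"((?:\\.|[^$\\])+?)"     # body: escaped characters or anything except $ and backslash
    r"(?<=\S)"                # a non-space character right before the closing $
    r"\$"
    r"(?![\d$])"              # closing $ not followed by a digit or another $
)


def find_inline_formulas(text: str) -> list[str]:
    """Return the bodies of inline $...$ formulas found in text."""
    return INLINE_MATH.findall(text)


print(find_inline_formulas("Since $AB=BC$, we have $x^2$."))  # ['AB=BC', 'x^2']
print(find_inline_formulas("It costs $5 and $10 today."))     # [] (currency, not math)
```

The digit rule is what rejects prices such as "$5 and $10"; display math ($$...$$) would need its own rule under this scheme.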
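For the $AB=BC$ and $123$ cases, one preprocessing idea is to group character runs before the formula is turned into an AST. This assumes the underlying problem is that the formula parser emits one node per character, so the segment name AB and the number 123 lose their identity. The pre_tokenize helper and its token kinds below are hypothetical, a sketch rather than a proposed EduNLP API.

```python
import re

# Hypothetical pre-tokenizer: groups runs that a per-character parser would split apart.
TOKEN = re.compile(
    r"(?P<number>\d+(?:\.\d+)?)"   # 123, 3.14 -> one number token
    r"|(?P<points>[A-Z]{2,})"      # AB, BC -> one point-sequence token (hypothetical rule)
    r"|(?P<command>\\[A-Za-z]+)"   # \angle, \frac -> one command token
    r"|(?P<char>\S)"               # any other non-space character on its own
)


def pre_tokenize(formula: str) -> list[tuple[str, str]]:
    """Split a LaTeX formula body into (kind, text) tokens, skipping whitespace."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(formula)]


print(pre_tokenize("AB=BC"))  # [('points', 'AB'), ('char', '='), ('points', 'BC')]
print(pre_tokenize("123"))    # [('number', '123')]
```

The uppercase-run rule is a trade-off: it suits geometry problems, but it would also merge an algebraic product such as AB, so it may need to be applied only when the problem context indicates geometry.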

References

  • https://huggingface.co/docs/transformers/tasks/image_classification
  • http://home.ustc.edu.cn/~huangzhy/files/papers/ZhenyaHuang-SIGIR2020s.pdf

KenelmQLH, Jun 30 '22 03:06