teachyourselfmath
A math PDF extraction engine, built for the internet.
I wished for a free website with an ever-growing list of math problems; teachyourselfmath is that website.
Architecture
NOTE: The following architecture is now deprecated. The new engine requires OS-level dependencies to run, and these are not documented yet. I'll get to that along with the infrastructure migrations in the coming weeks.
If a document containing math problems exists, we'd like to extract every problem from it and dump the problems into a database. LaTeX can be understood by both computers and humans. Hence, the problem boils down to converting a PDF into LaTeX, removing the irrelevant parts, and storing the remaining parts.
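The "removing the irrelevant parts" step could look roughly like the sketch below. It is not the project's actual filtering logic. It assumes the converted document arrives as Markdown-like text with LaTeX math inline, and applies a hypothetical heuristic: split the text on numbered items such as "1." or "2)", and keep only the items that contain math. The names `extractProblems`, `Problem`, `Item`, `MATH_PATTERN`, and `ITEM_START` are illustrative.

```typescript
// Hypothetical heuristic, not the project's actual filtering logic.

// A hypothetical problem record; the real database schema is not documented.
interface Problem {
  index: number; // the number printed in the source, e.g. 3 for "3."
  latex: string; // the problem text with its LaTeX math
}

interface Item {
  index: number;
  lines: string[];
}

// Markers of LaTeX math: $...$, \(, \[, or \begin{.
const MATH_PATTERN = /\$[^$]+\$|\\\(|\\\[|\\begin\{/;

// A numbered item at the start of a line, e.g. "1. " or "12) ".
const ITEM_START = /^\s*(\d+)[.)]\s+/;

// Close off an item: keep it only if it contains math, so that prose,
// headings, and reference entries are dropped as "irrelevant parts".
function finish(item: Item | null, out: Problem[]): void {
  if (item === null) return;
  const latex = item.lines.join("\n").trim();
  if (MATH_PATTERN.test(latex)) {
    out.push({ index: item.index, latex });
  }
}

function extractProblems(markup: string): Problem[] {
  const problems: Problem[] = [];
  let current: Item | null = null;

  for (const line of markup.split("\n")) {
    const match = ITEM_START.exec(line);
    if (match !== null) {
      // A new numbered item starts: finish the previous one.
      finish(current, problems);
      current = { index: Number(match[1]), lines: [line.slice(match[0].length)] };
    } else if (current !== null) {
      // Continuation line (e.g. a display equation) of the current item.
      current.lines.push(line);
    }
    // Lines before the first numbered item (titles, preambles) are ignored.
  }
  finish(current, problems);
  return problems;
}
```

For example, feeding it an exercise list where item 2 is pure prose keeps items 1 and 3 and drops item 2. A real pipeline would need more robust rules (nested numbering, multi-part problems, solutions sections), which is why this is only a starting point.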
Meta came up with a model, Nougat, that parses academic PDF documents and finds the LaTeX math in them.
Currently, I run this model's server locally on my computer and feed it every PDF I can get my hands on. The main server has a queue-based system that interacts with the model's server and processes all the problems. Here is a visual illustration of how it works:
[Figure: architecture diagram]
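A queue-based loop between the main server and the model server might look something like the minimal sketch below. The setup steps mention PostgreSQL and Redis, which suggests a Redis-backed queue and a PostgreSQL store; this sketch instead keeps the queue in memory and takes the model call (`convert`) and the database write (`store`) as injected functions, so it runs without either service. `PdfQueue`, `Job`, `Converter`, `Store`, and `maxAttempts` are hypothetical names, not the project's code.

```typescript
// Sketch only: an in-memory stand-in for the project's queue.

// Hypothetical job: one PDF waiting to be converted.
interface Job {
  pdfPath: string;
  attempts: number;
}

// Assumption: the model server is abstracted as "PDF path in, markup out".
// In a real deployment this would be an HTTP call to the model's server.
type Converter = (pdfPath: string) => Promise<string>;

// Assumption: whatever persists the extracted problems for one PDF.
type Store = (pdfPath: string, markup: string) => Promise<void>;

class PdfQueue {
  private readonly jobs: Job[] = [];
  private readonly failed: string[] = [];
  private readonly convert: Converter;
  private readonly store: Store;
  private readonly maxAttempts: number;

  constructor(convert: Converter, store: Store, maxAttempts = 3) {
    this.convert = convert;
    this.store = store;
    this.maxAttempts = maxAttempts;
  }

  enqueue(pdfPath: string): void {
    this.jobs.push({ pdfPath, attempts: 0 });
  }

  // Process jobs one at a time in FIFO order, so the model server only
  // ever handles a single request. Returns the PDFs that kept failing.
  async drain(): Promise<string[]> {
    while (this.jobs.length > 0) {
      const job = this.jobs.shift()!;
      try {
        const markup = await this.convert(job.pdfPath);
        await this.store(job.pdfPath, markup);
      } catch {
        // Either the conversion or the write failed; retry the whole job.
        job.attempts += 1;
        if (job.attempts < this.maxAttempts) {
          this.jobs.push(job); // retry later, behind the other jobs
        } else {
          this.failed.push(job.pdfPath);
        }
      }
    }
    return this.failed;
  }
}
```

Serializing requests is a deliberate simplification: a single locally running model server is usually the bottleneck, and re-queuing a failed PDF behind the others keeps one bad document from blocking the rest.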
Setup
- Get nougat from here and run it as a server. (Note: you can skip this step from now on; it will be deprecated soon.)
- You will need PostgreSQL and Redis to run this.
- Run `yarn`, then `yarn build`.
- Set up the `.env` file using the `.env.example` file.
- `yarn start`!
Contributing
I am happy to accept pull requests. No hard rules.
Acknowledgements
Created by Vivek Nathani (@viveknathani_) and licensed under the MIT License.