WikiHow-Dataset
Clean up math syntax
First and foremost, thank you for releasing such a high-impact dataset, @mahnazkoupaee!
I was interested in reusing it for some of my work on mathematical syntax, which WikiHow has examples of in TeX. However, the current serialization is a bit rocky/dirty: each expression appears twice, once as flat plaintext and once as the source TeX. The 5 examples below were found by searching for the standard square root macro, \sqrt.
I think the pattern is clear to the eye, and (by luck or design?) I can even imagine a regular expression that recovers the TeX while dropping the plain text, since the plain text doesn't seem to contain any whitespace characters. So thankfully I can still make use of the current release with some cleanup. But I wanted to bring the issue to your attention, especially if you are planning on updating/regenerating the dataset in the foreseeable future: it would be quite nice to get only the TeX in some clean, standard fashion, e.g. as expected by MathJax. Thanks!
This works for any a,b,c{\displaystyle a,b,c} and outputs an x{\displaystyle x} that can be real or complex. To confirm that this process works, simply follow the steps of this article in reverse order to recover standard form.
x=−b±b2−4ac2a{\displaystyle x={\frac {-b\pm {\sqrt {b^{2}-4ac}}}{2a}}}
Set up the formula xn{\displaystyle x_{n}}=ϕn−(1−ϕ)n5{\displaystyle {\frac {\phi ^{n}-(1-\phi )^{n}}{\sqrt {5}}}}.,
If your numbers are already in polar form, skip this step. Otherwise, use the relations below. r=a2+b2{\displaystyle r={\sqrt {a^{2}+b^{2}}}}
This is the complex number in polar form. We have its magnitude r=x2+y2{\displaystyle r={\sqrt {x^{2}+y^{2}}}} on the outside.
Note — raising a value a to the 1b{\displaystyle {\frac {1}{b}}} exponent is equivalent to taking the bth root of a. You will likely need a calculator with an ""nx{\displaystyle n{\sqrt {x}}}"" button, or a good online calculator.
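For reference, here is a minimal sketch of the cleanup I have in mind, not a tested tool. Python's `re` has no recursive patterns, so instead of a single regular expression it uses a small brace-matching scan plus one short regex. The function name `clean_math`, the `MARKER` constant, and the choice of `\(...\)` as output delimiters are my own assumptions. It relies on the observation above that the plain-text copy contains no whitespace, so it also swallows any punctuation glued to the front of that copy (for example the `=` between the two expressions in the third sample, or the doubled quotes in the last one).

```python
import re

# Assumption: every TeX copy starts with this exact wrapper, as in the samples above.
MARKER = "{\\displaystyle "


def clean_math(text: str) -> str:
    """Drop the plain-text copy of each expression and keep only the TeX."""
    out = []
    pos = 0
    while True:
        start = text.find(MARKER, pos)
        if start == -1:
            out.append(text[pos:])
            break
        # Scan forward to the brace that closes the {\displaystyle ...} group.
        depth = 0
        end = -1
        i = start
        while i < len(text):
            ch = text[i]
            if ch == "\\":
                i += 2  # skip the escaped character, e.g. \{ or \}
                continue
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    end = i
                    break
            i += 1
        if end == -1:
            # Unbalanced braces: leave the rest of the text untouched.
            out.append(text[pos:])
            break
        # The plain-text copy is the run of non-whitespace right before the marker.
        before = text[pos:start]
        run = re.search(r"\S*\Z", before)
        out.append(before[:run.start()])
        # Emit only the TeX body, wrapped in MathJax-style inline delimiters.
        out.append("\\(" + text[start + len(MARKER):end] + "\\)")
        pos = end + 1
    return "".join(out)


if __name__ == "__main__":
    sample = r"We have its magnitude r=x2+y2{\displaystyle r={\sqrt {x^{2}+y^{2}}}} on the outside."
    print(clean_math(sample))
```

On the fourth sample above this prints the sentence with only `\(r={\sqrt {x^{2}+y^{2}}}\)` left in place of the duplicated expression. Text without any `{\displaystyle ` marker is returned unchanged.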