MathTeXEngine.jl
Parse environments
Supersedes #50 (I did not find an easy way to update it directly) and is a first step toward implementing #48.
This is still missing tests.
Currently the parsing results in the following:
- `env` expressions have the name of the environment as first argument and the rows of the env (separated by `\\`) as follow-up arguments
- Each row is an `env_row` expr, with cells (separated by `&`) as arguments
- Cells, represented by `env_cell` expr, can contain arbitrary LaTeX constructs, just like a `group` expr can
The parser performs no check on the name of the environment nor on the structure of the content (e.g. making sure rows have a consistent number of columns). I am not sure whether these checks are better done at parser or layouting level...
@TheCedarPrince I'd love your opinion on that. If the result of the parsing seems reasonable to you, I will add tests and merge it. Laying out the content of an env can be done in a subsequent PR.
Example:
julia> texparse(L"\begin{matrix} x & x_2 \\ y + 2 \end{matrix}")
TeXExpr :expr
└─ TeXExpr :env
   ├─ "matrix"
   ├─ TeXExpr :env_row
   │  ├─ TeXExpr :env_cell
   │  │  └─ TeXExpr :char
   │  │     └─ 'x'
   │  └─ TeXExpr :env_cell
   │     └─ TeXExpr :decorated
   │        ├─ TeXExpr :char
   │        │  └─ 'x'
   │        ├─ TeXExpr :digit
   │        │  └─ '2'
   │        └─ nothing
   └─ TeXExpr :env_row
      └─ TeXExpr :env_cell
         ├─ TeXExpr :char
         │  └─ 'y'
         ├─ TeXExpr :spaced
         │  └─ TeXExpr :symbol
         │     └─ '+'
         └─ TeXExpr :digit
            └─ '2'
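To make the nesting above concrete, here is a minimal, dependency-free sketch of the same shape. It is not the parser from this PR: `Node` and `sketch_env` are hypothetical names, and cell contents are kept as plain strings rather than being parsed into nested expressions.

```julia
# Hypothetical stand-in for TeXExpr: a head symbol plus a vector of arguments.
struct Node
    head::Symbol
    args::Vector{Any}
end

# Split the body of an environment into rows (on `\\`) and cells (on `&`).
# Cell contents stay as stripped strings here; a real parser would turn
# them into nested expressions instead.
function sketch_env(name::AbstractString, body::AbstractString)
    rows = Any[]
    for row in split(body, "\\\\")
        cells = Any[Node(:env_cell, Any[strip(cell)]) for cell in split(row, "&")]
        push!(rows, Node(:env_row, cells))
    end
    # Environment name first, then one :env_row per row.
    return Node(:env, Any[name, rows...])
end

env = sketch_env("matrix", raw"x & x_2 \\ y + 2")
```

As in the parsed example, the first row ends up with two cells and the second with one, since nothing checks that the rows agree.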
Codecov Report
Merging #56 (6a088c2) into master (2dd00d8) will decrease coverage by 2.13%. The diff coverage is 20.51%.
@@ Coverage Diff @@
## master #56 +/- ##
==========================================
- Coverage 75.72% 73.59% -2.14%
==========================================
Files 10 9 -1
Lines 548 568 +20
==========================================
+ Hits 415 418 +3
- Misses 133 150 +17
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/parser/commands_registration.jl | 83.33% <0.00%> (-1.18%) | :arrow_down: |
| src/parser/texexpr.jl | 46.66% <0.00%> (-33.34%) | :arrow_down: |
| src/parser/parser.jl | 60.32% <32.00%> (-12.47%) | :arrow_down: |
| src/engine/texelements.jl | 77.21% <0.00%> (-2.01%) | :arrow_down: |
| src/engine/fonts.jl | 78.94% <0.00%> (-1.06%) | :arrow_down: |
| src/prototype.jl | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 2dd00d8...6a088c2.
Hey @Kolaru,
Thank you for the update here! I have been meaning to respond but have been immensely busy in the last month or so with travel and grad school applications! That said, I did some tinkering with this PR after I checked it out on my fork!
Here is a very quick proof of concept I was exploring, to see about recovering a valid Julia matrix from the parsed expression:
using MathTeXEngine

expr = texparse(L"""\begin{matrix}1 & ϕ \\3 & 4\end{matrix}""")

rows = []
if expr.args[1].head == :env
    env_type = expr.args[1].args[1]    # environment name, e.g. "matrix"
    body = expr.args[1].args[2:end]    # the env_row expressions
    if env_type == "matrix"
        for row in body
            cells = []
            for cell in row.args
                for expression in cell.args
                    if expression.head == :digit
                        # Digits come back as Char; convert them to Int
                        push!(cells, parse(Int64, expression.args[1]))
                    else
                        push!(cells, expression.args[1])
                    end
                end
            end
            push!(rows, cells)
        end
    end
end

# Turn each row vector into a 1×n row and stack the rows into a matrix
rows = mapreduce(permutedims, vcat, rows)
For this little example, I can recover the original matrix in Julia notation:
> 2×2 Matrix{Any}:
1 'ϕ'
3 4
After having explored and experimented with the PR some, here are my thoughts:
- I really appreciate the generality of the environment parsing mechanism at work here in the parser. I would imagine it makes adding new environments to the engine much easier for a developer.
- Regarding checks, I would say it may make sense to have two simple checks in the parser.
  Ideally the first check would look up what sort of environment this LaTeX string represents and its associated rules of construction (i.e. a valid matrix must have the same number of cells per row).
  Then, a follow-up check could run against the rules for the given environment while the parsing is occurring, so that time is not wasted parsing everything and only then checking whether it is valid for a user.
  With the example you gave showing an invalid matrix, I could imagine a `cell_counter` for `matrix` environments running per row: counting the cells of the first row sets `cell_counter`, and the cells of each subsequent row are then counted and compared against `cell_counter` to ensure that every row has the same number of cells. To me, simpler checks at the parser level make sense.
- Excited to start talking about layout!
  Eager to see how we could potentially handle a cell that contains an expression like `2x + Z`. I imagine we could store that as a Julia `Expr`, like `:(2x + Z)`.
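A rough sketch of the `cell_counter` idea, under a simplifying assumption: it runs over rows that have already been split into cells, whereas the suggestion above is to do the comparison while parsing. `check_row_lengths` is a hypothetical helper, not something in MathTeXEngine.

```julia
# Hypothetical helper: `rows` is a vector of rows, each row a vector of cells.
# Returns nothing if every row has as many cells as the first one, and throws
# an ArgumentError naming the first offending row otherwise.
function check_row_lengths(rows)
    isempty(rows) && return nothing
    cell_counter = length(first(rows))    # set by the first row
    for (i, row) in enumerate(rows)
        if length(row) != cell_counter
            throw(ArgumentError(
                "row $i has $(length(row)) cells, expected $cell_counter"))
        end
    end
    return nothing
end

check_row_lengths([["1", "ϕ"], ["3", "4"]])       # consistent rows pass silently
# check_row_lengths([["x", "x_2"], ["y + 2"]])    # would throw an ArgumentError
```

Failing on the first mismatching row keeps the check cheap, which is the point of doing it early instead of after the whole environment has been parsed.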
Otherwise, AMAZING stuff! Thanks for the work and for tagging me! Life has settled back down so should have more time to comment here and there. :)
I have added several changes to take your comments into account.
- Check that the env has a known name. Currently I just accept `matrix`, `pmatrix` and `bmatrix`.
- Put everything in a matrix after parsing. It seems like LaTeX accepts rows with different numbers of columns, so I just pad them with empty cells (spaces with 0 width) to match the longest row. Also, I put full `TeXExpr` in the matrix; I don't think there is a need to turn them back into strings.
- Add a compact `show` method for `TeXExpr` so that the matrix of `TeXExpr` is readable. The layout is a bit broken, and I have no idea why.

For example, it gives
julia> expr = texparse(L"""\begin{matrix}a + \lim_i^j & \alpha \\ u_3 & \sqrt{2}\end{matrix}""")
TeXExpr :expr
└─ TeXExpr :env
   ├─ "matrix"
   └─ Matrix{TeXExpr}
      ├─ TeXExpr :group
      │  ├─ TeXExpr :char
      │  │  └─ 'a'
      │  ├─ TeXExpr :spaced
      │  │  └─ TeXExpr :symbol
      │  │     └─ '+'
      │  └─ TeXExpr :underover
      │     ├─ TeXExpr :function
      │     │  └─ "lim"
      │     ├─ TeXExpr :char
      │     │  └─ 'i'
      │     └─ TeXExpr :char
      │        └─ 'j'
      ├─ TeXExpr :decorated
      │  ├─ TeXExpr :char
      │  │  └─ 'u'
      │  ├─ TeXExpr :digit
      │  │  └─ '3'
      │  └─ nothing
      ├─ TeXExpr :symbol
      │  └─ 'α'
      └─ TeXExpr :sqrt
         └─ TeXExpr :digit
            └─ '2'
# Just grab the matrix of arguments
julia> expr.args[1].args[2]
2×2 Matrix{TeXExpr}:
TeX"a + \lim_i^j" … TeX"α"
TeX"u_3"
TeX"\sqrt{2}"
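The name check and the padding step described in the list above could look roughly like the following. This is only an illustration with plain strings as cells: `KNOWN_ENVS`, `rows_to_matrix` and the `filler` keyword are hypothetical names, and the PR itself stores `TeXExpr` cells and pads with zero-width spaces.

```julia
# Environment names accepted in this sketch (the three listed above).
const KNOWN_ENVS = ("matrix", "pmatrix", "bmatrix")

# Hypothetical helper: reject unknown environments, pad every row with
# `filler` up to the length of the longest row, then stack the rows.
function rows_to_matrix(name, rows; filler = "")
    name in KNOWN_ENVS || throw(ArgumentError("unknown environment: $name"))
    ncols = maximum(length, rows)
    padded = [vcat(row, fill(filler, ncols - length(row))) for row in rows]
    # Each padded row becomes a 1×ncols row; vcat stacks them vertically.
    return mapreduce(permutedims, vcat, padded)
end

m = rows_to_matrix("matrix", [["x", "x_2"], ["y + 2"]])
```

With this input, `m` is a 2×2 matrix whose bottom-right entry is the filler, so the ragged second row from the first example no longer prevents building a rectangular matrix.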
I was able to rebase this on the current master with minimal fuss. It looks like it still needs some tests and the layout implementation.