Herml hangs in infinite recursion on double quotes and non-ASCII chars.
I've found that Herml (tested at commit 2942bd1d1b3a9811f3194da237b9b922f6ab7cb8) hangs in the generated parser code on the following template:
!!!
%html
  %head
    %meta[{charset,"utf-8"}]/
The generated parser keeps calling herml_scan:string/4 recursively in an endless loop, quickly consuming all available memory. Unfortunately, I'm too new to Erlang and leex to understand why this happens. The calls look like this:
...
string/4(",\"utf-8\"}]/", 1, ",\"utf-8\"}]/", [{chr,1,"charset"},{lcurly,1,[123]},{lbrace,1,[91]},{chr,1,[109,101,116,97]},{tag_start,1,[37]}])
string/4("\"utf-8\"}]/", 1, "\"utf-8\"}]/", [{comma,1,[44]},{chr,1,"charset"},{lcurly,1,[123]},{lbrace,1,[91]},{chr,1,[109,101,116,97]},{tag_start,1,[37]}])
string/4("\"utf-8\"}]/", 1, "\"utf-8\"}]/", [{pipe,1,[]},{comma,1,[44]},{chr,1,"charset"},{lcurly,1,[123]},{lbrace,1,[91]},{chr,1,[109,101,116,97]},{tag_start,1,[37]}])
string/4("\"utf-8\"}]/", 1, "\"utf-8\"}]/", [{pipe,1,[]},{pipe,1,[]},{comma,1,[44]},{chr,1,"charset"},{lcurly,1,[123]},{lbrace,1,[91]},{chr,1,[109,101,116,97]},{tag_start,1,[37]}])
string/4("\"utf-8\"}]/", 1, "\"utf-8\"}]/", [{pipe,1,[]},{pipe,1,[]},{pipe,1,[]},{comma,1,[44]},{chr,1,"charset"},{lcurly,1,[123]},{lbrace,1,[91]},{chr,1,[109,101,116,97]},{tag_start,1,[37]}])
string/4("\"utf-8\"}]/", 1, "\"utf-8\"}]/", [{pipe,1,[]},{pipe,1,[]},{pipe,1,[]},{pipe,1,[]},{comma,1,[44]},{chr,1,"charset"},{lcurly,1,[123]},{lbrace,1,[91]},{chr,1,[109,101,116,97]},{tag_start,1,[37]}])
...
There are certainly no "|" characters in the template, and I really don't know why {pipe,1,[]} is there.
After some fiddling, I've found that similar hangs happen on the intentionally malformed %meta[{charset,"utf-8}]/ (missing the second double quote), and whenever the template contains "unknown" characters (for example, UTF-8 Cyrillic).
I've attempted to fix the issue with drdaeman/herml@58b4958ed30f42fe3f0761ea065ce1f9b2241528, but due to my lack of expertise I don't know whether this is the proper solution or whether it just happens to work.
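In the trace, the remaining input stays at "\"utf-8\"}]/" while one more {pipe,1,[]} token with an empty character list is pushed on every call. That pattern is what you would see if some rule can succeed without consuming any input. Whether this is really what Herml's leex rules do is an assumption, not something checked against the source. The sketch below is plain Python, not leex, and RULES, scan, max_steps and strict are hypothetical names used only to illustrate the mechanism:

```python
import re

# Hypothetical rules for illustration only; these are NOT Herml's leex definitions.
RULES = [
    ("comma", re.compile(r",")),
    ("chr", re.compile(r"[a-z]+")),
    ("pipe", re.compile(r"\|*")),  # assumed flaw: this also matches the empty string
]


def scan(text, max_steps=None, strict=False):
    """Tokenize text and return (tokens, unconsumed_rest).

    max_steps caps the number of iterations so the stall can be observed.
    strict=True rejects zero-length matches instead of looping on them.
    """
    tokens, pos, steps = [], 0, 0
    while pos < len(text):
        if max_steps is not None and steps >= max_steps:
            break
        steps += 1
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            if match is None:
                continue
            if strict and match.end() == pos:
                raise ValueError("rule %r matched nothing at offset %d" % (name, pos))
            tokens.append((name, match.group()))
            pos = match.end()  # stays the same when the match is empty
            break
        else:
            # Unreachable with the rules above, because "pipe" always matches.
            raise ValueError("illegal character %r at offset %d" % (text[pos], pos))
    return tokens, text[pos:]
```

Calling scan('charset,"utf-8"', max_steps=5) consumes "charset" and the comma, then stalls on the double quote and appends an empty pipe token on each remaining step, much like the trace. With strict=True the same input raises an error instead. Any character that no other rule accepts would stall in the same way, which would also fit the hangs on the missing quote and on Cyrillic text.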
There was a specific reason why we chose to use single-quotes, but since I haven't touched the code in over a year, I don't recall why. Needs revisiting.
Actually, I haven't touched this code in quite a while. I'm thinking about taking down the repo entirely and letting someone else continue development on it.