
Herml hangs in infinite recursion on double quotes and non-ASCII chars.

Open drdaeman opened this issue 14 years ago • 2 comments

I've found Herml (tested on 2942bd1d1b3a9811f3194da237b9b922f6ab7cb8) to hang in parser-generated code on the following template:

!!!
%html
  %head
    %meta[{charset,"utf-8"}]/

The generated parser kept calling herml_scan:string/4 recursively in an endless loop, quickly consuming all available memory. Unfortunately, I'm too new to Erlang and leex to understand why this happens. The calls look like this:

...
string/4(",\"utf-8\"}]/", 1, ",\"utf-8\"}]/", [{chr,1,"charset"},{lcurly,1,[123]},{lbrace,1,[91]},{chr,1,[109,101,116,97]},{tag_start,1,[37]}])
string/4("\"utf-8\"}]/", 1, "\"utf-8\"}]/", [{comma,1,[44]},{chr,1,"charset"},{lcurly,1,[123]},{lbrace,1,[91]},{chr,1,[109,101,116,97]},{tag_start,1,[37]}])
string/4("\"utf-8\"}]/", 1, "\"utf-8\"}]/", [{pipe,1,[]},{comma,1,[44]},{chr,1,"charset"},{lcurly,1,[123]},{lbrace,1,[91]},{chr,1,[109,101,116,97]},{tag_start,1,[37]}])
string/4("\"utf-8\"}]/", 1, "\"utf-8\"}]/", [{pipe,1,[]},{pipe,1,[]},{comma,1,[44]},{chr,1,"charset"},{lcurly,1,[123]},{lbrace,1,[91]},{chr,1,[109,101,116,97]},{tag_start,1,[37]}])
string/4("\"utf-8\"}]/", 1, "\"utf-8\"}]/", [{pipe,1,[]},{pipe,1,[]},{pipe,1,[]},{comma,1,[44]},{chr,1,"charset"},{lcurly,1,[123]},{lbrace,1,[91]},{chr,1,[109,101,116,97]},{tag_start,1,[37]}])
string/4("\"utf-8\"}]/", 1, "\"utf-8\"}]/", [{pipe,1,[]},{pipe,1,[]},{pipe,1,[]},{pipe,1,[]},{comma,1,[44]},{chr,1,"charset"},{lcurly,1,[123]},{lbrace,1,[91]},{chr,1,[109,101,116,97]},{tag_start,1,[37]}])
...

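There are certainly no "|" characters in the template, and I really don't know why {pipe,1,[]} is there. Note that each of these tokens carries an empty match ([]) and the remaining input does not change between calls.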

After some mindless fiddling, I've found that similar hangs happen on an intentionally malformed %meta[{charset,"utf-8}]/ (missing the second double quote), and whenever the template contains any "unknown" characters (for example, UTF-8 Cyrillic).
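
For illustration only, here is a minimal sketch of how a scanner can fall into this kind of loop. It assumes the cause is a token rule that can succeed while matching zero characters, which is consistent with the trace but which I have not confirmed in the Herml grammar. The module scan_sketch and its functions are hypothetical and are not part of herml_scan.

```erlang
-module(scan_sketch).
-export([scan/1]).

%% Hypothetical toy tokenizer. Returns {TokenName, MatchedLength}.
%% The last clause matches zero characters, mimicking a rule that
%% can succeed on the empty string.
next_token([$, | _]) -> {comma, 1};
next_token([C | _]) when C >= $a, C =< $z -> {chr, 1};
next_token(_) -> {pipe, 0}.

scan(Input) -> scan(Input, []).

scan([], Acc) ->
    {ok, lists:reverse(Acc)};
scan(Input, Acc) ->
    case next_token(Input) of
        {_Tok, 0} ->
            %% Guard: a zero-length match consumes nothing. Without
            %% this clause, scan/2 would be called again with the same
            %% Input and a longer Acc, forever.
            {error, {illegal, hd(Input)}};
        {Tok, Len} ->
            scan(lists:nthtail(Len, Input), [Tok | Acc])
    end.
```

With the guard in place, scan_sketch:scan("a,b") returns a token list, while an input containing a double quote stops with an error tuple instead of looping.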

I've attempted to fix the issue with drdaeman/herml@58b4958ed30f42fe3f0761ea065ce1f9b2241528, but due to a lack of expertise I don't know whether this is the proper solution or it just happens to work.

drdaeman, Jan 07 '11 13:01

There was a specific reason why we chose to use single-quotes, but since I haven't touched the code in over a year, I don't recall why. Needs revisiting.

seancribbs, Jan 07 '11 18:01

Actually, I haven't touched this code in quite a while. I'm thinking about taking down the repo entirely and letting someone else continue development on it.

kevsmith, Jan 10 '11 02:01