html-parser Feat: Support `<script>`

Hi @hecrj !

I was trying to use this package to parse external HTML and generate link preview metadata from OpenGraph/Twitter Card meta tags. I then found that parsing often fails because `<script>` tags are not yet supported.

I thought about lightweight workarounds, but in the end found that contributing proper support was actually faster! So here it goes. The implementation does not strictly follow the HTML standard, but I checked the relevant documents frequently while implementing it, so it should do reasonably well. I added a real-world test case too.

Ping me anytime to discuss this patch, on GitHub or Slack. Thanks in advance!

Feb 19 '22 12:02 ymtszw

Good stuff, @ymtszw.

I used your work in my parser: https://github.com/danneu/elm-html-parser.

I believe the only change I made was in `stringHelp`:

 stringHelp terminatorChar terminatorStr acc =
     Parser.oneOf
         [ Parser.succeed (\char -> Parser.Loop (acc ++ "\\" ++ char))
             |. Parser.token "\\"
             |= justOneChar
         , Parser.token terminatorStr
             |> Parser.map (\_ -> Parser.Done acc)
-        , Parser.chompWhile (\char -> char /= '\\' && char /= terminatorChar)
+        , chompOneOrMore (\char -> char /= '\\' && char /= terminatorChar)
             |> Parser.getChompedString
             |> Parser.map (\chunk -> Parser.Loop (acc ++ chunk))
         ]

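Since `chompWhile` always succeeds, even when it consumes zero characters, I was getting an infinite loop on the input `<script>'`. At the end of input neither the escape branch nor the terminator branch matches, so the third branch keeps returning `Loop` with an empty chunk. Perhaps you can verify whether this is an issue? I may have changed other things; I don't remember.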
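For reference, here is a minimal sketch of what a `chompOneOrMore` helper could look like. Its real definition is not shown in this thread, so this is an assumption: it simply requires one matching character via `Parser.chompIf` before handing over to `Parser.chompWhile`.

```elm
import Parser exposing ((|.), Parser)


-- Hypothetical helper (assumed definition, not copied from either repo).
-- Behaves like Parser.chompWhile, but fails unless at least one character
-- satisfies the predicate, so a Parser.loop step built on it cannot
-- succeed without consuming input.
chompOneOrMore : (Char -> Bool) -> Parser ()
chompOneOrMore predicate =
    Parser.chompIf predicate
        |. Parser.chompWhile predicate
```

With a helper like this, all three branches of the `oneOf` fail at the end of input, so the loop ends with a parse error instead of spinning.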

Just wanted to thank you for your work.

May 17 '22 04:05 danneu

I see. Will add test cases when I get time.

May 17 '22 15:05 ymtszw