htmlentities
HTMLEntities is a simple library to facilitate encoding and decoding of named (`&yacute;` and so on) or numerical (`&#123;` or `&#x12a;`) entities in HTML and XHTML documents.
The mapping in https://github.com/threedaymonk/htmlentities/blob/master/lib/htmlentities/mappings/expanded.rb doesn't quite cover the named character references listed at https://html.spec.whatwg.org/multipage/named-characters.html. The good news is that https://html.spec.whatwg.org/entities.json exists, so potentially much of the 'expanded' set could be replaced by a snapshot of it.
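As a rough illustration of that idea, the sketch below turns JSON in the shape used by entities.json (keys such as `"&amp;"`, each with a `codepoints` array) into a name-to-codepoint hash. The helper names `sample_entities_json` and `build_entity_map` are hypothetical, the two-entry sample stands in for a real snapshot, and skipping multi-codepoint and semicolon-less entries is an assumption, not something the library does.

```ruby
require "json"

# Two entries in the shape used by entities.json (a stand-in for a real snapshot).
def sample_entities_json
  <<~JSON
    {
      "&amp;":  { "codepoints": [38],    "characters": "&" },
      "&subE;": { "codepoints": [10949], "characters": "\\u2AC5" }
    }
  JSON
end

# Hypothetical helper: build a name => codepoint hash.
# Assumption: keep only entries that end in ";" and map to a single codepoint.
def build_entity_map(json_text)
  JSON.parse(json_text).each_with_object({}) do |(key, info), map|
    next unless key.end_with?(";") && info["codepoints"].size == 1

    # Strip the leading "&" and the trailing ";" from the key.
    map[key[1..-2]] = info["codepoints"].first
  end
end
```

Calling `build_entity_map(sample_entities_json)` yields a hash keyed by bare entity names, which is roughly the shape of the existing mapping files.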
I'm getting this issue from within the `HTMLEntities#decode` method when processing a decrypted CSV file. Is there a workaround for this?
Add an option (defaults to off) which allows decoding to be done in a way that's flexible to capitalization mistakes. If the correct capitalization exists, it'll be used. Otherwise if...
Add an option that allows users to decode (invalid) HTML entities that forget the `#` sign, such as `&1234;` instead of `&#1234;`. (I'll open a PR for this soon, I...
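A minimal standalone sketch of what such lenient decoding could look like follows. It does not use the library; `decode_numeric_leniently` is a hypothetical name, and it handles decimal references only.

```ruby
# Hypothetical lenient decoder: treats "&1234;" as if it were "&#1234;".
# Assumption: only decimal references are handled here.
def decode_numeric_leniently(text)
  text.gsub(/&#?(\d+);/) do
    original = Regexp.last_match(0)
    codepoint = Regexp.last_match(1).to_i
    # Leave the text untouched when the number is not a valid Unicode scalar value.
    valid = codepoint <= 0x10FFFF && !(0xD800..0xDFFF).cover?(codepoint)
    valid ? [codepoint].pack("U") : original
  end
end
```

With this sketch, `&1234;` and `&#1234;` both decode to the same character, while out-of-range numbers are passed through unchanged.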
Add an option that allows users to decode (invalid) HTML entities with incorrect casing, such as `&Amp;` instead of `&amp;`. (I'll open a PR for this soon, I just need...
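One way to picture the behaviour described in these requests (use the exact capitalization when it exists, otherwise fall back to a case-insensitive match) is the sketch below. It is independent of the library; `casing_sample_entities` and `decode_ignoring_case` are hypothetical names, the mapping is a tiny illustrative subset, and letting the first entry win on case collisions is an assumption.

```ruby
# Tiny illustrative mapping; a real decoder would use the full entity table.
def casing_sample_entities
  { "amp" => "&", "lt" => "<", "Eacute" => "É", "eacute" => "é" }
end

# Hypothetical case-tolerant decoder.
def decode_ignoring_case(text, entities = casing_sample_entities)
  # Index names by lowercase form; the first entry wins when names collide.
  by_downcase = entities.each_key.with_object({}) do |name, index|
    index[name.downcase] ||= name
  end

  text.gsub(/&([A-Za-z][A-Za-z0-9]*);/) do
    original = Regexp.last_match(0)
    name = Regexp.last_match(1)
    # Prefer the exact capitalization; otherwise fall back to the lowercase index.
    resolved = entities.key?(name) ? name : by_downcase[name.downcase]
    resolved ? entities[resolved] : original
  end
end
```

Names that differ only by case, such as `Eacute` and `eacute`, still decode to their own characters because the exact match is tried first.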
[Line 16 in test/expanded_test.rb](https://github.com/threedaymonk/htmlentities/blob/master/test/expanded_test.rb#L16): ` ['subE', 0x2286, nil, "skip", "⊆", ],` The decoded char is supposed to be: `⫅` [Line 655 in lib/htmlentities/mappings/expanded.rb](https://github.com/threedaymonk/htmlentities/blob/master/lib/htmlentities/mappings/expanded.rb#L655): ` 'subE' => 0x2286, # ⊆` should...
It'd be great if one could specify that some characters should be excluded from decoding. For example, when trying to sanitize/normalize HTML, `&lt;` and `&gt;` are good examples to be...
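A small sketch of how an exclusion list might behave is shown here. The `except:` keyword, `decode_except`, and `exclusion_sample_entities` are hypothetical and are not part of the library's API; the mapping covers only a few basic entities.

```ruby
# Illustrative subset of entities; not the library's mapping.
def exclusion_sample_entities
  { "amp" => "&", "lt" => "<", "gt" => ">", "quot" => "\"" }
end

# Hypothetical `except:` option: characters listed there stay encoded.
def decode_except(text, except: [], entities: exclusion_sample_entities)
  text.gsub(/&(\w+);/) do
    original = Regexp.last_match(0)
    char = entities[Regexp.last_match(1)]
    # Decode only known entities whose character is not excluded.
    char && !except.include?(char) ? char : original
  end
end
```

For instance, `decode_except(html, except: ["<", ">"])` would decode ampersands and quotes but leave angle-bracket entities encoded, which keeps the markup structure safe during normalization.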
I quite often run into deprecated HTML entities that lack the trailing semicolon. I'm attaching a PR based on the [HTML 4 specification regarding character references](https://www.w3.org/TR/html4/charset.html#h-5.3): > Note. In SGML,...
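To make the semicolon-less case concrete, here is a standalone sketch that accepts an optional trailing semicolon for a small set of names. It is not the attached PR's implementation; `legacy_sample_entities` and `decode_without_semicolon` are hypothetical names, and the list of names is an illustrative assumption.

```ruby
# Illustrative subset of names that are commonly seen without a semicolon.
def legacy_sample_entities
  { "amp" => "&", "lt" => "<", "gt" => ">", "copy" => "\u00A9", "nbsp" => "\u00A0" }
end

# Hypothetical decoder that treats the trailing ";" as optional.
def decode_without_semicolon(text, entities: legacy_sample_entities)
  # Try longer names first so a short name never shadows a longer one.
  names = entities.keys.sort_by { |name| -name.length }
  pattern = /&(#{Regexp.union(names)});?/
  text.gsub(pattern) { entities[Regexp.last_match(1)] }
end
```

The trade-off is ambiguity: with this sketch any text that merely begins with a known name after `&` is decoded too, so a real option would likely need to be opt-in.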
Squiggly brackets need love too! ``` {} ``` ... also &lparen; and &lpar; which decode to ( ... and &rparen; and &rpar; which decode to ) ... and &lsqb; and...