Allow charset to be specified or default to UTF-8

Open khalwat opened this issue 5 years ago • 0 comments

I have a changelog.md that is encoded as UTF-8 and contains "special" characters such as smart quotes. These end up getting munged by the Crawler (see below) because of the way the Parser constructs it.

Example:

Display “Select…” instead of “(unknown)” when no Main Entity of Page has been selected

Because you do this in Parser.php:

    public function setContent($value)
    {
        $converter = new CommonMarkConverter;
        $this->content = new Crawler($converter->convertToHtml($value));
    }

It calls the Crawler.php constructor:

    public function __construct($node = null, $currentUri = null, $baseHref = null)
    {
        $this->uri = $currentUri;
        $this->baseHref = $baseHref ?: $currentUri;

        $this->add($node);
    }

...which ends up calling $this->add() -- but unfortunately as discussed here:

https://symfony.com/doc/current/components/dom_crawler.html#adding-the-content

...if you call add() and no charset is found, it defaults to ISO-8859-1. Whereas if you instantiated an empty Crawler and then called $crawler->addHtmlContent(), it would default to UTF-8, which I think is more desirable here?
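
Something along these lines might work, as a rough sketch of the change being suggested rather than a finished patch. The class is a stripped-down stand-in for the real Parser, and the $charset parameter and the getContent() accessor are hypothetical additions for illustration. It assumes league/commonmark and symfony/dom-crawler are installed via Composer.

```php
<?php
// Assumes league/commonmark and symfony/dom-crawler are installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use League\CommonMark\CommonMarkConverter;
use Symfony\Component\DomCrawler\Crawler;

// Stripped-down stand-in for the real Parser; only the relevant method is shown.
class Parser
{
    /** @var Crawler|null */
    protected $content;

    // $charset is a hypothetical new parameter that defaults to UTF-8.
    public function setContent($value, $charset = 'UTF-8')
    {
        $converter = new CommonMarkConverter;
        // Cast to string in case the converter returns a stringable object.
        $html = (string) $converter->convertToHtml($value);

        // Start with an empty Crawler, then add the HTML with an explicit
        // charset instead of passing it to the constructor (which uses add()).
        $crawler = new Crawler;
        $crawler->addHtmlContent($html, $charset);
        $this->content = $crawler;
    }

    // Hypothetical accessor, included only so the example can be exercised.
    public function getContent()
    {
        return $this->content;
    }
}

$parser = new Parser;
$parser->setContent('Display “Select…” instead of “(unknown)”');
echo $parser->getContent()->filterXPath('//p')->text(), PHP_EOL;
```

Existing callers that pass only $value would keep working and get UTF-8, while anyone with a differently encoded changelog could pass the charset explicitly.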

khalwat, Jul 03 '20 18:07