changelog
Allow charset to be specified or default to UTF-8
I have a changelog.md that is encoded as UTF-8 and contains "special" characters such as smart quotes. These end up getting munged by the Crawler (see below) because of the way the Parser is constructed.
Example:
Display âSelectâ¦â instead of â(unknown)â when no Main Entity of Page has been selected
Because you do this in Parser.php:
public function setContent($value)
{
    $converter = new CommonMarkConverter;
    $this->content = new Crawler($converter->convertToHtml($value));
}
It calls the Crawler.php constructor:
public function __construct($node = null, $currentUri = null, $baseHref = null)
{
    $this->uri = $currentUri;
    $this->baseHref = $baseHref ?: $currentUri;
    $this->add($node);
}
...which ends up calling $this->add() -- but unfortunately as discussed here:
https://symfony.com/doc/current/components/dom_crawler.html#adding-the-content
...if you call add() and no charset is found in the content, it defaults to ISO-8859-1. Whereas if you instantiate an empty Crawler and then call $crawler->addHtmlContent(), it defaults to UTF-8, which I think is more desirable here?
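Something along these lines might work. This is only a rough sketch: the Parser class here is a trimmed-down stand-in rather than the real one, and the optional $charset argument on setContent() is my own assumption about how the charset could be passed in. The only Crawler call it relies on is addHtmlContent(), which takes the charset as its second argument.

```php
<?php

use League\CommonMark\CommonMarkConverter;
use Symfony\Component\DomCrawler\Crawler;

// Stand-in for the real Parser class; only setContent() is sketched here.
class Parser
{
    /** @var Crawler */
    protected $content;

    // $charset is a hypothetical new parameter, defaulting to UTF-8.
    public function setContent($value, $charset = 'UTF-8')
    {
        $converter = new CommonMarkConverter;

        // Create an empty Crawler so the constructor's add() call has nothing to parse.
        $this->content = new Crawler;

        // Add the HTML explicitly so the charset is passed along instead of guessed.
        $this->content->addHtmlContent($converter->convertToHtml($value), $charset);
    }
}
```

Existing callers that pass only $value would keep working and would get UTF-8 by default.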