Ryan Barrett

2014 comments by Ryan Barrett

Interesting! [Looking at the Bridgy log](https://brid.gy/log?module=default&start_time=1691653687&key=agdicmlkLWd5cmoLEg1QdWJsaXNoZWRQYWdlIkNodHRwczovL2pwLmNhcnVhbmEuZnIvbm90ZXMvMjAyMy8wOC8xMC9ub3Qtc3VyZS10aGlzLXBvc3QtaHR0cHMtanAvDAsSB1B1Ymxpc2gYgIC45qjyggkM), the mf2 parser ended up with two `content` values from your post: one plain text with the `…` HTML entity, and one HTML + text:...
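For context, mf2 parsers emit every property as a list, so two `content` values simply sit side by side: a plain string (the shape a `p-*` parse produces) and an `{"html": ..., "value": ...}` object (the shape an `e-*` parse produces). Below is a minimal sketch of how a consumer might prefer the HTML form. The sample data and the `pick_content` helper are hypothetical, not Bridgy's actual logic.

```py
from typing import Any, Dict, List, Optional

# Hypothetical parsed mf2 properties with two `content` values, shaped like
# the ones described above: a plain string (p-* style) and an
# {"html": ..., "value": ...} dict (e-* style). The text is placeholder data.
props: Dict[str, List[Any]] = {
    "content": [
        "Example post text",
        {"html": "<p>Example <em>post</em> text</p>", "value": "Example post text"},
    ],
}


def pick_content(props: Dict[str, List[Any]]) -> Optional[str]:
    """Hypothetical helper: prefer the HTML form, else fall back to plain text."""
    values = props.get("content", [])
    # First choice: an e-* style value, which keeps the markup.
    for value in values:
        if isinstance(value, dict) and "html" in value:
            return value["html"]
    # Otherwise: a p-* style plain string.
    for value in values:
        if isinstance(value, str):
            return value
    return None


print(pick_content(props))  # <p>Example <em>post</em> text</p>
```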

A few more data points:

* Looking at https://jp.caruana.fr/notes/2023/08/10/not-sure-this-post-https-jp/ right now, it has the `…` Unicode character directly in the content, inline
* I tried previewing a publish on https://brid.gy/mastodon/@[email protected],...

Oops, I was wrong. I looked at https://jp.caruana.fr/notes/2023/08/10/not-sure-this-post-https-jp/ in browser dev tools, and evidently that shows me the contents after decoding HTML entities. curl and Python requests both show that...
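The gap between dev tools and curl is just entity decoding: dev tools shows the parsed DOM, while curl and requests return the raw source. A minimal stdlib sketch, assuming the raw source uses a named reference like `&mldr;` (the sample markup is made up):

```py
import html

# Raw markup as an HTTP client (curl, requests) returns it: the named
# character reference is still encoded in the source.
raw = "<p>wait&mldr; what</p>"

# Dev tools show the DOM, where references are already decoded.
# html.unescape decodes references using the HTML5 named-reference table;
# it leaves the tags themselves alone.
decoded = html.unescape(raw)

print("&mldr;" in raw)  # True
print(decoded)          # <p>wait… what</p>
```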

Looks like this is a BeautifulSoup thing. Bridgy and granary fetch posts with requests: https://github.com/snarfed/bridgy/blob/b26c8859c6db927a7a30fe1fa1a8f7d4123aa06b/webmention.py#L66-L71

...then parse it manually with BeautifulSoup, then pass that to mf2py: https://github.com/snarfed/webutil/blob/63be8a763a618d43e957c6d414c0f6de8f298184/util.py#L1917-L1985

```py
if isinstance(input,...
```
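The key property of this pipeline is that entity decoding happens in the HTML parsing step, before mf2py sees anything, so the parser in that step presumably decides what `&mldr;` turns into. Here is a rough stdlib-only stand-in for that parse step. It uses Python's `html.parser.HTMLParser` instead of BeautifulSoup, and the `ContentText` class and `extract_text` function are hypothetical, not code that Bridgy or mf2py actually run.

```py
from html.parser import HTMLParser
from typing import List

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ContentText(HTMLParser):
    """Hypothetical helper: collects text inside the first element with a class."""

    def __init__(self, target_class: str = "e-content") -> None:
        # convert_charrefs=True (the default) decodes references such as
        # &mldr; before handle_data is called.
        super().__init__(convert_charrefs=True)
        self.target_class = target_class
        self.depth = 0  # > 0 while inside the target element
        self.done = False
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag not in VOID:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if not self.done and self.target_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID or not self.depth:
            return
        self.depth -= 1
        if not self.depth:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_text(markup: str, target_class: str = "e-content") -> str:
    parser = ContentText(target_class)
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


page = ('<div class="h-entry"><div class="e-content">since&mldr; '
        '<b>really</b><br>ok</div><p>other</p></div>')
print(extract_text(page))  # since… reallyok
```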

Hmm, maybe it's an lxml thing?

```py
>>> from bs4.diagnose import diagnose
>>> diagnose(resp.text)
Diagnostic running on Beautiful Soup 4.12.2
Python version 3.9.16 (main, Dec 7 2022, 10:06:04) [Clang 14.0.0...
```

Looks like maybe yes.

```py
>>> print(bs4.BeautifulSoup(text, 'html.parser'))
...
I haven’t posted anything longer than 500 chars since… I’ll give it another try.
...
>>> print(bs4.BeautifulSoup(text, 'html5lib'))
...
I haven’t...
```
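BeautifulSoup's `'html.parser'` option is built on Python's stdlib `HTMLParser`, and by default that parser decodes `&mldr;` itself, matching the output above. With `convert_charrefs=False` it instead hands back the bare name `mldr`, and something downstream has to map that name to a character. A parser whose name table lacks `mldr` would get stuck at that step. This sketch assumes the entity in question is `&mldr;` and shows only the two stdlib modes. It makes no claim about what lxml does internally. `EventLog` and `parse` are hypothetical names.

```py
from html.parser import HTMLParser
from typing import List, Tuple


class EventLog(HTMLParser):
    """Records the text and entity-reference callbacks HTMLParser makes."""

    def __init__(self, convert_charrefs: bool) -> None:
        super().__init__(convert_charrefs=convert_charrefs)
        self.events: List[Tuple[str, str]] = []

    def handle_data(self, data: str) -> None:
        self.events.append(("data", data))

    def handle_entityref(self, name: str) -> None:
        # Only called when convert_charrefs=False.
        self.events.append(("entityref", name))


def parse(markup: str, convert_charrefs: bool) -> List[Tuple[str, str]]:
    log = EventLog(convert_charrefs)
    log.feed(markup)
    log.close()
    return log.events


markup = "<p>since&mldr; ok</p>"
decoded = parse(markup, convert_charrefs=True)
raw = parse(markup, convert_charrefs=False)
print("".join(text for kind, text in decoded if kind == "data"))  # since… ok
print([name for kind, name in raw if kind == "entityref"])         # ['mldr']
```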

I see `mldr` [in WHATWG's list of entities](https://html.spec.whatwg.org/multipage/named-characters.html) and in the [2011 HTML spec](https://www.w3.org/TR/2011/WD-html5-20110525/named-character-references.html). I don't see anything obvious searching lxml's bug tracker [for mldr](https://bugs.launchpad.net/lxml?field.searchtext=mldr) or [for entities](https://bugs.launchpad.net/lxml?field.searchtext=entities). So I...
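Python ships both the HTML5 (WHATWG) named-reference table and the older HTML 4.01 one, so one possible explanation is easy to check. `mldr` is an HTML5-era alias for the ellipsis and does not appear in the HTML 4 list, while `hellip` appears in both. If a parser resolved names against an HTML 4-era table, `&mldr;` would not decode. Whether lxml/libxml2 actually does this is an assumption this snippet does not test.

```py
from html.entities import html5, name2codepoint

# html5 maps WHATWG reference names (including the trailing ';') to strings.
print(html5["mldr;"])                 # …
print(html5["hellip;"])               # …

# name2codepoint is the HTML 4.01 table: `hellip` is there, `mldr` is not.
print("mldr" in name2codepoint)       # False
print(chr(name2codepoint["hellip"]))  # …
```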

Generally `e-content`, which preserves the inner HTML tags. Only use `p-content` if you want to collapse the value to plain text.
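Below is a rough sketch of the difference, assuming a simplified version of the microformats2 rules: `e-*` keeps the inner HTML alongside a text rendering, while `p-*` collapses everything to text. Real mf2 parsers apply extra rules (whitespace normalization, `img` alt text, and so on), and `content_value` is a hypothetical helper, not mf2py's API.

```py
from html.parser import HTMLParser
from typing import Dict, List, Union


class _TextCollector(HTMLParser):
    """Accumulates text nodes; entities are decoded (convert_charrefs=True)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def text_of(inner_html: str) -> str:
    collector = _TextCollector()
    collector.feed(inner_html)
    collector.close()
    return "".join(collector.parts).strip()


def content_value(inner_html: str, prefix: str) -> Union[str, Dict[str, str]]:
    """Roughly what an mf2 parser yields for `<prefix>-content`."""
    if prefix == "e":
        # e-*: keep the inner HTML alongside a plain-text rendering.
        return {"html": inner_html.strip(), "value": text_of(inner_html)}
    if prefix == "p":
        # p-*: collapse to plain text; tags like <a> are lost.
        return text_of(inner_html)
    raise ValueError(f"unsupported prefix: {prefix!r}")


inner = '<p>See <a href="https://example.com/">this</a></p>'
print(content_value(inner, "e"))
print(content_value(inner, "p"))  # See this
```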