Extra whitespace on Mastodon
I had an issue where Mastodon would show extra whitespace that I think is due to the block editor and/or the use of wpautop().
The block editor already adds paragraphs, and it also adds a bunch of newlines that I think wpautop() then converts to p or br elements, which are thus not stripped by preg_replace( '/[\n\r\t]/', '', $content ).
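To illustrate why that regex alone may not be enough, here is a minimal sketch in plain PHP. The function name example_strip_control_chars and the sample markup are made up for illustration; the sample only stands in for already-formatted content and is not actual wpautop() output. The character class removes literal newlines, carriage returns and tabs, but br and p elements (and ordinary spaces) pass through untouched.

```php
<?php
// Hypothetical helper: applies the same pattern as quoted above.
function example_strip_control_chars( string $content ): string {
    return preg_replace( '/[\n\r\t]/', '', $content );
}

// Made-up sample standing in for already-formatted post content.
$sample = "<p>One<br />\nTwo</p>\n<p> </p>\n\t<p>Three</p>";
echo example_strip_control_chars( $sample );
// Prints: <p>One<br />Two</p><p> </p><p>Three</p>
```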
What seems to work is:
- a filter where I don't call wpautop() (okay for me, I'm using the block editor for all post types that I want to federate)
- call wpautop( $content, false ) instead, so with the second parameter set to false
Second option seems the better one? Or is there a reason to not do this?
(Also, isn't wpautop one of the filters in apply_filters( 'the_content', $post->post_content ) already?)
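The second option from the list above could look roughly like this. It is only a sketch: example_autop_without_br is a hypothetical name, not something from the plugin, and the function_exists() guard is an assumption added so the snippet also runs outside WordPress (where it simply returns its input).

```php
<?php
// Hypothetical helper; wpautop() itself only exists inside WordPress.
function example_autop_without_br( string $content ): string {
    if ( ! function_exists( 'wpautop' ) ) {
        // Outside WordPress: return the content unchanged.
        return $content;
    }

    // Second argument false: remaining line breaks are not turned into <br /> elements.
    return wpautop( $content, false );
}
```

With the second argument set to false, wpautop() still wraps paragraphs but should leave the remaining single line breaks as plain newlines, which the later newline-stripping regex can then remove.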
It depends: https://github.com/Automattic/wordpress-activitypub/blob/master/includes/class-shortcodes.php#L201 (This is because a lot of plugins add a lot of %&$§ to the content, so it is up to the user if he wants that /%&§?§$ to be federated or only his content).
I added wpautop here https://github.com/Automattic/wordpress-activitypub/blob/master/includes/model/class-post.php#L520, because you could add extra content to the ActivityPub content field, like the hashtags, links, ... and this should also be formatted.
As far as I know, wpautop does take care of already <p>ed content?!?
This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 5 days.
I don't seem to be suffering from this anymore; also can't really remember what exactly may have led to this. But I'm also completely overwriting "ActivityPub" posts' contents in an add-on plugin. :-)
I just noticed this again. https://indieweb.social/@[email protected]/111671414664906704 is a Mastodon copy of the post over at https://starrwulfe.xyz/notes/2023/12/ad18663ee2/. And while Mastodon's web UI shows it just fine, Tusky on Android adds a bunch of newlines that shouldn't be there.
Now, this may be a Tusky issue, but it can be prevented by stripping whitespace in between HTML tags. If I load the indieweb.social URL above and look for the JSON coming from the server, it kinda looks like this:
<p><i>In reply to <a class="u-url" href="https://jan.boddez.net/notes/9a5f268a52" rel="nofollow noopener noreferrer" target="_blank">https://jan.boddez.net/notes/9a5f268a52</a> by <span class="p-author">Jan</span>.</i></p> <p><a rel="nofollow noopener noreferrer" class="u-url mention" href="https://jan.boddez.net/author/jan" target="_blank">@<span>jan</span></a> <a rel="nofollow noopener noreferrer" class="u-url mention" href="https://indieweb.social/@janboddez" target="_blank">@<span>janboddez</span></a> so hang on a sec; how do I do this on my own WP blog here? What do I “aim” the <code>activitypub_in_reply_to</code> field at to get the “inline federated reply” trick to work? </p><p><em>(side note; how do I put the shortlink next to the date as the permalinks like how you’ve done?)</em></p> <p><a href="https://starrwulfe.xyz/notes/2023/12/ad18663ee2/" rel="nofollow noopener noreferrer" target="_blank">https://starrwulfe.xyz/notes/2023/12/ad18663ee2/</a></p>
There are literally a couple of spaces at the start, and then more spaces between each closing </p> tag and the opening tag that follows it, and so on, which causes Tusky to behave weirdly.
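Stripping whitespace between HTML tags could be sketched like this in plain PHP. Everything here is an assumption for illustration: the function name example_strip_block_whitespace is hypothetical, and the list of block-level tags is a guess at what matters for federated content. Only whitespace next to block-level tags is removed, so the space between two adjacent inline links (like the two mentions in the JSON above) is kept.

```php
<?php
// Hypothetical helper: removes whitespace next to block-level tags only,
// so spaces between inline elements (e.g. two adjacent links) survive.
function example_strip_block_whitespace( string $html ): string {
    // Assumed list of block-level tags; <pre> is left out on purpose.
    $tag = '</?(?:p|div|blockquote|ul|ol|li|h[1-6])(?:\s[^>]*)?>';

    // Whitespace right after a block-level tag.
    $html = preg_replace( '~(' . $tag . ')[ \t\r\n]+~i', '$1', $html );
    // Whitespace right before a block-level tag.
    $html = preg_replace( '~[ \t\r\n]+(' . $tag . ')~i', '$1', $html );

    return trim( $html );
}

// Inside WordPress, one possible way to wire it up (filter name taken from this thread).
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'activitypub_the_content', 'example_strip_block_whitespace' );
}
```

A blanket replacement of all whitespace between any ">" and "<" would be shorter, but it would also glue adjacent inline elements together, which is why this sketch limits itself to block-level tags.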
The reason I don't suffer from it on my site is because I have a custom activitypub_the_content filter in place.
Oh, wait. I just checked out my custom filter and it literally only does the following (I only use it so I can use a "template" that's different from what's defined in the settings):
$content = apply_filters( 'the_content', $post->post_content );
$content .= '<p><a href="' . esc_url( get_permalink( $post ) ) . '">' . esc_html( get_permalink( $post ) ) . '</a></p>';
$content = wp_kses( $content, $allowed_tags );
$content = preg_replace( '~[\n\r\t]~', '', $content );
$content = trim( $content );
It shouldn't do anything that isn't already done by the AP plugin: https://github.com/Automattic/wordpress-activitypub/blob/245cda8433b8319856cf3a656f5bd975ee383cfc/includes/transformer/class-post.php#L601-L605
Maybe @StarrWulfe is using a custom filter too? That somehow adds these spaces?
I'm seeing extra whitespace in comments now, too. I think that's because they don't get run through my custom filter. Note that it doesn't show in Mastodon's web UI, but does in Tusky. (You can still see them in the JSON object though.)
Would it be possible/useful/logical to also run comment content through activitypub_the_content?
E.g., see the \ns in this post's JSON:
"content":"\u003cp\u003eRelevant:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eI think it is highly context dependent. For example, I like them in my digital garden because that is where I’m saving links. I don’t like them when I’m linking to things in regular blog posts.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://cagrimmett.com/likes/308a059a5e/\" rel=\"nofollow noopener noreferrer\" target=\"_blank\"\u003ehttps://cagrimmett.com/likes/308a059a5e/\u003c/a\u003e\u003c/p\u003e\n\u003c/blockquote\u003e\n"
I would have to add a couple debug statements to see what exactly gets sent to servers and where the extra whitespace is added/not stripped.
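As a starting point for that kind of debugging, a small plain-PHP check like the one below could count the whitespace runs that sit between tags in an object's content field. This is only a sketch: example_count_inter_tag_whitespace is a hypothetical name, and the JSON is a shortened stand-in for the object quoted above, wrapped in braces so it parses on its own.

```php
<?php
// Hypothetical helper: counts runs of whitespace that sit between two tags
// in the "content" field of a JSON object.
function example_count_inter_tag_whitespace( string $json ): int {
    $data = json_decode( $json, true );
    if ( ! is_array( $data ) || ! isset( $data['content'] ) || ! is_string( $data['content'] ) ) {
        return 0;
    }

    // json_decode() has already turned \u003c into "<" and \n into a real newline.
    return (int) preg_match_all( '~>[ \t\r\n]+(?=<)~', $data['content'] );
}

// Shortened stand-in for the object quoted above.
$json = '{"content":"\u003cp\u003eRelevant:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eQuote\u003c/p\u003e\n\u003c/blockquote\u003e\n"}';
echo example_count_inter_tag_whitespace( $json );
// Prints: 3
```

The trailing newline after the closing blockquote is not counted here because no tag follows it; a trim() on the content would take care of that one.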
The literal comment "source" in WP Admin/the database is this:
Relevant:
<blockquote>I think it is highly context dependent. For example, I like them in my digital garden because that is where I’m saving links. I don’t like them when I’m linking to things in regular blog posts.
<cite>https://cagrimmett.com/likes/308a059a5e/</cite>
</blockquote>
I expect wpautop() to add <p>s around the first line and around both "paragraphs" inside the blockquote. I don't really expect it to add a newline in between the opening blockquote and opening p tags, but it seems like maybe it does.
I also think in https://github.com/Automattic/wordpress-activitypub/blob/e23c65f2959b189c73f3dd7485410c3864c06019/includes/transformer/class-comment.php#L113 the the_content filter isn't called 100% correctly; the second param is supposed to be a boolean. Don't think it really matters for the outcome, though.
I also think the filter to use on comments is normally a different one: comment_text.
But what I think might be happening is wpautop() (or a similar filter) is run a second time as part of the default the_content filters and reintroduces these newline characters. I'll test a bit more when I have time.