
Extra whitespace on Mastodon

Open janboddez opened this issue 2 years ago • 8 comments

I had an issue where Mastodon would show extra whitespace that I think is due to the block editor and/or the use of wpautop().

The block editor already adds paragraphs, and it also adds a bunch of newlines that I think wpautop() then converts to p or br elements, which are thus not stripped by preg_replace( '/[\n\r\t]/', '', $content ).
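To illustrate the point, here is a minimal sketch in plain PHP. The sample markup is an assumption (a made-up string standing in for what wpautop() might emit), not output captured from a real post. It only shows that the pattern removes newline, carriage return and tab characters, and leaves any br elements that were generated from newlines earlier on.

```php
<?php
// Hypothetical markup standing in for wpautop() output (an assumption for illustration).
$content = "<p>First line<br />\nSecond line</p>\n\n<p>Next paragraph</p>\n";

// Same pattern as quoted above: strips \n, \r and \t characters only.
$stripped = preg_replace( '/[\n\r\t]/', '', $content );

// The control characters are gone, but the <br /> element is still there.
echo $stripped, PHP_EOL;
```

So once a newline has been turned into an element, the later preg_replace() has nothing left to strip.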

What seems to work is:

  • a filter where I don't call wpautop() (okay for me, I'm using the block editor for all post types that I want to federate)
  • call wpautop( $content, false ) instead, so with the second parameter set to false

Second option seems the better one? Or is there a reason to not do this?

janboddez avatar Jul 10 '23 18:07 janboddez

(Also, isn't wpautop one of the filters in apply_filters( 'the_content', $post->post_content ) already?)

janboddez avatar Jul 10 '23 21:07 janboddez

It depends: https://github.com/Automattic/wordpress-activitypub/blob/master/includes/class-shortcodes.php#L201 (This is because a lot of plugins add a lot of %&$§ to the content, so it is up to the user if he wants that /%&§?§$ to be federated or only his content).

I added wpautop here https://github.com/Automattic/wordpress-activitypub/blob/master/includes/model/class-post.php#L520, because you could add extra content to the ActivityPub content field, like the hashtags, links, ... and this should also be formatted.

As far as I know, wpautop does take care of already <p>ed content?!?

pfefferle avatar Jul 11 '23 05:07 pfefferle

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Nov 09 '23 01:11 github-actions[bot]

I don't seem to be suffering from this anymore; also can't really remember what exactly may have led to this. But I'm also completely overwriting "ActivityPub" posts' contents in an add-on plugin. :-)

janboddez avatar Nov 09 '23 13:11 janboddez

I just noticed this again. https://indieweb.social/@[email protected]/111671414664906704 is a Mastodon copy of the post over at https://starrwulfe.xyz/notes/2023/12/ad18663ee2/. And while Mastodon's web UI shows it just fine, Tusky on Android adds a bunch of newlines that shouldn't be there.

Now, this may be a Tusky issue, but it can be prevented by stripping whitespace in between HTML tags. If I load the indieweb.social URL above and look for the JSON coming from the server, it kinda looks like this:

  <p><i>In reply to <a class="u-url" href="https://jan.boddez.net/notes/9a5f268a52" rel="nofollow noopener noreferrer" target="_blank">https://jan.boddez.net/notes/9a5f268a52</a> by <span class="p-author">Jan</span>.</i></p>  <p><a rel="nofollow noopener noreferrer" class="u-url mention" href="https://jan.boddez.net/author/jan" target="_blank">@<span>jan</span></a> <a rel="nofollow noopener noreferrer" class="u-url mention" href="https://indieweb.social/@janboddez" target="_blank">@<span>janboddez</span></a> so hang on a sec; how do I do this on my own WP blog here? What do I “aim” the <code>activitypub_in_reply_to</code> field at to get the “inline federated reply” trick to work? </p><p><em>(side note; how do I put the shortlink next to the date as the permalinks like how you’ve done?)</em></p> <p><a href="https://starrwulfe.xyz/notes/2023/12/ad18663ee2/" rel="nofollow noopener noreferrer" target="_blank">https://starrwulfe.xyz/notes/2023/12/ad18663ee2/</a></p>

There are literally a couple of spaces at the start, and then more spaces between the closing </p> tag and the opening tag that follows it, and so on, which causes Tusky to behave weirdly.
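A rough sketch of what "stripping whitespace in between HTML tags" could look like, in plain PHP. The function name strip_block_whitespace() and the list of block-level tags are hypothetical, made up for this example; this is not what the plugin does. It only touches whitespace next to block-level tags, so the space between two inline mention links (as in the JSON above) survives.

```php
<?php
/**
 * Hypothetical helper: removes whitespace around block-level tags only.
 * The tag list is an assumption and would need adjusting for real content.
 */
function strip_block_whitespace( string $html ): string {
    $block = 'p|blockquote|ul|ol|li|div|pre|h[1-6]';

    // Whitespace directly after a closing block-level tag.
    $html = preg_replace( "~(</(?:$block)>)\s+~i", '$1', $html );

    // Whitespace directly before an opening or closing block-level tag.
    $html = preg_replace( "~\s+(</?(?:$block)[\s>])~i", '$1', $html );

    return trim( $html );
}

// Made-up sample that mimics the leading spaces and inter-tag whitespace described above.
$html  = "  <p>One <a href=\"#\">a</a> <a href=\"#\">b</a> </p>  <blockquote>\n<p>Two</p>\n</blockquote>\n";
$clean = strip_block_whitespace( $html );

echo $clean, PHP_EOL;
```

A blunter '~>\s+<~' replacement would also work for the paragraph gaps, but it would glue adjacent inline elements together, which is why the sketch limits itself to block-level tags.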

The reason I don't suffer from it on my site is because I have a custom activitypub_the_content filter in place.

janboddez avatar Dec 31 '23 14:12 janboddez

Oh, wait. I just checked out my custom filter and it literally only does the following (I only use it so I can use a "template" that's different from what's defined in the settings):

$content  = apply_filters( 'the_content', $post->post_content );
$content .= '<p><a href="' . esc_url( get_permalink( $post ) ) . '">' . esc_html( get_permalink( $post ) ) . '</a></p>';
$content = wp_kses( $content, $allowed_tags );
$content = preg_replace( '~[\n\r\t]~', '', $content );
$content = trim( $content );

It shouldn't do anything that isn't already done by the AP plugin: https://github.com/Automattic/wordpress-activitypub/blob/245cda8433b8319856cf3a656f5bd975ee383cfc/includes/transformer/class-post.php#L601-L605

Maybe @StarrWulfe is using a custom filter too? That somehow adds these spaces?

janboddez avatar Dec 31 '23 14:12 janboddez

I'm seeing extra whitespace in comments now, too. I think because they don't get run through my custom filter. Note that it doesn't show in Mastodon's web UI, but does in Tusky. (You can still see them in the JSON object though.)

Would it be possible/useful/logical to also run comment content through activitypub_the_content?

E.g., see the \ns in this post's JSON:

"content":"\u003cp\u003eRelevant:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eI think it is highly context dependent. For example, I like them in my digital garden because that is where I’m saving links. I don’t like them when I’m linking to things in regular blog posts.\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://cagrimmett.com/likes/308a059a5e/\" rel=\"nofollow noopener noreferrer\" target=\"_blank\"\u003ehttps://cagrimmett.com/likes/308a059a5e/\u003c/a\u003e\u003c/p\u003e\n\u003c/blockquote\u003e\n"

I would have to add a couple debug statements to see what exactly gets sent to servers and where the extra whitespace is added/not stripped.

The literal comment "source" in WP Admin/the database is this:

Relevant:

<blockquote>I think it is highly context dependent. For example, I like them in my digital garden because that is where I’m saving links. I don’t like them when I’m linking to things in regular blog posts.

<cite>https://cagrimmett.com/likes/308a059a5e/</cite>
</blockquote>

I expect wpautop() to add <p>s around the first line and around both "paragraphs" inside the blockquote. I don't really expect it to add a newline in between the opening blockquote and opening p tags, but it seems like maybe it does.

I also think in https://github.com/Automattic/wordpress-activitypub/blob/e23c65f2959b189c73f3dd7485410c3864c06019/includes/transformer/class-comment.php#L113 the the_content filter isn't called 100% correctly; the second param is supposed to be a boolean. Don't think it really matters for the outcome, though.

I also think the filter to use on comments is normally a different one: comment_text.

But what I think might be happening is wpautop() (or a similar filter) is run a second time as part of the default the_content filters and reintroduces these newline characters. I'll test a bit more when I have time.

janboddez avatar Jan 12 '24 09:01 janboddez

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar May 12 '24 01:05 github-actions[bot]