kirby icon indicating copy to clipboard operation
kirby copied to clipboard

Extra spaces added after linked text using excerpt()

Open splorp opened this issue 2 years ago • 5 comments

Description

There’s a bug when links are stripped out of blocks of text using the excerpt() method.

If a link is followed by a punctuation mark (such as a period or comma) and that link gets stripped from a block of text, an extra space gets inserted between the original linked text and the punctuation. If the link is not followed directly by any punctuation, no extra space is inserted.

Note that this behaviour does not occur in K2.

To reproduce

I’m using the following code in my template, links in the text field are formatted using Markdown.

<?php echo $result->text()->excerpt(200) ?>

However, the same issue can be reproduced using:

<?php echo Str::excerpt('Why not <a href="https://getkirby.com/">Get Kirby</a>?', 200) ?>

This example returns:

Why not Get Kirby ?

Expected behavior
When links are stripped out of text using the excerpt() method, extra spaces should not be added between the link text and a punctuation mark directly following the link.

The expected text being returned should be:

Why not Get Kirby?

Screenshots
In the example below, the words “cursor”, “input”. and “tapped” were all linked in the the original text field.

Screen Shot 2022-05-04 at 2 37 43 PM

Your setup

Kirby Version
3.6.5

Console output
n/a

Your system (please complete the following information)

  • Device: MacBook Pro
  • OS: macOS 10.13.6 High Sierra
  • Browser: Vivaldi
  • Version: 5.2.2623.41 (Chrome/100.0.4896.147)

splorp avatar May 05 '22 20:05 splorp

Should add that this has nothing to do with links in particular, but HTML tags in general.

This happens because the Str::excerpt() method add a space before any < on line #399:

$string = strip_tags(str_replace('<', ' <', $string));

texnixe avatar May 05 '22 21:05 texnixe

I noticed that Str::excerpt() also removes double spaces (line 406), which explains why any HTML elements that are found inside the text block (such as inline links) are not affected by the str_replace above.

// remove double spaces
$string = preg_replace('![ ]{2,}!', ' ', $string);

splorp avatar May 05 '22 22:05 splorp

I traced the change back to a commit by Bastian on 31.07.2018 that added the initial implementation of Str::excerpt() together with tests. This implementation is mostly unchanged today.

I believe the str_replace('<', ' <', $string) change was made to make this test pass:

https://github.com/getkirby/kirby/blob/005fc78fcc0e7f99f61e6973efa8ea8ec85741d9/tests/Toolkit/StrTest.php#L214-L221

Possible fixes

More clever behavior

We could look into which situations really need this added space and only add the space for those in particular. So far I can think of the following:

  • <br> tags (which itself is whitespace without using whitespace in the code)
  • Open tags of block-level elements (e.g. a new <p>)

Disadvantage: Pretty complex to implement.

Only add the space before the open tag

If we don't add spaces before closing tags, the particular example in this issue would be solved. But this needs more tests to ensure that it doesn't break anything else.

lukasbestle avatar May 07 '22 11:05 lukasbestle

What are the chances of this getting addressed in the next couple of releases?

I can certainly hack something together for an interim fix, but I’d rather use something that’s built into the core.

splorp avatar Jun 17 '22 15:06 splorp

With the major release 3.7 upcoming, we expect that there will be regression fixes we will need to make. To be honest this issue is of comparatively lower priority, so we cannot yet estimate when we will get to it.

You can override the excerpt() method with your own implementation. Maybe my comment above helps to find a good solution. If you have one, we are very happy to accept a pull request that fixes it in the core.

lukasbestle avatar Jun 19 '22 20:06 lukasbestle

Just checking in … any chance of this issue being addressed in the 3.8 development cycle?

splorp avatar Sep 04 '22 19:09 splorp

I can't promise it yet. We have quite a lot to do for 3.8 still. As I wrote, a pull request for this is welcome.

lukasbestle avatar Sep 04 '22 19:09 lukasbestle

I can't promise it yet. We have quite a lot to do for 3.8 still.

I imagine so!

As I wrote, a pull request for this is welcome.

Well, I’ll see what I can do without mucking things up too much.

splorp avatar Sep 04 '22 21:09 splorp

Commenting to say that I've run into this issue too. I defined $page->text() as a set of blocks, and when I go to excerpt the page content for meta/SEO tags, I'll see situations like this:

image

To work around it, I've passed specific values through this function. I'm not sure if it's suitable for general use, but was a decent workaround in my case.

function trimExtraSpaces($str)
{
  return str_replace(" ,", ",", str_replace(" .", ".", $str));
}

rosszurowski avatar Sep 15 '22 15:09 rosszurowski

distantnative avatar Sep 15 '22 20:09 distantnative