perl5 hopping chars now consumes 1 hop count no matter the starting position nor direction

Previously, that was the case for backwards hops, but if a forward hop started at a continuation byte, each such byte in the current character consumed one hop count.

Jul 10 '22 16:07 khwilliamson

Should the subject of the pull request be "hopping" rather than "hoppiing"?

Jul 10 '22 18:07 jkeenan

If I understand, won't this make:

utf8_hop_forward(p, 2, pend)

produce a different result from:

utf8_hop_forward(utf8_hop_forward(p, 1, pend), 1, pend)

?

Jul 11 '22 00:07 tonycoz

On 7/10/22 18:55, Tony Cook wrote:

> If I understand, won't this make:
>
> utf8_hop_forward(p, 2, pend)
>
> produce a different result from:
>
> utf8_hop_forward(utf8_hop_forward(p, 1, pend), 1, pend)
>
> ?

No. There is no change from the current behavior if the starting position is at a non-continuation byte.

If you have two characters, say the first is two bytes and the second is three, a forward hop of 2 will move you five bytes. The first forward hop of 1 will move you two bytes, to the beginning of the second character; the second forward hop of 1 will move you an additional three.

Jul 11 '22 01:07 khwilliamson

I was thinking of an invalid string, e.g. INVARIANT CONT CONT.

For the single hop of 2, it skips the invariant and then the first CONT, since UTF8SKIP() for a CONT is 1.

For two hops of 1, it skips the invariant on the first call, then both CONTs on the second call.
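Both cases discussed so far can be checked against a simplified model in C. This is a minimal sketch, not perl's `utf8_hop_forward()`: the names `hop_forward()`, `utf8_skip()` and `is_cont()` are hypothetical, and `utf8_skip()` is a rough stand-in for `UTF8SKIP()` that only knows 1- to 4-byte sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for perl's UTF8SKIP(): length of the sequence
   started by byte c. ASCII and continuation bytes (0x80..0xBF) give 1,
   matching the point above that UTF8SKIP() of a CONT is 1. In this
   simplification every byte >= 0xF0 is treated as a 4-byte start. */
static size_t utf8_skip(unsigned char c) {
    if (c < 0xC0) return 1;
    if (c < 0xE0) return 2;
    if (c < 0xF0) return 3;
    return 4;
}

/* True for a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_cont(unsigned char c) {
    return (c & 0xC0) == 0x80;
}

/* Sketch of the proposed rule: hop `off` characters forward, never past
   `end`. If s starts on a continuation byte, moving to the start of the
   next character consumes exactly one hop. */
static const unsigned char *hop_forward(const unsigned char *s, size_t off,
                                        const unsigned char *end) {
    assert(s <= end);
    if (off > 0 && s < end && is_cont(*s)) {
        while (s < end && is_cont(*s))
            s++;
        off--;
    }
    while (off > 0 && s < end) {
        size_t skip = utf8_skip(*s);
        /* Clamp so a truncated final sequence cannot run past end. */
        s = (size_t)(end - s) < skip ? end : s + skip;
        off--;
    }
    return s;
}
```

With the well-formed bytes `C3 A9 E2 82 AC` (a 2-byte character followed by a 3-byte one), a hop of 2 and two hops of 1 both land five bytes in. With `41 80 80` (INVARIANT CONT CONT), a hop of 2 stops on the second CONT (offset 2), while two hops of 1 reach the end (offset 3).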

Ideally, of course, we wouldn't get an invalid string, but these functions are intended to be at least safe on invalid strings.

An alternative would be to throw an exception for invalid strings, including when s starts on a continuation byte, but we have plenty of other functions that could be used for validation before calling utf8_hop_*().

Jul 13 '22 00:07 tonycoz

The only way to avoid surprises is to always check for complete well-formedness.

But my assertion is that the prior behavior is insane for well-formed UTF-8. If you call it in the middle of a character, each continuation byte will count as a full character. The new method would automatically synchronize for you.
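The mid-character case on well-formed input can be modeled the same way. In this hypothetical sketch, `hop_forward_old()` approximates the prior behavior as described in this thread (every byte, including a continuation byte, is stepped over by its own skip length), and `hop_forward_new()` first resynchronizes to the next character start and charges one hop for that. Neither function is perl's actual code, and `utf8_skip()` is a simplified stand-in for `UTF8SKIP()`.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified UTF8SKIP() stand-in: ASCII and continuation bytes give 1. */
static size_t utf8_skip(unsigned char c) {
    if (c < 0xC0) return 1;
    if (c < 0xE0) return 2;
    if (c < 0xF0) return 3;
    return 4;
}

/* True for a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_cont(unsigned char c) {
    return (c & 0xC0) == 0x80;
}

/* Approximation of the prior rule: each hop steps by the skip length of
   the byte it is on, so a continuation byte counts as a whole character. */
static const unsigned char *hop_forward_old(const unsigned char *s, size_t off,
                                            const unsigned char *end) {
    assert(s <= end);
    while (off > 0 && s < end) {
        size_t skip = utf8_skip(*s);
        s = (size_t)(end - s) < skip ? end : s + skip;
        off--;
    }
    return s;
}

/* Proposed rule: starting mid-character, moving to the next character
   start costs one hop; the remaining hops behave as before. */
static const unsigned char *hop_forward_new(const unsigned char *s, size_t off,
                                            const unsigned char *end) {
    assert(s <= end);
    if (off > 0 && s < end && is_cont(*s)) {
        while (s < end && is_cont(*s))
            s++;
        off--;
    }
    return hop_forward_old(s, off, end);
}
```

Starting at the second byte of `E2 82 AC 41` ("€A"), a hop of 1 under the old rule lands on `AC`, still inside the euro sign; under the new rule it lands on `A`.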

I'd rather have insane behavior on illegal input and sane behavior on legal input.

Jul 13 '22 04:07 khwilliamson

Do we ever call these functions where s isn't one of: a) the start, b) the end, or c) the result of calling these functions on any of a, b, or c?

I'd tend to think that s being some random pointer into the string is an error in itself, but I don't recall all the circumstances in which we were calling these and the older functions they were intended to replace.

Jul 13 '22 06:07 tonycoz

There are no calls outside of APItest that hop forward where the pointer isn't at the beginning. However, there are several places in the code that do want to hop from the middle of a character to the beginning of the next one, and they roll their own code to do that.

I believe there are calls to hop back that start at neither the end nor the start.

The proposed interface would bring hopping forward into parity with hopping backward.
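The parity argument can be sketched as follows, again with hypothetical names and a simplified model rather than perl's code. Hopping back from the middle of a character lands on that character's start byte, and under the proposed rule hopping forward from the same place lands on the next character's start byte; each move costs one hop.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified UTF8SKIP() stand-in: ASCII and continuation bytes give 1. */
static size_t utf8_skip(unsigned char c) {
    if (c < 0xC0) return 1;
    if (c < 0xE0) return 2;
    if (c < 0xF0) return 3;
    return 4;
}

/* True for a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_cont(unsigned char c) {
    return (c & 0xC0) == 0x80;
}

/* Each backward hop steps back over continuation bytes to a start byte,
   never before `start`. */
static const unsigned char *hop_back(const unsigned char *s, size_t off,
                                     const unsigned char *start) {
    assert(start <= s);
    while (off > 0 && s > start) {
        do {
            s--;
        } while (s > start && is_cont(*s));
        off--;
    }
    return s;
}

/* Proposed forward rule: resynchronizing from mid-character costs one hop. */
static const unsigned char *hop_forward(const unsigned char *s, size_t off,
                                        const unsigned char *end) {
    assert(s <= end);
    if (off > 0 && s < end && is_cont(*s)) {
        while (s < end && is_cont(*s))
            s++;
        off--;
    }
    while (off > 0 && s < end) {
        size_t skip = utf8_skip(*s);
        s = (size_t)(end - s) < skip ? end : s + skip;
        off--;
    }
    return s;
}
```

For `41 E2 82 AC 42` ("A€B") and a pointer at offset 2 (inside the euro sign), one hop back gives offset 1 (the start of "€") and one hop forward gives offset 4 (the "B").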

Jul 16 '22 14:07 khwilliamson
