pharo
pharo copied to clipboard
findBetweenSubstrings: freezes the image
Describe the bug Running the following piece of code freezes the image
'text' findBetweenSubstrings: #('te' 'x')
whereas
'text' findBetweenSubstrings: #('te' 't')
>>> a Collection('x')
Version information:
- Version: Pharo9
In addition, we should add tests.
Q your expression does not freeze my image.
'text' findBetweenSubstrings: #('te' 'x')
>>> anOrderedCollection('t)
which is bogus but not freezing :)
I tried it again, it does not freeze my image indeed. I tried to propose a simpler example than the one I encountered the freeze with:
' .text
.file "mod"
.globl sum
.p2align 4, 0x90
.type sum,@function
sum:
.cfi_startproc
leal (%rdi,%rsi), %eax
retq
.Lfunc_end0:
.size sum, .Lfunc_end0-sum
.cfi_endproc
.section ".note.GNU-stack","",@progbits
' findBetweenSubstrings: #('sum:','.Lfunc_end0:')
This one freezes a fresh Pharo9.0 image. The comma is the issue.
Shorter example that freezes my pharo9 image: ',' findBetweenSubstrings: #(,'ab')
No freeze: ',' findBetweenSubstrings: #(,'a').
No freeze: ',' findBetweenSubstrings: #(,).
findBetweenSubstrings: delimiters
"Answer the collection of String tokens that result from parsing self. Tokens are separated by 'delimiters', which can be a collection of Strings, or a collection of Characters. Several delimiters in a row are considered as just one separation."
| tokens keyStart keyStop |
tokens := OrderedCollection new.
keyStop := 1.
[keyStop <= self size] whileTrue:
[keyStart := self skipAnySubstring: delimiters startingAt: keyStop.
keyStop := self findAnySubstring: delimiters startingAt: keyStart.
keyStart < keyStop
ifTrue: [tokens add: (self copyFrom: keyStart to: (keyStop - 1))]].
^tokens
When doing ',' findBetweenSubstrings: #(,'ab')
(freeze), the first call keyStart := self skipAnySubstring: delimiters startingAt: keyStop.
returns 1. Then keystart and keystop stay at 1, and the while loop never ends.
When doing ',' findBetweenSubstrings: #(,'a').
(no freeze), the first call keyStart := self skipAnySubstring: delimiters startingAt: keyStop.
returns 2, which allows the while loop to stop.
#skipAnySubstring:startingAt: says it "Answer the index of the last character within the receiver, starting at start, that does NOT match one of the delimiters. delimiters is a Array of substrings (Characters also allowed). If the receiver is all delimiters, answer size + 1."
So why does ',' skipAnySubstring: #(,'ab') startingAt:1.
return 1? The receiver is all delimiters, so it should return the size of the string + 1 (2).
For comparison, ',' skipAnySubstring: #(,'a') startingAt:1
returns 2 as expected.
The issue happens when #skipAnySubstring:startingAt: is called with two substrings to skip: the first one matches the receiver and the second is bigger than the receiver.
When the first substring is looked at, the reading head (ii) gets set to what it was before + the size of the substring, because it matches, -1 by ii := ii + delim size - 1
So if the string is ',', and the first substring is also ',', then ii is set to 1+1-1=1.
Normally, another iteration of the big while loop would start, ii would be incremented to 2, and the condition ((ii := ii + 1) <= self size
) would be false, so self size + 1
(=2) would be returned (correct value.
But if there's a second substring that's bigger than the string, that gets looked at after, any
gets set to false by ifTrue: [any := false]
, and any ifFalse: [^ ii]
happens before the end of the while loop iteration is reached. So ii
is returned before having been incremented by the next check of the while loop condition, so the method returns 1 instead of 2.
Another "feature" of the current algorithm is that it tries to skip the first delimiters that works it sees.
This means the end result depends on the order of the elements in the delimiters array.
For example, 'abcd' skipAnySubstring: #('ab' 'abc') startingAt:1
returns 3, because it skips 'ab', but 'abcd' skipAnySubstring: #('abc' 'ab') startingAt:1
returns 4, because it skips 'abc'.
I think that this algorithm should not depend on the order in which the delimiters are provided, and instead have a consistent behaviour like "always skip the largest possible delimiter".
Another similar case which hangs the Pharo 9 image:
'()' findBetweenSubstrings: #('(' ')))').
'()' findBetweenSubstrings: #('(' ')))').
#9277 also solves this in the Pharo9.0 branch; the bug is still there in the Pharo10.0 branch though.
#9277 also solves this in the Pharo9.0 branch; the bug is still there in the Pharo10.0 branch though.
@tesonep could you merge this again in the development branch please? Then the bug can be closed I believe.
We need to do it for both Pharo 10 and Pharo 11.
This now has been forward ported to both Pharo10 and Pharo11, so we can close the issue
Yay!