pharo icon indicating copy to clipboard operation
pharo copied to clipboard

findBetweenSubstrings: freezes the image

Open QDucasse opened this issue 4 years ago • 12 comments

Describe the bug Running the following piece of code freezes the image

'text' findBetweenSubstrings: #('te' 'x')

whereas

'text' findBetweenSubstrings: #('te' 't')
>>> a Collection('x')

Version information:

  • Version: Pharo9

QDucasse avatar Apr 16 '20 19:04 QDucasse

In addition, we should add tests.

Ducasse avatar Apr 18 '20 12:04 Ducasse

Q your expression does not freeze my image.

'text' findBetweenSubstrings: #('te' 'x')
>>> anOrderedCollection('t) 

which is bogus but not freezing :)

Ducasse avatar Apr 19 '20 19:04 Ducasse

I tried it again, it does not freeze my image indeed. I tried to propose a simpler example than the one I encountered the freeze with:

'	.text
	.file	"mod"
	.globl	sum
	.p2align	4, 0x90
	.type	sum,@function
sum:
	.cfi_startproc
	leal	(%rdi,%rsi), %eax
	retq
.Lfunc_end0:
	.size	sum, .Lfunc_end0-sum
	.cfi_endproc


	.section	".note.GNU-stack","",@progbits
'  findBetweenSubstrings: #('sum:','.Lfunc_end0:')

This one freezes a fresh Pharo9.0 image. The comma is the issue.

QDucasse avatar Apr 20 '20 09:04 QDucasse

Shorter example that freezes my pharo9 image: ',' findBetweenSubstrings: #(,'ab') No freeze: ',' findBetweenSubstrings: #(,'a'). No freeze: ',' findBetweenSubstrings: #(,).

dupriezt avatar Apr 24 '20 08:04 dupriezt

findBetweenSubstrings: delimiters
	"Answer the collection of String tokens that result from parsing self.  Tokens are separated by 'delimiters', which can be a collection of Strings, or a collection of Characters.  Several delimiters in a row are considered as just one separation."

	| tokens keyStart keyStop |
	tokens := OrderedCollection new.
	keyStop := 1.
	[keyStop <= self size] whileTrue:
		[keyStart := self skipAnySubstring: delimiters startingAt: keyStop.
		keyStop := self findAnySubstring: delimiters startingAt: keyStart.
		keyStart < keyStop
			ifTrue: [tokens add: (self copyFrom: keyStart to: (keyStop - 1))]].
	^tokens

When doing ',' findBetweenSubstrings: #(,'ab') (freeze), the first call keyStart := self skipAnySubstring: delimiters startingAt: keyStop. returns 1. Then keystart and keystop stay at 1, and the while loop never ends. When doing ',' findBetweenSubstrings: #(,'a'). (no freeze), the first call keyStart := self skipAnySubstring: delimiters startingAt: keyStop. returns 2, which allows the while loop to stop.

dupriezt avatar Apr 24 '20 10:04 dupriezt

#skipAnySubstring:startingAt: says it "Answer the index of the last character within the receiver, starting at start, that does NOT match one of the delimiters. delimiters is a Array of substrings (Characters also allowed). If the receiver is all delimiters, answer size + 1."

So why does ',' skipAnySubstring: #(,'ab') startingAt:1. return 1? The receiver is all delimiters, so it should return the size of the string + 1 (2). For comparison, ',' skipAnySubstring: #(,'a') startingAt:1 returns 2 as expected.

dupriezt avatar Apr 24 '20 10:04 dupriezt

The issue happens when #skipAnySubstring:startingAt: is called with two substrings to skip: the first one matches the receiver and the second is bigger than the receiver.

When the first substring is looked at, the reading head (ii) gets set to what it was before + the size of the substring, because it matches, -1 by ii := ii + delim size - 1 So if the string is ',', and the first substring is also ',', then ii is set to 1+1-1=1. Normally, another iteration of the big while loop would start, ii would be incremented to 2, and the condition ((ii := ii + 1) <= self size) would be false, so self size + 1 (=2) would be returned (correct value. But if there's a second substring that's bigger than the string, that gets looked at after, any gets set to false by ifTrue: [any := false], and any ifFalse: [^ ii] happens before the end of the while loop iteration is reached. So ii is returned before having been incremented by the next check of the while loop condition, so the method returns 1 instead of 2.

dupriezt avatar Apr 24 '20 16:04 dupriezt

Another "feature" of the current algorithm is that it tries to skip the first delimiters that works it sees. This means the end result depends on the order of the elements in the delimiters array. For example, 'abcd' skipAnySubstring: #('ab' 'abc') startingAt:1 returns 3, because it skips 'ab', but 'abcd' skipAnySubstring: #('abc' 'ab') startingAt:1 returns 4, because it skips 'abc'.

I think that this algorithm should not depend on the order in which the delimiters are provided, and instead have a consistent behaviour like "always skip the largest possible delimiter".

dupriezt avatar Apr 24 '20 16:04 dupriezt

Another similar case which hangs the Pharo 9 image:

'()' findBetweenSubstrings: #('(' ')))').

hernanmd avatar Sep 05 '21 21:09 hernanmd

'()' findBetweenSubstrings: #('(' ')))').

#9277 also solves this in the Pharo9.0 branch; the bug is still there in the Pharo10.0 branch though.

marcor avatar Nov 08 '21 08:11 marcor

#9277 also solves this in the Pharo9.0 branch; the bug is still there in the Pharo10.0 branch though.

@tesonep could you merge this again in the development branch please? Then the bug can be closed I believe.

marcor avatar Sep 13 '22 21:09 marcor

We need to do it for both Pharo 10 and Pharo 11.

MarcusDenker avatar Sep 15 '22 10:09 MarcusDenker

This now has been forward ported to both Pharo10 and Pharo11, so we can close the issue

MarcusDenker avatar Oct 24 '22 14:10 MarcusDenker

Yay!

marcor avatar Oct 24 '22 18:10 marcor