pharo icon indicating copy to clipboard operation
pharo copied to clipboard

Symbol>>#intern: of WideString: do we need the #isOctetString check?

Open MarcusDenker opened this issue 2 years ago • 4 comments

Describe the problem Symbol>>#intern: for WideString check if the WideString could be represented as a ByteString (using #isOctetString).

This is not nice as it means iterating any WideString for the check, slowing down #asSymbol.

Do we need to do that check?

I copy over the discussion from https://github.com/pharo-project/pharo/pull/11291

MarcusDenker avatar May 30 '22 10:05 MarcusDenker

@svenvc:

I looked at the Symbol interning code before, I found it confusing and a bit of a mess, it looks to be more complex than it needs to be, probably again due to historical reasons.

For this it is very good that you try to improve it.

Regarding the isOctetString I think the reason might be to make sure that the following is true:

'ABC' asSymbol == 'ABC' asWideString asSymbol.

Since Symbols are tested with #==, two different subclass instances, ByteSymbol and WideSymbol can never be identical, while the following is true due to the way #= is implemented

'ABC' = 'ABC' asWideString.

The question is how often will such a strange case occur (i.e. a Latin1 string that becomes a WideSymbol) and is it necessary to pay the (heavy) price for this at symbol interning time ?

MarcusDenker avatar May 30 '22 10:05 MarcusDenker

@MarcusDenker

But:

'ABC'  = 'ABC' asWideString. "true"

Thus, the search for an existing symbol (using #like on the WeakSet) will find the symbol regardless if it looks for a wide string or not.

if I change the (current) #intern: to use

	aClass := aStringOrSymbol isWideString ifTrue:[WideSymbol] ifFalse:[ByteSymbol].

I get:

'ABC' asSymbol == 'ABC' asWideString asSymbol "true"
 'ABC' asWideString asSymbol == 'ABC' asSymbol  "true"

But, now asking for the symbol after having asked for the equal wide symbol returns the wide symbol (eval in one doit due to to GC):

'ABCCC' asWideString asSymbol class. 
'ABCCC'  asSymbol class "WideSymbol"

But to me this is not a problem, as the only thing that matters is identiy.

MarcusDenker avatar May 30 '22 10:05 MarcusDenker

@svenvc

I think we can try the change. I would add a specific test case and maybe a comment.

MarcusDenker avatar May 30 '22 10:05 MarcusDenker

After merging https://github.com/pharo-project/pharo/pull/11291, the method where the check is done is WideString >> createSymbol

MarcusDenker avatar May 30 '22 10:05 MarcusDenker