Unicode ligatures in variable names (suggestion)

Open IamMusavaRibica opened this issue 3 years ago • 3 comments

▶ Weird unicode ligatures behaviour

The Python interpreter treats a Unicode ligature in a variable name as the two or more characters it is made of. If we set the variable explicitly through globals(), no such conversion happens. This example uses the ﬁ ligature (U+FB01), see https://en.wikipedia.org/wiki/Ligature_(writing)#Latin_alphabet

Output (Python 3.9):

>>> ﬁg = 4  # ligature ﬁ (U+FB01) followed by g
>>> fig  # normal f i g
4
>>> globals()
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'fig': 4}
>>> globals()['ﬁg'] = 6  # ligature key
>>> ﬁg  # ligature
4
>>> globals()['ﬁg']  # ligature key
6

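A variable named ﬁg (written with the ﬁ ligature) ends up stored under both names: the assignment statement saves it under the "normal" letters fig, while the write through globals() keeps the ligature in the key. We can also use ligatures to refer to variables that were previously defined with "normal" letters: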

>>> fis = 9  # normal f i s
>>> ﬁs  # ligature
9

💡 Explanation:

  • Unfortunately, I cannot explain this. I stumbled on it on my own, though I am probably not the first.
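
A possible explanation, offered as a sketch rather than something confirmed in this thread: the Python language reference (following PEP 3131) says identifiers are converted to the NFKC normal form while parsing, and NFKC replaces compatibility characters such as the ﬁ ligature (U+FB01) with plain letters. String keys passed to globals() are ordinary strings and are not normalized. The snippet below uses the standard-library unicodedata module to show what that normalization does to the name; the variable names are only illustrative.

```python
import unicodedata

# "\ufb01" is U+FB01 LATIN SMALL LIGATURE FI, so this is the name "ﬁg".
ligature_name = "\ufb01g"

# NFKC is the normal form the language reference names for identifiers.
normalized = unicodedata.normalize("NFKC", ligature_name)

print(len(ligature_name), len(normalized))  # 2 3
print(normalized == "fig")                  # True
```

Under that assumption, the assignment in the example stores the plain name fig, while globals()['ﬁg'] creates a second, unrelated key.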

IamMusavaRibica • Jan 08 '22 22:01

@soft9000 I'm not sure I understand your comment correctly. @IamMusavaRibica's point is that the behaviour you describe does not hold if the variable name contains ligatures.

I can confirm the behaviour on Python 3.8. Maybe report this as a bug at https://bugs.python.org/ instead, to check whether it is intended behaviour?

sebix • Jan 09 '22 10:01

Tried it on 3.9.7. Worked as expected:

>>> foo = 6
>>> globals()['foo']
6
>>> globals()['foo'] = 7
>>> foo
7

He used the ﬁ ligature (see https://graphemica.com/fi for more details) instead of the normal letters f and i. They are not the same characters, so the result has to be different.
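
To make the "not the same characters" point concrete, here is a small sketch comparing the two spellings as plain strings. The dict named namespace is a hypothetical stand-in for globals(); nothing here touches the real module namespace.

```python
ligature = "\ufb01"  # single code point U+FB01, the ﬁ ligature
plain = "fi"         # two code points: f and i

print(ligature == plain)                # False
print([hex(ord(c)) for c in ligature])  # ['0xfb01']
print([hex(ord(c)) for c in plain])     # ['0x66', '0x69']

# As dict keys the two spellings never collide, because strings are
# compared code point by code point.
namespace = {}
namespace[ligature + "g"] = 6
namespace[plain + "g"] = 4
print(len(namespace))                   # 2
```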

AchBachir • Jan 09 '22 10:01

Exactly. My guess is that the interpreter treats it as two separate characters, which is weird because len() of a ligature is 1. If globals() is modified directly, the ligature is retained.

Edit: Basically, handling variable names converts ligatures into their respective letters, so even if ﬁa (ligature) is saved into globals() with one value and fia (normal letters) with a separate one, typing ﬁa (ligature) directly into IDLE returns the value of the non-ligature variable fia.
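
The same effect can be reproduced outside IDLE. This is a minimal sketch that assumes the cause is the compiler normalizing identifiers; namespace is a hypothetical dict used in place of the real globals(), and exec/eval stand in for typing at the prompt.

```python
namespace = {}

# The source text spells the identifier with the ligature (U+FB01 + "a").
exec("\ufb01a = 1", namespace)

# exec also adds "__builtins__", so filter dunder keys out.
names = [key for key in namespace if not key.startswith("__")]
print(names)                        # ['fia']

# Writing through the dict bypasses the compiler, so the ligature survives.
namespace["\ufb01a"] = 2

print(eval("\ufb01a", namespace))   # 1, the identifier resolves to plain "fia"
print(namespace["\ufb01a"])         # 2, the string key is looked up verbatim
```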

IamMusavaRibica • Jan 09 '22 12:01