lancer
The Unicode normalization step of the Python interpreter can be abused.
This is basically the suggestion in this Reddit comment.
From this article:
Python always applies NFKC normalization to characters. Therefore, two distinct characters may actually produce the same variable name. For example:
>>> ª = 1  # FEMININE ORDINAL INDICATOR
>>> a  # LATIN SMALL LETTER A (i.e., ASCII lowercase 'a')
1
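The quoted behaviour can be checked outside the REPL as well. The following is a minimal sketch using only the standard library; the `namespace` dict is just an illustrative name, not something from the article.

```python
import unicodedata

# U+00AA FEMININE ORDINAL INDICATOR folds to ASCII 'a' under NFKC.
folded = unicodedata.normalize("NFKC", "\u00aa")

# Identifiers are normalized when the source is parsed, so assigning to
# the non-ASCII name stores the value under the plain ASCII name.
namespace = {}
exec("\u00aa = 1", namespace)
value_of_a = namespace["a"]
```

After running this, `folded` is the ASCII letter `a` and the assignment made through `ª` is readable as `a`.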
I've generated a mapping of these characters, taken from this URL.
The mapping can be found here. Beware that some characters may not be supported in Python, because I haven't tested every one of them.
I suggest adding a flag to enable this behaviour.
I would have done it myself and opened a PR, but I am too busy at the moment.
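For anyone who wants to rebuild or verify such a mapping locally, here is a rough sketch. It is not how the linked mapping was produced; it assumes that brute-forcing every code point with `unicodedata` is acceptable, and `build_nfkc_map` is a hypothetical helper name. The `isidentifier()` check addresses the caveat above by keeping only characters Python accepts inside a name.

```python
import sys
import unicodedata
from collections import defaultdict


def build_nfkc_map():
    """Map each ASCII letter or digit to the non-ASCII characters that fold to it."""
    mapping = defaultdict(list)
    for code_point in range(0x80, sys.maxunicode + 1):
        ch = chr(code_point)
        folded = unicodedata.normalize("NFKC", ch)
        # Keep single-character ASCII alphanumeric results only.
        if len(folded) != 1 or not folded.isascii() or not folded.isalnum():
            continue
        # Prefixing "_" tests whether ch is valid as a non-leading
        # identifier character.
        if ("_" + ch).isidentifier():
            mapping[folded].append(ch)
    return dict(mapping)


nfkc_map = build_nfkc_map()
```

The exact contents depend on the Unicode version bundled with the interpreter, so results may differ slightly between Python releases.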
That sounds very promising! I like it. I am not sure I will find the time to implement it, but I am open to PRs.
I actually implemented this in uglier, which was pretty much a copy of this project. In addition to abusing the Unicode normalization, it also uses Cyrillic characters (which look a lot like Latin characters) to make all variables look like they have the same identifier.
This:
def add_values(n1, n2):
    return n1 + n2
def add_10_to_string(n):
    return str(add_values(int(n), 10))
num = add_10_to_string("10")
print(num)
turns to:
def ADDVALUES(хxxх, хxхх):
    return хxxх + хxхх
def ADDTOSTRING(НННН):
    return st𝓇(𝕬𝔇𝔇𝔙𝕬𝕷𝓤𝔈𝔖(𝒾𝕟𝑡(НННН), 10))
НННH = 𝕬𝕯𝕯𝕿𝕺𝔖𝕿𝕽𝕴𝕹𝔊('10')
𝓅𝓇𝒾𝕟𝑡(НННH)
(Notice that it also abuses the normalization for built-ins, using something like 𝒾𝕟𝑡 for the built-in int function.)
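The built-in trick can be sketched in a few lines. This is not uglier's actual implementation; `disguise` is a hypothetical helper, and it assumes the Mathematical Bold letters (starting at U+1D400 for uppercase and U+1D41A for lowercase) as the replacement alphabet because those two ranges are contiguous.

```python
import unicodedata


def disguise(name):
    """Swap ASCII letters for Mathematical Bold letters that NFKC folds back."""
    out = []
    for ch in name:
        if "a" <= ch <= "z":
            out.append(chr(0x1D41A + ord(ch) - ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr(0x1D400 + ord(ch) - ord("A")))
        else:
            out.append(ch)  # digits, underscores, etc. stay as they are
    return "".join(out)


fancy_int = disguise("int")
# The parser normalizes the identifier, so this still calls the built-in int.
result = eval(f"{fancy_int}('10') + 10")
```

Here `fancy_int` differs from the string `"int"` character by character, yet the evaluated expression behaves exactly as if `int` had been written in ASCII.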