big-list-of-naughty-strings

IDN characters

Open rquadling opened this issue 1 year ago • 7 comments

Would these be a suitable thing to document here?

For example, where do you think this link will take you? http://accounts.googlе.com

Sure. It LOOKS like it'll take you somewhere obvious, but it's not that at all. Hopefully it doesn't ACTUALLY take you anywhere!

All I know ... I'm not clicking it!

rquadling • Aug 18 '22 17:08

I clicked the link and nothing happened. [image: tumblr_ml01gvUPCG1r18fjgo1_500-2292880854]

geeknik • Aug 18 '22 19:08

If you look at the actual link, it looks like this: [screenshot: Screenshot_20220818-175854_Firefox]

bbbco • Aug 18 '22 21:08

I think this is actually referring to Punycode

bbbco • Aug 18 '22 22:08

@rquadling what does it look like without markdown? I also see http://accounts.xn--googl-3we.com/.

ross-spencer • Aug 19 '22 06:08

IDN / Punycode ... are related ... one is the representation of the other: Punycode is the ASCII form of an IDN.
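To make the relationship concrete, here is a minimal Python sketch that converts between the two forms using the standard library's `idna` codec. It assumes the lookalike character in the link from the first post is U+0435 (CYRILLIC SMALL LETTER IE), which is consistent with the `xn--googl-3we` form quoted above; the helper names `to_ascii` and `to_unicode` are made up for illustration.

```python
# Assumption: the lookalike letter is U+0435 (CYRILLIC SMALL LETTER IE).
# It is written as an escape so it cannot be mistaken for a Latin "e".
SPOOFED = "accounts.googl\u0435.com"


def to_ascii(domain):
    """Return the ASCII (Punycode, "xn--") form of a domain name."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain):
    """Return the Unicode form of an ASCII (Punycode) domain name."""
    return domain.encode("ascii").decode("idna")


print(to_ascii(SPOOFED))                # the spoofed name gains an xn-- label
print(to_ascii("accounts.google.com"))  # a pure-ASCII name is unchanged
```

Note that Python's built-in `idna` codec follows the older IDNA 2003 rules, so treat this as a way to inspect strings, not as a statement of what any browser does.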

So, IDN allows Unicode characters in domain names. But some of these characters (and I think they are only the ones resembling English vowels ... maybe not though) look like other letters. So someone could set up a server at the fake URL that adds the naughty payload in whatever way it wants and sends you back a mocked-up page (man in the middle sort of thing).

The URL will be shown as Punycode. Well. It is in Chrome. Will it be in all browsers? Or in anything that displays the URL? It isn't shown that way in link text (but it is in mouseovers) ...

So that's why I feel IDNs should be considered for the list of naughty strings.
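For anyone who wants to see which letter is the impostor without relying on how a browser renders the URL, the following is a small Python sketch that prints the Unicode name of every character in a label. The function names are hypothetical, and the escape `\u0435` reflects the assumption that the lookalike is the Cyrillic letter IE.

```python
import unicodedata


def describe(label):
    """List each character as 'U+XXXX NAME' so lookalikes stand out."""
    return [
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for ch in label
    ]


def non_ascii_positions(label):
    """Return the indexes of characters outside the ASCII range."""
    return [i for i, ch in enumerate(label) if ord(ch) > 127]


# Assumption: the spoofed label ends in U+0435 rather than a Latin "e".
for line in describe("googl\u0435"):
    print(line)
print(non_ascii_positions("googl\u0435"))
```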

rquadling • Sep 06 '22 17:09

Maybe pick one string for each class of problems mentioned in Lord.io's Identity Beyond Usernames?

Byte-wise, the real "epic.com" and the false website "еріс.com" are completely different. But visually, they're indistinguishable from each other in the URL bar, allowing phishing problems to run amok. Unicode canonicalization and normalization can help with certain cases of this problem, but do nothing for our epic.com example.

This particular example isn't visible in Chrome, which instead shows https://xn--e1awd7f.com/, the "punycode" representation of the domain name. This is thanks to Chrome's complex, 13 step process for detecting if a domain name is likely to be a Unicode phish or not. "Well, it may be complex," you tell me, "but at least it solves the phishing problem!" Unfortunately it does not.

Specific instances of IDN homograph attacks have been reported to Chrome, and we continually update our IDN policy to prevent against these attacks.

The Unicode spec is apparently too large to solve this problem 100% perfectly, and so their "solution" is to pay $2000 to anybody who finds new edge cases. This also doesn't actually solve the problem for non-Latin alphabets — if for example, I own a Chinese domain name, it will never show punycode, and attackers can phish my site using duplicate encodings for those Chinese characters. Chrome just attempts to solve the much smaller problem of the numerous Unicode characters that visually look like the Latin alphabet.

That is:

  1. One bad Punycode domain name that'd rely purely on canonicalization and normalization to be caught.
  2. At least one bad Punycode domain name that is disallowed by Chrome's process, but likely to be allowed through by other tools. (Possibly one for each step in Chrome's 13-step process which is of the form "If X, then bail out". For example, "If two or more numbering systems (e.g. European digits + Bengali digits) are mixed, show punycode.")
  3. One bad Punycode domain name that is allowed through by everything, for testing protections that are geared toward making uncaught names more likely to be recognized as suspicious by the human in the chair.
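As a rough illustration of how candidate strings could be sorted into these classes, here is a hedged Python sketch. The candidate labels, the `classify` helper, and its return values are all hypothetical. The script check is a crude heuristic based on the first word of each character's Unicode name; it is not Chrome's process, and the "neither" bucket only means a name slips past these two simple checks (per the quoted article, Chrome does show Punycode for the all-Cyrillic epic example).

```python
import unicodedata


def scripts(label):
    """Crude script guess: the first word of each character's Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label}


def classify(label, target):
    """Sort a lookalike label by which simple check (if any) flags it."""
    if label != target and unicodedata.normalize("NFKC", label) == target:
        return "normalization"  # class 1: folds to the real name
    if len(scripts(label)) > 1:
        return "mixed-script"   # class 2: one example of a bail-out rule
    return "neither"            # needs a confusables check or a human


# Hypothetical candidates, written as escapes to avoid ambiguity.
FULLWIDTH = "\uff47\uff4f\uff4f\uff47\uff4c\uff45"  # fullwidth "google"
MIXED = "googl\u0435"                               # Latin + Cyrillic IE
WHOLE = "\u0435\u0440\u0456\u0441"                  # all-Cyrillic "epic"

for label, target in [(FULLWIDTH, "google"), (MIXED, "google"), (WHOLE, "epic")]:
    print(label.encode("punycode").decode("ascii"), classify(label, target))
```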

ssokolow • Sep 06 '22 18:09

I don't really know what to do here. But in terms of "naughty strings" ... I'm hoping the conversation is interesting enough to add something to the list of "naughty strings" in some way.

rquadling • Sep 07 '22 13:09