
Add new state: Unicode compatibility normalization

Open Sim4n6 opened this issue 5 months ago • 2 comments

Hey,

I noticed that the query considers only two states:

  1. Whether path normalization has been done before the safe check
  2. Whether the safe check has been done

as shown here:

https://github.com/github/codeql/blob/c1c0a705b9f14c0f577a9ae56a9d699e8b6e67d6/python/ql/lib/semmle/python/security/dataflow/PathInjectionQuery.qll#L20-L28

However, a third state is required: Unicode normalized. If a Unicode normalization is performed with a compatibility form (NFKC or NFKD), the query misses some cases, namely those where the Unicode normalization is not performed before the path normalization and the safe check. I drew a small chart to illustrate this:

Image

The chart above shows that a Unicode compatibility normalization, when present, must come before path normalization and the safe check. If it is placed between path normalization and the safe check, or after the safe check, the result is a vulnerable case that the query misses, because the Unicode normalization may reintroduce unexpected special characters such as `..` and `/`.
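To make the ordering problem concrete, here is a minimal Python sketch. It is not taken from the query or from any real application: `BASE_DIR`, `is_inside_base`, `resolve_unsafe`, and `resolve_safe` are hypothetical names, and `posixpath` is used so the behavior is the same on every platform. The payload is built from fullwidth compatibility characters (U+FF0E and U+FF0F), which NFKC maps to `.` and `/`.

```python
import posixpath
import unicodedata

# Hypothetical base directory, for illustration only.
BASE_DIR = "/srv/files"


def is_inside_base(path):
    """Safe check: the path must be BASE_DIR or live underneath it."""
    return path == BASE_DIR or path.startswith(BASE_DIR + "/")


def resolve_unsafe(user_input):
    """Path normalization and safe check first, Unicode normalization last."""
    candidate = posixpath.normpath(posixpath.join(BASE_DIR, user_input))
    if not is_inside_base(candidate):
        raise ValueError("path escapes the base directory")
    # NFKC runs after the check and can reintroduce '..' and '/'.
    return unicodedata.normalize("NFKC", candidate)


def resolve_safe(user_input):
    """Unicode normalization first, then path normalization and safe check."""
    normalized = unicodedata.normalize("NFKC", user_input)
    candidate = posixpath.normpath(posixpath.join(BASE_DIR, normalized))
    if not is_inside_base(candidate):
        raise ValueError("path escapes the base directory")
    return candidate


# Fullwidth full stop (U+FF0E) and fullwidth solidus (U+FF0F).
payload = "\uff0e\uff0e\uff0f\uff0e\uff0e\uff0fetc\uff0fpasswd"

print(resolve_unsafe(payload))  # passes the check, then contains '..' segments

try:
    resolve_safe(payload)
except ValueError as exc:
    print("rejected:", exc)
```

In `resolve_unsafe` the safe check sees a single harmless-looking path component, so it passes, and the later NFKC step turns that component into `../../etc/passwd`. In `resolve_safe` the same payload is rejected because the traversal sequences already exist when the path is normalized and checked.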

Regards @Sim4n6

Sim4n6 • Jun 09 '25 20:06