codeql
codeql copied to clipboard
Add new state: Unicode compatibility normalization
Hey ,
I noticed that you are considering only two states:
- One regarding the path normalization if it is done or not before the safe check
- Second concerns the safe check.
as shown next:
https://github.com/github/codeql/blob/c1c0a705b9f14c0f577a9ae56a9d699e8b6e67d6/python/ql/lib/semmle/python/security/dataflow/PathInjectionQuery.qll#L20-L28
However, there is a third state that is a required one: Unicode normalized. If ever a Unicode normalization is performed with a compatibility algorithm (NFKC or NFKD), the query would miss some cases precisely those ones where the Unicode normalization is not performed before the path normalization and the safe check. I draw a little chart to depict my saying:
The previous chart shows that when you consider a potential Unicode compatibility normalization, it is a required step before path normalization and safe check. If ever placed between the first two steps or after the last one, that would yield a vulnerable case that got missed due to the fact that the Unicode normalization may reintroduce unexpected special characters such as .. and /.
Regards @Sim4n6