codeql
codeql copied to clipboard
Add query for tainted `wordexp` calls.
This is my first query for C/C++ I think, so there is likely some improvement possible.
This is inspired by https://github.com/syoyo/tinygltf/issues/368.
Sadly, my query can not find that specific instance.
The taint seems to get lost in LoadFromString around lines 5648.
LoadFromString -> ParseBuffer -> LoadExternalFile -> FindFile -> fs->ExpandFilePath (ExpandFilePathFunction) -> ExpandFilePath (line 2611) -> wordexp (line 2640)
~~(The used database is from https://lgtm.com/projects/g/syoyo/tinygltf/ci/#ql)~~ EDIT: The used database can be found here instead: https://github.com/intrigus-lgtm/file-dump/blob/master/syoyo_tinygltf_cpp-srcVersion_81f7dbe53a112d05217a79bb1c986f9ada3b6631-dist_codeql-bundle-linux64-20220811-2843018186.zip?raw=true
Hi @intrigus-lgtm. syoyo/tinygltf contains a call to wordexp in tiny_gltf.h line 2661. However the LGTM build does not include this line (presumably because of the #if preprocessor condition a few lines up combined with the specific compliation flags / environment that was used in the build) - so the call is not in the database and your query can't be expected to find it as a result on this particular snapshot. I wouldn't worry about it, but if you're really determined you could attempt to build a snaphot of the project with different settings, or write test cases that are closer to what you see there (e.g. exercising taint flow through a std::string).
I'm also suspicious you're right about taint not reaching the function, but I'm not sure where it stops or why (or whether its correct to). Again, the query is not at fault.
Hi @intrigus-lgtm.
syoyo/tinygltfcontains a call towordexpintiny_gltf.hline 2661. However the LGTM build does not include this line (presumably because of the#ifpreprocessor condition a few lines up combined with the specific compliation flags / environment that was used in the build) - so the call is not in the database and your query can't be expected to find it as a result on this particular snapshot. I wouldn't worry about it, but if you're really determined you could attempt to build a snaphot of the project with different settings, or write test cases that are closer to what you see there (e.g. exercising taint flow through astd::string).I'm also suspicious you're right about taint not reaching the function, but I'm not sure where it stops or why (or whether its correct to). Again, the query is not at fault.
I guess the problem is that lgtm.com received the new commits that patched the problem (by commenting out the vulnerable code), so the db does not contain any calls to wordexp anymore.
You can try this slightly older database instead that works for me:
https://github.com/intrigus-lgtm/file-dump/blob/master/syoyo_tinygltf_cpp-srcVersion_81f7dbe53a112d05217a79bb1c986f9ada3b6631-dist_codeql-bundle-linux64-20220811-2843018186.zip?raw=true
I could successfully find the call to wordexp in that database yesterday, but taint is getting lost as described in https://github.com/github/codeql/pull/10077#issuecomment-1217220464
Thanks for that update ... I think the taint issue might be that the flow into ParseBuffer is in a lambda expression, its likely such flows are less well tested that other more common situations. For some reason (perhaps to do with capturing the base_dir variable, or uncertainty that the lambda is called???) taint flow isn't working there. I'll file an internal issue so we get this fixed at some point.
I've started the checks, and run a moderately large LGTM run of the query here: https://lgtm.com/query/1704064894746891004/ (results look good at a glance)
Just to be sure, this does not need a review of the security lab?
Just to be sure, this does not need a review of the security lab?
Yes, it will. It will be merged once Security Lab has reviewed it and are happy with it :)
One FP pattern I have noticed is when the taint from char* is transferred to int through strlen and causes false positives. I think for this query a string to integer sanitizer is need.
I may add more comments in the thread as I review the results.
I think for this query a string to integer sanitizer is need.
Its fairly common to simply block flow through all integer typed expressions, in order to stop this sort of thing. Something like:
override predicate isSanitizer(DataFlow::Node node) {
node.asExpr().getUnspecifiedType() instanceof IntegralType
}
~~The query doesn't find the vulnerability in tinygltf built from changset 0fa56e239c77cc864dc248842e8887d985cf8e3f. Please fix it.~~
I see why it doesn't find. Please, disregard.
I switched the sink to any() to see all reachable nodes (I think it was Pavel who has shown the technique in one of CodeQL videos) and it looks to me the taint stops at bool fileread = fs.ReadWholeFile(&data, &fileerr, filename, fs.user_data); which is tinygltf specific function. I think CodeQL doesn't understand the taint flows to the first std::vector<unsigned char> *out parameter. I think it is ok to not include tinygltf specific steps in the general query.
@intrigus-lgtm Could you please add the sanitizer?
@JarLob done, sorry for the delay.
Oh, and I'll run the checks now...
@geoffw0 anything else missing?