codeql icon indicating copy to clipboard operation
codeql copied to clipboard

Add query for tainted `wordexp` calls.

Open intrigus-lgtm opened this issue 3 years ago • 14 comments
trafficstars

This is my first query for C/C++ I think, so there is likely some improvement possible.

intrigus-lgtm avatar Aug 16 '22 22:08 intrigus-lgtm

This is inspired by https://github.com/syoyo/tinygltf/issues/368. Sadly, my query can not find that specific instance. The taint seems to get lost in LoadFromString around lines 5648. LoadFromString -> ParseBuffer -> LoadExternalFile -> FindFile -> fs->ExpandFilePath (ExpandFilePathFunction) -> ExpandFilePath (line 2611) -> wordexp (line 2640)

~~(The used database is from https://lgtm.com/projects/g/syoyo/tinygltf/ci/#ql)~~ EDIT: The used database can be found here instead: https://github.com/intrigus-lgtm/file-dump/blob/master/syoyo_tinygltf_cpp-srcVersion_81f7dbe53a112d05217a79bb1c986f9ada3b6631-dist_codeql-bundle-linux64-20220811-2843018186.zip?raw=true

intrigus-lgtm avatar Aug 16 '22 22:08 intrigus-lgtm

Hi @intrigus-lgtm. syoyo/tinygltf contains a call to wordexp in tiny_gltf.h line 2661. However the LGTM build does not include this line (presumably because of the #if preprocessor condition a few lines up combined with the specific compliation flags / environment that was used in the build) - so the call is not in the database and your query can't be expected to find it as a result on this particular snapshot. I wouldn't worry about it, but if you're really determined you could attempt to build a snaphot of the project with different settings, or write test cases that are closer to what you see there (e.g. exercising taint flow through a std::string).

I'm also suspicious you're right about taint not reaching the function, but I'm not sure where it stops or why (or whether its correct to). Again, the query is not at fault.

geoffw0 avatar Aug 17 '22 13:08 geoffw0

Hi @intrigus-lgtm. syoyo/tinygltf contains a call to wordexp in tiny_gltf.h line 2661. However the LGTM build does not include this line (presumably because of the #if preprocessor condition a few lines up combined with the specific compliation flags / environment that was used in the build) - so the call is not in the database and your query can't be expected to find it as a result on this particular snapshot. I wouldn't worry about it, but if you're really determined you could attempt to build a snaphot of the project with different settings, or write test cases that are closer to what you see there (e.g. exercising taint flow through a std::string).

I'm also suspicious you're right about taint not reaching the function, but I'm not sure where it stops or why (or whether its correct to). Again, the query is not at fault.

I guess the problem is that lgtm.com received the new commits that patched the problem (by commenting out the vulnerable code), so the db does not contain any calls to wordexp anymore. You can try this slightly older database instead that works for me: https://github.com/intrigus-lgtm/file-dump/blob/master/syoyo_tinygltf_cpp-srcVersion_81f7dbe53a112d05217a79bb1c986f9ada3b6631-dist_codeql-bundle-linux64-20220811-2843018186.zip?raw=true

I could successfully find the call to wordexp in that database yesterday, but taint is getting lost as described in https://github.com/github/codeql/pull/10077#issuecomment-1217220464

intrigus-lgtm avatar Aug 17 '22 13:08 intrigus-lgtm

Thanks for that update ... I think the taint issue might be that the flow into ParseBuffer is in a lambda expression, its likely such flows are less well tested that other more common situations. For some reason (perhaps to do with capturing the base_dir variable, or uncertainty that the lambda is called???) taint flow isn't working there. I'll file an internal issue so we get this fixed at some point.

geoffw0 avatar Aug 17 '22 15:08 geoffw0

I've started the checks, and run a moderately large LGTM run of the query here: https://lgtm.com/query/1704064894746891004/ (results look good at a glance)

geoffw0 avatar Aug 22 '22 20:08 geoffw0

Just to be sure, this does not need a review of the security lab?

intrigus-lgtm avatar Aug 23 '22 10:08 intrigus-lgtm

Just to be sure, this does not need a review of the security lab?

Yes, it will. It will be merged once Security Lab has reviewed it and are happy with it :)

MathiasVP avatar Aug 23 '22 10:08 MathiasVP

One FP pattern I have noticed is when the taint from char* is transferred to int through strlen and causes false positives. I think for this query a string to integer sanitizer is need.

I may add more comments in the thread as I review the results.

JarLob avatar Aug 24 '22 11:08 JarLob

I think for this query a string to integer sanitizer is need.

Its fairly common to simply block flow through all integer typed expressions, in order to stop this sort of thing. Something like:

override predicate isSanitizer(DataFlow::Node node) {
  node.asExpr().getUnspecifiedType() instanceof IntegralType
}

geoffw0 avatar Aug 24 '22 13:08 geoffw0

~~The query doesn't find the vulnerability in tinygltf built from changset 0fa56e239c77cc864dc248842e8887d985cf8e3f. Please fix it.~~

JarLob avatar Aug 24 '22 14:08 JarLob

I see why it doesn't find. Please, disregard.

JarLob avatar Aug 24 '22 14:08 JarLob

I switched the sink to any() to see all reachable nodes (I think it was Pavel who has shown the technique in one of CodeQL videos) and it looks to me the taint stops at bool fileread = fs.ReadWholeFile(&data, &fileerr, filename, fs.user_data); which is tinygltf specific function. I think CodeQL doesn't understand the taint flows to the first std::vector<unsigned char> *out parameter. I think it is ok to not include tinygltf specific steps in the general query.

JarLob avatar Aug 24 '22 15:08 JarLob

@intrigus-lgtm Could you please add the sanitizer?

JarLob avatar Sep 12 '22 17:09 JarLob

@JarLob done, sorry for the delay.

intrigus-lgtm avatar Sep 12 '22 19:09 intrigus-lgtm

Oh, and I'll run the checks now...

geoffw0 avatar Oct 07 '22 15:10 geoffw0

@geoffw0 anything else missing?

intrigus-lgtm avatar Oct 17 '22 09:10 intrigus-lgtm