codeql-cli-binaries
codeql-cli-binaries copied to clipboard
Add `string` predicates `startsWith`, `endsWith` and `contains`
What do you think about adding predicates to the CodeQL type string for determining whether a string has a given prefix or suffix or contains a substring? For example startsWith(string), endsWith(string) and contains(string).
Currently the workarounds are using indexOf(...) = 0 or matches(...%) (which seems to be faster than indexOf, see https://github.com/github/codeql/issues/6479#issuecomment-900081363). However, these predicates do not convey the intention as clearly, might not be that performant and for matches one must take care not to accidentally use %or _ where the intention was to match them literally.
In the github/codeql repository (at https://github.com/github/codeql/commit/39533317ffbeb6224d049de22fc182d8eeea4b61) there are at least:
- 196 cases where
startsWithcould be used (I searched for the regexmatches\("[^%_]*%"\)in CodeQL source files) - 72 cases where
endsWithcould be used (I searched for the regexmatches\("%[^%_]*"\)in CodeQL source files)
Thanks for the suggestion!
To give you some insight about what happens under the hood in the CodeQL engine, matches("...%") gets translated into an operation like startsWith("..."), and similarly matches("%...") becomes endsWith("..."). So the engine is already doing that optimisation for you. Using matches in this way is indeed preferable over indexOf, because indexOf has to produce all (character, index) pairs.
This isn't high on our list of priorities, but I agree that either exposing startsWith/endsWith primitives, or updating the docs for matches to clearly describe this case, is a good way to convey the intention.
A contains(string) predicate would be useful as well, have added it to the description.