git-slack-hook icon indicating copy to clipboard operation
git-slack-hook copied to clipboard

Escapes for various elements (urls, json, slack payload, directories, commit log contents)

Open init-js opened this issue 8 years ago • 1 comments

This pull request is concerned with escaping and encoding fixes. I've uncovered several problems, trying to set hooks for a cGit web viewer on a gitolite repository system.

  1. Parsing the logs (esp. multiple update changesets):
  • don't use sed for splitting fields. sed doesn't support non-greedy wildcards. this is problematic for sed forms such as 's/foo\(.*\)bar/\1/g'. any "bar" substring would get gobbed up in the group. Alternatives to make the grouping non-greedy, such as [^x]*, introduce ambiguities and negative lookaheads are difficult to maintain. This is an improved version of pull request #36 , #27 (which didn't fix the issue completely).
  • The patches here replace the shell pipeline with a bash tokenizer and a nonce as the field separator. The random separator makes false positives extremely unlikely, the code is easy to read, and the template straightforward to extend.
  • The fields title and value are first extracted, and then inserted into a json string. The previous shell pipeline conflated the two tasks. The resulting code is much easier to extend.
  1. Unsupported characters in URLs:
    • sed "s/%hash%/$url/g" produces unexpected results without escaping $url for special sed-interpreted character commands (like '&'). For these forms, the patch provides workarounds that avoid the need to escape URLs -- but produce the same output.
    • ampersand '&' characters are valid URL query string characters, eg. ?a=b&c=d . the Cgit git viewer uses these forms to display commits ( ?id=abcabc&id2=defdef ). These query strings wreak havoc on the sed substitutions. Users shouldn't have to escape URLs in their config. Fixes issue #24.
  2. Directory path problems:
    • git repo directory names with spaces produce invalid notification messages. the patch provides adequate shell quoting during repo name determination: instances of $(basename $foo) are changed to $(basename "$foo").
  3. JSON encoding problems:
    • Only a subset of special characters were escaped (linefeeds 0x0a, '', and '"') in json strings. Unicode characters should be escaped too. The patch uses a python one-liner to properly escape json (with the json package). It assumes the commit messages from the log are utf-8. Invalid sequences are replaced with unicode replacement char json sequences \ufffd.
    • A comment seemed to indicate that slack did not support newlines in the payload. This is erroneous. The json may contain literal linefeeds (e.g. between array members), but they must be escaped inside json string values.
    • the channel name, emoji, and other string fields of the payload are now also json-encoded.
  4. POST body encoding fixes.
    • The content passed to curl -d is assumed to be already encoded. the patch uses --data-urlencode instead to ask curl to perform url encoding. This bit of the patch resolves issue #38 and #28.
  5. Things not covered.
    • if the commit messages contain <http://example.com|b> sequences, those aren't escaped for slack.

init-js avatar Oct 05 '16 23:10 init-js

Any chance this could be merged? I came across the '&' in changeset URL issue which I fixed very crudely (by escaping it in the sed expression). It looks like this fixes a lot of issues!

piit79 avatar Feb 04 '17 12:02 piit79