download fails when Content-Length is missing from headers
The following assertion fails:
ok download ~/.ssh/authorized_keys "https://github.com/bcomnes.keys" --size
because the HTTP headers from this GitHub request do not include a Content-Length:
bret-mbr:.ssh bret$ bork status ~/.dotfiles/install/ssh.sh
conflict (upgradable): download authorized_keys https://github.com/bcomnes.keys
expected size: bytes
received size: bytes
bret-mbr:.ssh bret$ curl -sI https://github.com/bcomnes.keys
HTTP/1.1 200 OK
Server: GitHub.com
Date: Mon, 09 May 2016 00:13:04 GMT
Content-Type: text/plain; charset=utf-8
Status: 200 OK
Cache-Control: no-cache
Vary: X-PJAX
X-UA-Compatible: IE=Edge,chrome=1
Set-Cookie: logged_in=no; domain=.github.com; path=/; expires=Fri, 09 May 2036 00:13:04 -0000; secure; HttpOnly
X-Request-Id: d529280b38e3de27e17235e81f5cf091
X-Runtime: 0.008604
Content-Security-Policy: default-src 'none'; base-uri 'self'; block-all-mixed-content; child-src render.githubusercontent.com; connect-src 'self' uploads.github.com status.github.com api.github.com www.google-analytics.com github-cloud.s3.amazonaws.com api.braintreegateway.com client-analytics.braintreegateway.com wss://live.github.com; font-src assets-cdn.github.com; form-action 'self' github.com gist.github.com; frame-ancestors 'none'; frame-src render.githubusercontent.com; img-src 'self' data: assets-cdn.github.com identicons.github.com www.google-analytics.com collector.githubapp.com *.gravatar.com *.wp.com checkout.paypal.com *.githubusercontent.com; media-src 'none'; object-src assets-cdn.github.com; plugin-types application/x-shockwave-flash; script-src assets-cdn.github.com; style-src 'unsafe-inline' assets-cdn.github.com
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
Public-Key-Pins: max-age=5184000; pin-sha256="WoiWRyIOVNa9ihaBciRSC7XHjliYS9VwUGOIud4PB18="; pin-sha256="RRM1dGqnDFsCJXBTHky16vi1obOlCgFFn/yOhI/y+ho="; pin-sha256="k2v657xBsOVe1PQRwOsHsw3bsGT2VzIqz5K+59sNQws="; pin-sha256="K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q="; pin-sha256="IQBnNBEiFuhj+8x6X8XLgh01V9Ic5/V3IRQLNFFc7v4="; pin-sha256="iie1VXtL7HzAMF+/PVPR9xzT80kQxdZeJ+zduCB3uj0="; pin-sha256="LvRiGEjRqfzurezaWuj8Wie2gyHMrW5Q06LspMnox7A="; includeSubDomains
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
Vary: Accept-Encoding
X-Served-By: 1c0ce1a213af16e49d5419559ef44f50
X-GitHub-Request-Id: 499DD518:7DA5:9D56CB7:572FD60F
I'm not really sure what the right way to handle this is. Obviously, this strategy shouldn't be used in this kind of scenario. Maybe some kind of alternative hashing comparison could be used instead? I found this while dealing with the https://github.com/mattly/bork/issues/72 bug.
Another great catch. Thanks for reporting the issues, @bcomnes 👍
I feel like hashing won't really work unless the file is actually downloaded completely (which we don't really want in this case). Hashing just the filename, on the other hand, is not enough for an in-depth comparison.
I suggest we skip the "comparison" feature in this case and just re-download the file when there's no Content-Length set. What do you think?
/cc @mattly
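Something along these lines would do it (a sketch only; the header probe and the skip_check flag are illustrative, not existing bork internals):

# Probe the response headers once; -L follows redirects, and tr strips
# the trailing CRs curl leaves on header lines.
content_length=$(curl -sIL "$sourceurl" | tr -d '\r' | grep -i '^content-length:' | awk '{print $2}')
if [ -z "$content_length" ]; then
  # No Content-Length: the size comparison is impossible, so skip it
  # and re-download unconditionally.
  skip_check=1
fi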
The usefulness of the hashing comes from knowing whether the remote file is different from what is present on the system, when that kind of information is important. For example, I really want to see whether ~/.ssh/authorized_keys gets updated or not, for various reasons.
Yes, there is a cost to downloading and hashing, comparing, downloading again to replace, then downloading and hashing again to double-check, but for small files this is negligible in exchange for the 'did the file change or not' data point.
It also shares a timing vulnerability similar to the size check's: the file can change on the server mid-process, or incorrect headers can be sent across all three requests. One improvement would be to fetch the file once, cache the result, perform the desired checks against the cache, then install and recheck from that single request. For small files, you can assign the contents to a variable, which makes this pretty achievable; see the sketch below.
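A rough sketch of that single-fetch flow (using a temp file rather than a shell variable, since bash variables don't cope well with arbitrary bytes; names here are illustrative):

# Fetch the remote file exactly once and keep the result.
tmpfile=$(mktemp)
curl -sL "$sourceurl" -o "$tmpfile"

# Every check runs against the cached copy, so the remote can't
# change between the comparison and the install.
remotesha=$(shasum -a 256 "$tmpfile" | awk '{print $1}')
localsha=$(shasum -a 256 "$targetfile" | awk '{print $1}')

if [ "$remotesha" != "$localsha" ]; then
  mv "$tmpfile" "$targetfile"   # install from the already-verified copy
else
  rm -f "$tmpfile"
fi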
Here is an almost working prototype:
if [ -n "$hash" ]; then
sourcesha=$(bake shasum -a 256 "\"$targetfile\"")
remotesha=$(curl -sL $sourceurl | shasum -a 256 )
if [ "$sourcesha" != "$remotesha" ]; then
echo "expected sha256: $remotesha"
echo "received sha256: $sourcesha"
return $STATUS_CONFLICT_UPGRADE
fi
fi
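To sanity-check the same comparison by hand outside of bork:

shasum -a 256 ~/.ssh/authorized_keys | awk '{print $1}'
curl -sL https://github.com/bcomnes.keys | shasum -a 256 | awk '{print $1}'

The awk step matters because shasum appends the file name (or - for stdin) after the digest, so the raw outputs would never match.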
Third, it's not clear to me what happens if the server 404s or 500s; it looks like download upgrade will still offer to upgrade rather than fail outright, since there are no status code checks? This is a bigger issue for compiled scripts that perform the upgrade automatically.
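A status check could be as simple as this (sketch only; $STATUS_FAILED stands in for whatever failure status bork actually uses):

# Ask curl for the final HTTP status code without downloading the body.
status=$(curl -sL -o /dev/null -w '%{http_code}' "$sourceurl")
case "$status" in
  2*) : ;;  # 2xx: safe to proceed with the download
  *)  echo "unexpected HTTP status: $status"
      return $STATUS_FAILED  # illustrative; substitute bork's real status
      ;;
esac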
I don't remember what use case my initial stab at download was intending to solve, but I can 100% guarantee you it had a Content-Length header :p.
I think you both have good thoughts on the correct behaviors, and hash is definitely a better thing to check against than size. I'm going to deprecate the size option and remove it for 1.0, and add in a hash option. Thoughts on letting people specify the algorithm?
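Something like this is what I have in mind (hypothetical syntax, nothing implemented yet):

ok download ~/.ssh/authorized_keys "https://github.com/bcomnes.keys" --hash
# or, with a user-specified algorithm:
ok download ~/.ssh/authorized_keys "https://github.com/bcomnes.keys" --hash sha256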
Unless it's trivial to add, I would leave it to those who want to customize the algorithm to explore/implement it. I'm totally fine with a sane default.