Validate tar-encoding
We should decode and re-encode all tarballs we serve. This would ensure that tweaks in a tarball that are ignored by `tar` cannot be used to create a file that is interpreted by the browser as an HTML file.
Since we serve from a different domain, I'm not particularly worried. But it would also ensure that decoding errors experienced on the client are consistent.
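The decode/re-encode idea can be sketched with Python's stdlib `tarfile`. This is a minimal illustration, not the actual implementation: it reads every entry out of an uploaded gzipped tarball and writes a fresh archive, so any header bytes that decoders ignore are dropped along the way.

```python
import io
import tarfile

def canonicalize(tarball_bytes: bytes) -> bytes:
    """Decode a gzipped tarball and re-encode it from its entries.

    Only the decoded entries survive, so header tweaks that `tar`
    ignores cannot leak through into the served archive.
    """
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(tarball_bytes), mode="r:gz") as src, \
         tarfile.open(fileobj=out, mode="w:gz", format=tarfile.USTAR_FORMAT) as dst:
        for member in src.getmembers():
            data = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, data)
    return out.getvalue()
```

Note the forced `format=` on the output archive: part of the point is that every served tarball uses one known encoding, regardless of which tar flavor the uploader produced.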
This is probably a non-trivial project; we would have to:
**Stage 1**
A) Write a tool that decodes all existing tarballs and stores them in a new bucket.
B) Serve tarballs from the new bucket.
C) Ensure that new uploads are written to both the old and new bucket.

**Stage 2**
A) Stop creating tarballs in the old bucket (which uses the original encoding).
B) Remove the tool that decodes existing tarballs and stores them in the new bucket.

**Stage 3**
A) Remove the old bucket (which uses the original encoding).
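Stage 1(C) amounts to a dual-write on upload. A minimal sketch, assuming a hypothetical `Bucket` interface rather than the real storage API:

```python
# `Bucket` is a hypothetical in-memory stand-in for the real storage API.
class Bucket:
    def __init__(self, name: str):
        self.name = name
        self.objects: dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        self.objects[key] = data

def upload_tarball(old: Bucket, new: Bucket, key: str, tarball: bytes) -> None:
    """Stage 1(C): write every new upload to both the old and the new bucket.

    In the real migration the new bucket would receive the re-encoded
    (canonical) bytes; that step is elided here.
    """
    old.write(key, tarball)
    new.write(key, tarball)
```

Keeping both buckets in sync during stage 1 is what makes stage 2's cutover reversible: the old bucket stays complete until we are sure the new encoding causes no problems.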
We should probably wait a few weeks between "stage 1" and "stage 2" to see if anyone reports bugs, since the hashes of all these tarballs will change. Maybe canonicalization will cause decoding issues on some platform.
We should also explore whether the hashes of the tarballs are stored somewhere in `PUB_CACHE`, i.e. whether this change will have any impact.
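Why the hash necessarily changes, even if the re-encoded tar entries came out byte-identical: the gzip layer alone is enough. A small demonstration (the file name and contents are made up):

```python
import gzip
import hashlib
import io
import tarfile

# Build one raw (uncompressed) tar archive.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as t:
    info = tarfile.TarInfo("pubspec.yaml")
    payload = b"name: example\n"
    info.size = len(payload)
    t.addfile(info, io.BytesIO(payload))
raw_tar = buf.getvalue()

# Compress the identical tar bytes with two different gzip header timestamps.
a = gzip.compress(raw_tar, mtime=0)
b = gzip.compress(raw_tar, mtime=1)

print(gzip.decompress(a) == gzip.decompress(b))                          # True: same contents
print(hashlib.sha256(a).digest() == hashlib.sha256(b).digest())          # False: different archive hash
```

So any client or cache that pins the hash of the compressed archive will see every re-encoded tarball as a different file.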
Related issue: #4440.
Now that we have content hashes in `pubspec.lock`, I don't think we can do this retroactively. But we should be able to do it for new uploads.
We decided to keep the integrity of the uploaded tarball.
We should probably consider whether there are more consistency checks we could do at upload time...
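For example, even without re-encoding, the archive can be decoded once at upload time and rejected if it contains entries a well-formed package should never have. A sketch, where the specific rules are illustrative rather than pub.dev's actual validation logic:

```python
import io
import tarfile

def check_archive(tarball_bytes: bytes) -> list[str]:
    """Return a list of problems found in a gzipped tarball.

    Hypothetical upload-time checks: unsafe paths, duplicate entries,
    and entry types other than regular files and directories.
    """
    problems = []
    seen = set()
    with tarfile.open(fileobj=io.BytesIO(tarball_bytes), mode="r:gz") as archive:
        for member in archive:
            name = member.name
            if name.startswith("/") or ".." in name.split("/"):
                problems.append(f"unsafe path: {name}")
            if name in seen:
                problems.append(f"duplicate entry: {name}")
            seen.add(name)
            if not (member.isfile() or member.isdir()):
                # symlinks, hardlinks, devices, FIFOs, ...
                problems.append(f"unexpected entry type: {name}")
    return problems
```

Checks like these keep the uploaded bytes intact (preserving the decision above) while still catching archives that different tar implementations would interpret differently.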