Files passing through Cloudflare can no longer be read over http
Some of the CI tests started failing. After looking into it, I found that something Cloudflare is doing changes the content length reported in the response headers, so it no longer matches the actual size of the file.
Here's an example of a file that fails.
import uproot
uproot.open("http://scikit-hep.org/uproot3/examples/HZZ.root")
And here is how I can see the mismatch:
import requests
r = requests.get("http://scikit-hep.org/uproot3/examples/HZZ.root")
print(f"reported length={r.headers['content-length']}, actual length={len(r.content)}")
# reported length=210860, actual length=217945
Using a direct link to that file works fine:
import requests
r = requests.get("https://github.com/scikit-hep/uproot3/raw/refs/heads/gh-pages/examples/HZZ.root")
print(f"reported length={r.headers['content-length']}, actual length={len(r.content)}")
# reported length=217945, actual length=217945
This is not a bug in Uproot; I'm posting it here to document and discuss it. I'll keep looking into it.
@chrisburr - Could you, please, have a look? Thanks!
Okay, so it seems like this is because Cloudflare is compressing the file before sending it. The reported content-length is the compressed size, but Uproot (really, fsspec/aiohttp) treats it as the uncompressed size (I think). I can disable compression, but then the response doesn't include a content-length at all and Uproot still crashes.
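To make the compressed-versus-uncompressed distinction concrete, here is a minimal offline sketch. The function name `classify_content_length` and the simulated headers are hypothetical, made up for illustration; this is not how Uproot, fsspec, or aiohttp actually check lengths. It relies only on the fact that, when a `Content-Encoding` is applied, `Content-Length` counts the encoded bytes on the wire, while a client such as `requests` hands back the decoded body.

```python
import gzip


def classify_content_length(headers, decoded_body):
    """Compare a reported Content-Length with the decoded body size.

    Hypothetical helper for illustration only.
    Returns one of: "missing", "ok", "encoded", "mismatch".
    """
    # HTTP header names are case-insensitive, so normalise them first.
    normalized = {key.lower(): value for key, value in headers.items()}
    reported = normalized.get("content-length")
    encoding = normalized.get("content-encoding", "identity").lower()

    if reported is None:
        # No length reported at all (the case seen with compression disabled).
        return "missing"
    if int(reported) == len(decoded_body):
        return "ok"
    if encoding != "identity":
        # The length refers to the compressed bytes, not the decoded file.
        return "encoded"
    return "mismatch"


# Simulated response: a compressible payload sent with gzip encoding.
payload = b"example ROOT-like bytes " * 1000
wire_bytes = gzip.compress(payload)
simulated_headers = {
    "Content-Length": str(len(wire_bytes)),
    "Content-Encoding": "gzip",
}

print(classify_content_length(simulated_headers, payload))
```

With the simulated gzip response, the reported length is the size of `wire_bytes`, which is smaller than `payload`, so the helper reports `"encoded"` rather than a plain mismatch.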
This is kind of a horrible solution, but I ended up adding a redirect rule where if someone requests
http*://scikit-hep.org/uproot3/*.root
it is redirected to
https://github.com/scikit-hep/uproot3/raw/refs/heads/gh-pages/${2}.root
And that seems to have worked. We'll discuss it in the meeting later today, but I think this solution is good enough for now.
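For reference, the effect of the redirect rule above can be approximated in Python. This is only a sketch: it assumes each `*` behaves like a regex wildcard and that `${2}` refers to the second wildcard, and the names `RULE`, `TARGET`, and `redirect_target` are hypothetical. The real matching is done by Cloudflare, not by this code.

```python
import re

# Approximation of the wildcard pattern http*://scikit-hep.org/uproot3/*.root
# Group 1 is the first wildcard (e.g. "" or "s"), group 2 is the file path.
RULE = re.compile(r"^http(.*?)://scikit-hep\.org/uproot3/(.*)\.root$")

# Target from the rule, with ${2} replaced by a format placeholder.
TARGET = "https://github.com/scikit-hep/uproot3/raw/refs/heads/gh-pages/{}.root"


def redirect_target(url):
    """Return the redirect target for a matching URL, or None otherwise."""
    match = RULE.match(url)
    if match is None:
        return None
    return TARGET.format(match.group(2))


print(redirect_target("http://scikit-hep.org/uproot3/examples/HZZ.root"))
# https://github.com/scikit-hep/uproot3/raw/refs/heads/gh-pages/examples/HZZ.root
```

Under these assumptions, the failing URL from the first post maps to the direct GitHub link that was shown to report a consistent length.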
Thanks, @ariostas! I agree. Let's do this for now and reevaluate later. Thanks!