Files passing through Cloudflare can no longer be read over http

Open ariostas opened this issue 6 months ago • 4 comments

Some of the CI tests started failing. After looking into it, I found that something Cloudflare is doing changes the content length reported in the headers, so it no longer matches the actual size of the file, and that breaks the read.

Here's an example of a file that fails.

import uproot
uproot.open("http://scikit-hep.org/uproot3/examples/HZZ.root")

And here is how I can see that there is a mismatch:

import requests
r = requests.get("http://scikit-hep.org/uproot3/examples/HZZ.root")
print(f"reported length={r.headers['content-length']}, actual length={len(r.content)}")
# reported length=210860, actual length=217945

Using a direct link to that file works fine:

import requests
r = requests.get("https://github.com/scikit-hep/uproot3/raw/refs/heads/gh-pages/examples/HZZ.root")
print(f"reported length={r.headers['content-length']}, actual length={len(r.content)}")
# reported length=217945, actual length=217945

This is not a bug in Uproot, but I'm posting it here to document and discuss it. I'll keep looking into it.

ariostas avatar Jun 19 '25 08:06 ariostas

@chrisburr - Could you, please, have a look? Thanks!

ianna avatar Jun 19 '25 11:06 ianna

Okay, so it seems this happens because Cloudflare compresses the file before sending it. The reported content-length is the compressed size, but Uproot (really, fsspec/aiohttp) treats it as the uncompressed size (I think). I can disable compression, but then the response has no content-length at all and Uproot still crashes.
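
To illustrate the suspected mechanism without touching the network, here is a minimal sketch using only the standard library. The helper `simulate_compressed_response` and the stand-in payload are hypothetical; this only mimics a proxy that gzips a body on the fly and is not what Cloudflare or fsspec/aiohttp actually run.

```python
import gzip


def simulate_compressed_response(payload: bytes) -> tuple[dict, bytes]:
    """Hypothetical helper: mimic a proxy that gzips the body before sending it."""
    wire_body = gzip.compress(payload)
    headers = {
        "Content-Encoding": "gzip",
        # Content-Length counts the bytes on the wire, i.e. the compressed body.
        "Content-Length": str(len(wire_body)),
    }
    return headers, wire_body


# Stand-in for the file contents (an assumption, not real ROOT data).
payload = b"stand-in for ROOT file bytes " * 1000

headers, wire_body = simulate_compressed_response(payload)
decoded = gzip.decompress(wire_body)  # what a client sees after decoding

reported = int(headers["Content-Length"])
print(f"reported length={reported}, actual length={len(decoded)}")
```

A client that takes the reported length as the size of the decoded file will compute byte ranges against the wrong total, which would be consistent with the mismatch shown earlier.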

ariostas avatar Jun 19 '25 12:06 ariostas

This is kind of a horrible solution, but I ended up adding a redirect rule where if someone requests

http*://scikit-hep.org/uproot3/*.root

it is redirected to

https://github.com/scikit-hep/uproot3/raw/refs/heads/gh-pages/${2}.root

And that seems to have worked. We'll discuss in the meeting later today, but I think this solution is good enough for now.
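
For anyone who wants to check what the rule does to a given URL, here is a rough Python sketch of the same mapping. The regular expression is my assumed equivalent of the wildcard pattern above (the second wildcard becomes capture group 2, matching `${2}`), and `rewrite` is a hypothetical helper, not part of Cloudflare's configuration.

```python
import re

# Assumed regex equivalent of http*://scikit-hep.org/uproot3/*.root
PATTERN = re.compile(r"http([^:]*)://scikit-hep\.org/uproot3/(.*)\.root")
TARGET = "https://github.com/scikit-hep/uproot3/raw/refs/heads/gh-pages/{}.root"


def rewrite(url: str) -> str:
    """Return the redirect target for matching URLs, or the URL unchanged."""
    match = PATTERN.fullmatch(url)
    if match is None:
        return url
    # Group 2 corresponds to the second wildcard, i.e. ${2} in the rule.
    return TARGET.format(match.group(2))


print(rewrite("http://scikit-hep.org/uproot3/examples/HZZ.root"))
# https://github.com/scikit-hep/uproot3/raw/refs/heads/gh-pages/examples/HZZ.root
```

URLs that do not end in .root are left alone in this sketch, so the rest of the site would still be served as before.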

ariostas avatar Jun 19 '25 12:06 ariostas

Thanks, @ariostas! I agree. Let's do this for now and reevaluate later. Thanks!

ianna avatar Jun 19 '25 13:06 ianna