brainglobe-atlasapi
`bg_atlasapi.utils.retrieve_over_http` - percentage not updating
Not exactly the highest priority, but when I use `bg_atlasapi.utils.retrieve_over_http`, the download itself works fine; the percentage just doesn't update.
I see this for some atlases as well (e.g. `allen_mouse_100um` and the rest of the `allen_mouse` atlases) and not for others (e.g. `kim_mouse_100um`). The "content-length" header appears to return 0, so I presume the percentage doesn't get calculated for those files. Could it be related to the configuration of some of the hosted files?
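For reference, here's a minimal sketch of how this kind of percentage is typically driven by the Content-Length header (not necessarily the exact `retrieve_over_http` code, and the tqdm bar is just for illustration): if the server doesn't send a size, the total falls back to 0 and there's nothing to compute a percentage from.

```python
import requests
from tqdm import tqdm


def download_with_progress(url, output_path, chunk_size=8192):
    """Stream a file to disk, showing a percentage only if the size is known."""
    response = requests.get(url, stream=True)
    response.raise_for_status()

    # With chunked transfer encoding there is no Content-Length header,
    # so this falls back to 0 and the percentage never updates.
    total = int(response.headers.get("content-length", 0))

    with open(output_path, "wb") as f, tqdm(
        total=total or None, unit="B", unit_scale=True
    ) as progress:
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            progress.update(len(chunk))
```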
I've noticed that, but I don't understand why. The atlases are all hosted (as far as I know) in identical ways. They're in this GIN repo.
Also having this problem when using the `allen_mouse` atlases. With slow downloads and remote Jupyter kernels silently halting, this progress bar would be a big help.
Might be worth replacing this utility function with https://parfive.readthedocs.io/en/stable/, a downloader with a built-in progress bar that I've used before and can recommend.
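A rough sketch of what that could look like (the raw-download URL below is my guess at the GIN layout, not something I've verified against the real repo):

```python
from parfive import Downloader

# Queue one atlas archive and let parfive draw the progress bar.
downloader = Downloader(max_conn=1, progress=True)
downloader.enqueue_file(
    # Assumed raw-download URL for the archive mentioned in this thread.
    "https://gin.g-node.org/BrainGlobe/atlases/raw/master/example_mouse_100um_v1.2.tar.gz",
    path="~/.brainglobe",
    filename="example_mouse_100um_v1.2.tar.gz",
)
results = downloader.download()  # Results object listing the downloaded paths
```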
It looks like a subset of the atlases is served with chunked transfer encoding, which means the response doesn't include a Content-Length header. I wonder if it has something to do with frequency of use? Atlases that are downloaded more frequently might be cached separately from the others.
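One way to confirm this from the client side is to compare the headers returned for the two cases (again, the raw URL pattern here is an assumption on my part):

```python
import requests

# Chunked responses should show a Transfer-Encoding header and no Content-Length.
for name in ("allen_mouse_100um_v1.2", "kim_mouse_100um_v1.2"):
    url = f"https://gin.g-node.org/BrainGlobe/atlases/raw/master/{name}.tar.gz"
    response = requests.get(url, stream=True)
    print(
        name,
        "Content-Length:", response.headers.get("content-length"),
        "Transfer-Encoding:", response.headers.get("transfer-encoding"),
    )
    response.close()
```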
Could the GIN repo host a file holding the sizes of all atlases to avoid this issue? Otherwise, it looks like pages in the style of https://gin.g-node.org/BrainGlobe/atlases/src/master/example_mouse_100um_v1.2.tar.gz always show the file size in MB. We could parse the HTML returned from those URLs whenever the download GET call doesn't return a Content-Length header.
> Could the GIN repo host a file holding the sizes of all atlases to avoid this issue?

Don't see why not.

Possibly a dumb question though: GIN "knows" how big the files are, and the size is shown in the UI for all atlases (e.g. Waxholm). Can we use this somehow?
It does. I have something that's hacky but working: it fetches the HTML for that page, looks for " MB", and parses the number immediately before it.
```python
import requests

# Fetch the GIN UI page for the atlas and parse the "<size> MB" figure shown there.
response_for_size = requests.get(url_for_ui_page)
marker = b" MB"
end = response_for_size.content.find(marker)
start = response_for_size.content[:end].rfind(b">") + 1
tot_size = int(float(response_for_size.content[start:end]) * 1000000)
```
I'm amazed that this is the best way we have, but I guess if it works, it works.
Ultimately I think we should store the sizes somewhere accessible, to avoid relying on the formatting of the GIN pages.
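Something like the sketch below, assuming we added a (hypothetical) `atlas_sizes.json` manifest to the GIN repo mapping each archive name to its size in bytes, and only consulted it when the header is missing:

```python
import requests

# Hypothetical manifest hosted alongside the atlases in the GIN repo.
SIZES_URL = "https://gin.g-node.org/BrainGlobe/atlases/raw/master/atlas_sizes.json"


def total_download_size(response, filename):
    """Prefer the Content-Length header; fall back to the hosted manifest."""
    total = int(response.headers.get("content-length", 0))
    if total:
        return total
    sizes = requests.get(SIZES_URL).json()
    return int(sizes.get(filename, 0))
```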