brainglobe-atlasapi

`bg-atlasapi.utils.retrieve_over_http` - percentage not updating

Open adamltyson opened this issue 2 years ago • 2 comments

Not exactly the highest priority, but when I use `bg-atlasapi.utils.retrieve_over_http`, the download works fine but the percentage doesn't update.

adamltyson avatar May 30 '22 17:05 adamltyson

I see this for some atlases as well (e.g. allen_mouse_100um and the rest of the allen_mouse atlases) but not for others (e.g. kim_mouse_100um). The "content-length" header appears to return 0, so I presume the percentage doesn't get calculated for those files. Could it be related to the configuration of some of the hosted files?

yoda-vid avatar Jul 27 '22 09:07 yoda-vid
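
For illustration, here is a minimal Python sketch of how a progress display could cope with a "content-length" of 0 or a missing header, by falling back to showing the amount downloaded so far. The helper names `parse_content_length` and `progress_text` are hypothetical and are not taken from bg-atlasapi.

```python
from typing import Mapping, Optional


def parse_content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return the declared size in bytes, or None if it is missing, zero, or invalid."""
    # Normalise header names so a plain dict works as well as a
    # case-insensitive mapping.
    normalized = {key.lower(): value for key, value in headers.items()}
    try:
        size = int(normalized.get("content-length"))
    except (TypeError, ValueError):
        return None
    return size if size > 0 else None


def progress_text(bytes_done: int, total: Optional[int]) -> str:
    """Format progress as a percentage, or as MB downloaded if the total is unknown."""
    if total is None:
        return f"{bytes_done / 1e6:.1f} MB downloaded (total size unknown)"
    return f"{100 * bytes_done / total:.1f}%"
```

With this shape, a response that reports a size of 0 still produces a visibly changing counter rather than a percentage that never moves.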

I've noticed that, but I don't understand why. The atlases are all hosted (as far as I know) in identical ways. They're in this GIN repo.

adamltyson avatar Jul 27 '22 09:07 adamltyson

I'm also having this problem when using the allen_mouse atlases. With slow downloads and remote Jupyter kernels silently halting, a working progress bar would be a big help.

rcpeene avatar Nov 02 '22 21:11 rcpeene

Might be worth replacing this utility function with parfive (https://parfive.readthedocs.io/en/stable/), which I've used before as a downloader with a progress bar and can recommend.

dstansby avatar Apr 24 '23 16:04 dstansby

It looks like a subset of the atlases are using chunked transfer encoding which doesn't return a content-length in the header. I wonder if it has something to do with frequency of use? Atlases that are downloaded more frequently might be cached separately from the others.

Could the GIN repo host a file holding the sizes of all atlases to avoid this issue? Otherwise, it looks like pages in the style of https://gin.g-node.org/BrainGlobe/atlases/src/master/example_mouse_100um_v1.2.tar.gz always show the file size in MB. We could parse the HTML returned from those URLs if the download GET request doesn't return a content-length header.

IgorTatarnikov avatar Oct 19 '23 13:10 IgorTatarnikov
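
As a rough way to check the chunked-encoding observation above, the response headers can be inspected for `Transfer-Encoding: chunked`, in which case HTTP/1.1 does not send a Content-Length. The sketch below is an assumption about how one might test this; `is_chunked` is a hypothetical helper, not an existing bg-atlasapi function.

```python
from typing import Mapping


def is_chunked(headers: Mapping[str, str]) -> bool:
    """Return True if the headers declare chunked transfer encoding."""
    # Header names are case-insensitive, so normalise before looking them up.
    normalized = {key.lower(): str(value).lower() for key, value in headers.items()}
    # Transfer-Encoding may list several codings, e.g. "gzip, chunked".
    codings = [c.strip() for c in normalized.get("transfer-encoding", "").split(",")]
    return "chunked" in codings
```

The function takes any mapping of header names to values, so it could be given the `headers` attribute of a `requests` response for each atlas URL to see which ones are served this way.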

> Could the GIN repo host a file holding the sizes of all atlases to avoid this issue?

Don't see why not.

Possibly dumb question though - GIN "knows" how big the files are, it's shown in the UI for all atlases (e.g. Waxholm). Can we use this somehow?

adamltyson avatar Oct 19 '23 13:10 adamltyson

It does, I have something that's hacky but working. It gets the HTML for that page, looks for " MB" and parses the number immediately before.

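    import requests

    # Fetch the GIN UI page for the file (not the raw download URL)
    response_for_size = requests.get(url_for_ui_page)

    # Find the first " MB" and parse the number between the preceding ">" and it
    marker = b" MB"
    end = response_for_size.content.find(marker)
    start = response_for_size.content[:end].rfind(b">") + 1
    tot_size = int(float(response_for_size.content[start:end]) * 1000000)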

IgorTatarnikov avatar Oct 19 '23 13:10 IgorTatarnikov
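
A slightly more defensive variant of the snippet above could use a regular expression and return `None` when no size is found, so the caller can fall back to an unknown total. This is only a sketch: `parse_size_from_html` is a hypothetical name, and the assumptions that the page shows the size as text such as "12.5 MB" and that units are decimal (1 MB = 1,000,000 bytes, as in the snippet above) are not confirmed against GIN.

```python
import re
from typing import Optional

# Assumed decimal multipliers, matching the "* 1000000" used in the thread.
_UNIT_BYTES = {b"KB": 1_000, b"MB": 1_000_000, b"GB": 1_000_000_000}
_SIZE_PATTERN = re.compile(rb"(\d+(?:\.\d+)?)\s*(KB|MB|GB)\b")


def parse_size_from_html(html: bytes) -> Optional[int]:
    """Return the first size found in the page, in bytes, or None if absent."""
    match = _SIZE_PATTERN.search(html)
    if match is None:
        return None
    value = float(match.group(1).decode("ascii"))
    # round() avoids losing a byte to floating-point truncation.
    return round(value * _UNIT_BYTES[match.group(2)])
```

Like the original, this takes the first match on the page, so it still depends on the page layout not changing.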

I'm amazed that this is the best way we have, but I guess if it works?

adamltyson avatar Oct 19 '23 13:10 adamltyson

Ultimately, I think we should store the sizes somewhere accessible to avoid relying on the formatting of the GIN pages.

IgorTatarnikov avatar Oct 19 '23 16:10 IgorTatarnikov
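
One possible shape for storing the sizes is a small JSON file mapping archive names to byte counts, fetched alongside the atlas. Everything in this sketch is an assumption: the manifest format, the name `size_from_manifest`, and the byte count in the example (which is a made-up placeholder, not the real size of any atlas).

```python
import json
from typing import Optional

# Hypothetical manifest contents. The file name pattern comes from the thread;
# the byte count is a placeholder for illustration only.
EXAMPLE_MANIFEST = '{"example_mouse_100um_v1.2.tar.gz": 7654321}'


def size_from_manifest(manifest_text: str, filename: str) -> Optional[int]:
    """Look up an archive's size in bytes, or return None if it is not listed."""
    sizes = json.loads(manifest_text)
    size = sizes.get(filename)
    return int(size) if size is not None else None
```

A lookup like this would only be needed when the download response itself does not report a usable content-length.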