LocalAI icon indicating copy to clipboard operation
LocalAI copied to clipboard

feat(dowloader): resume partial downloads

Open Saavrm26 opened this issue 11 months ago • 6 comments

Description

This PR adds support for resuming partial downloads.

  • [x] Resuming partial downloads
  • [x] Verifying checksum
  • [x] Reflecting download percentage on front end after resuming download

Notes for Reviewers A few questions:

  1. Should we check if the server supports Range Header(used here for partial downloads), most modern web servers do.
  2. I wanted to have a discussion when a model in gallery has multiple files.

Signed commits

  • [x] Yes, I signed my commits.

Saavrm26 avatar Jan 04 '25 21:01 Saavrm26

Deploy Preview for localai ready!

Name Link
Latest commit a17c9e3718e8546b7aa982f6773d48423e2e12bf
Latest deploy log https://app.netlify.com/sites/localai/deploys/677ebd96baafa10008f8477d
Deploy Preview https://deploy-preview-4537--localai.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

netlify[bot] avatar Jan 04 '25 21:01 netlify[bot]

Hi @mudler, can you have a look at this PR and the notes for reviewers section. Thanks

Saavrm26 avatar Jan 06 '25 18:01 Saavrm26

Hi @mudler, can you have a look at this PR and the notes for reviewers section. Thanks

Sure! will try to review today, thanks for the PR.

mudler avatar Jan 07 '25 13:01 mudler

1. Should we check if the server supports Range Header(used here for partial downloads), most modern web servers do.

I think we probably should - mainly because otherwise if a web server does not support it we leave a dangling .partial file which can't be removed, and the model can't be installed anymore

2. I wanted to have a discussion when a model in gallery has multiple files.

In this case it should use the same underlying download function, so partial files would be recovered as well by this changeset, or am I missing anything?

the changeset looks good to me. My only open point is about detecting if a webserver doesn't support streaming files from a certain point, otherwise we should find another way to remove the .partials when we can't resume for some reason and start a fresh download instead (e.g. the partial data is invalid, or the webserver does not support it, or there are other relevant errors to the download)

mudler avatar Jan 07 '25 19:01 mudler

I think we probably should - mainly because otherwise if a web server does not support it we leave a dangling .partial file which can't be removed, and the model can't be installed anymore

the changeset looks good to me. My only open point is about detecting if a webserver doesn't support streaming files from a certain point, otherwise we should find another way to remove the .partials when we can't resume for some reason and start a fresh download instead (e.g. the partial data is invalid, or the webserver does not support it, or there are other relevant errors to the download)

Yes, there will be the issue with dangling .partial file, I was unsure weather we should delete the partial file automatically or let users delete them manually. The way I am thinking of implementing this is sending a HEAD request and checking if there is presence of Accept-Ranges header in the response.

In this case it should use the same underlying download function, so partial files would be recovered as well by this changeset, or am I missing anything?

There is a scenario that might pose a problem, though I do not know how common it would be.
Suppose we have a model in gallery that has 2 files(let's say, SomeSupportingBinary, ModelA) . And one of them(SomeSupportingBinary) got downloaded successfully, while the other one was a partial download(ModelA.partial). Now by the time we had resumed the download, the ModelA in the web server got updated. On a user's system when they resume the download, after completion, the downloaded ModelA will not match the corresponding SHA in _gallery_MODELA.yml and hence it will be deleted.

But here lies the issue that the SomeSupportingBinary will still be in the user's file system. This is one of the scenarios, there are a few others as well

Saavrm26 avatar Jan 07 '25 20:01 Saavrm26

I think we probably should - mainly because otherwise if a web server does not support it we leave a dangling .partial file which can't be removed, and the model can't be installed anymore

the changeset looks good to me. My only open point is about detecting if a webserver doesn't support streaming files from a certain point, otherwise we should find another way to remove the .partials when we can't resume for some reason and start a fresh download instead (e.g. the partial data is invalid, or the webserver does not support it, or there are other relevant errors to the download)

Yes, there will be the issue with dangling .partial file, I was unsure weather we should delete the partial file automatically or let users delete them manually. The way I am thinking of implementing this is sending a HEAD request and checking if there is presence of Accept-Ranges header in the response.

Cool, that actually looks a sane approach

In this case it should use the same underlying download function, so partial files would be recovered as well by this changeset, or am I missing anything?

There is a scenario that might pose a problem, though I do not know how common it would be. Suppose we have a model in gallery that has 2 files(let's say, SomeSupportingBinary, ModelA) . And one of them(SomeSupportingBinary) got downloaded successfully, while the other one was a partial download(ModelA.partial). Now by the time we had resumed the download, the ModelA in the web server got updated. On a user's system when they resume the download, after completion, the downloaded ModelA will not match the corresponding SHA in _gallery_MODELA.yml and hence it will be deleted.

But here lies the issue that the SomeSupportingBinary will still be in the user's file system. This is one of the scenarios, there are a few others as well

Gotcha, however it would be similar to the case if there is another failure (e.g. the server returns a 500) and not strictly related to the changes introduced in this PR.

I don't think should be part of this PR as well but rather be a follow-up to fix these behaviors - in the worse case there would be a file dangling (with the correct checksums), which would pass checks if the user would try to re-install the model (and deleting it), but that was the case before your changes.

mudler avatar Jan 07 '25 21:01 mudler

Hi, thanks for your feedback and sorry for the delayed response.

I don't think should be part of this PR as well but rather be a follow-up to fix these behaviors - in the worse case there would be a file dangling (with the correct checksums), which would pass checks if the user would try to re-install the model (and deleting it), but that was the case before your changes.

Understood!

I'll work on implementing the server check.

Thanks!

Saavrm26 avatar Jan 08 '25 03:01 Saavrm26

Hi I've implemented the server check and a test for the same.

Thanks!

Saavrm26 avatar Jan 08 '25 18:01 Saavrm26

Thanks for the review, looking to contribute in other areas of the repository!

Saavrm26 avatar Jan 09 '25 08:01 Saavrm26