
Improve the file downloading mechanism

Open martinscooper opened this issue 2 years ago • 2 comments

Is your feature request related to a problem?

In Label Sleuth, users can download the current model:

[screenshot: the option to download the current model in the UI]

For big models, like BERT or RoBERTa, downloading the model can take a lot of time. Under the hood, downloading the model comprises two steps. First, the backend prepares a zip file (/prepare_model endpoint) that contains the model as well as some other files, like a Python script that shows users how to use the downloaded model. Second, the backend actually sends the prepared zip file (/download_model endpoint). In the current implementation, users can't continue using the application because a backdrop with a loading indicator is displayed while the two mentioned steps are in progress. Moreover, the dialog for selecting the folder where the model will be stored only appears after the whole model has been downloaded. This is because of how fetching the body of a response works when using the JavaScript fetch API.
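For reference, the two-step flow described above boils down to two consecutive requests from the frontend. The sketch below is only an illustration: the HTTP method of the first call is an assumption, and client is the same wrapper used in the snippet further down in this thread.

// Step 1: ask the backend to prepare the zip file (model + usage script).
await client.post("/prepare_model"); // HTTP method assumed
// Step 2: fetch the prepared zip file itself.
const { response } = await client.get("/download_model");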

[UPDATE] The problem has been partially solved. We now use two separate downloading mechanisms: one for when authentication is enabled and one for when it isn't. The browser's native mechanism is used if authentication is not enabled. In that case the download is managed by the browser, so the user can keep using the system while the model is being downloaded and can see the download progress. We can rely on the browser's native mechanism when authentication is disabled because there is no need to send a token.

If authentication is enabled, we can't rely on the browser's native mechanism because headers can't be set, and thus no authentication header can be specified (we use an anchor element to programmatically trigger the browser to download the file, and only a URL can be specified, no headers). Therefore, if authentication is enabled, the fetch API is used to download the file. It works, but the folder selection dialog only opens once the whole zip file has been downloaded (and stored in memory). This is not okay because: a) the folder selection dialog only appears when the zip file has been completely downloaded, b) for big models, users have to wait a long time with no indication of the download progress, and c) if the model is too big, the download can crash because there is not enough memory.
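For context, the browser-native mechanism mentioned above amounts to something like the following minimal sketch, where the URL and file name are assumptions:

// Programmatically trigger a browser-managed download via an anchor element.
// Only a URL can be specified here, so no Authorization header can be attached.
const link = document.createElement("a");
link.href = "/download_model"; // assumed endpoint; headers cannot be set this way
link.download = "model.zip";   // assumed file name; hints the browser to save instead of navigate
document.body.appendChild(link);
link.click(); // the browser shows progress and streams the file to disk itself
document.body.removeChild(link);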

What is the expected behavior?

Redesign and reimplement the mechanism for downloading the model zip file so that:

  • the user can continue using the application while the download is in progress,
  • the user can see the download progress,
  • there are no memory issues when downloading big models

Additional context (if any):

martinscooper avatar Apr 27 '23 15:04 martinscooper

Status update: This PR partially addresses the current issue. It implements a custom download indicator while using the fetch API, so it also works when authentication is enabled. Moreover, the user is able to use the system while the model is being downloaded, and they are prompted with the folder selection dialog as soon as the backend finishes preparing the zip file (which includes the model).

However, the mentioned PR has been closed because the implemented downloading mechanism (using fetch) fails when the model is too big. The downloaded file somehow ends up fully stored in memory at some point, which makes the download crash in some browsers (Safari throws a "This webpage was reloaded because a problem occurred" error during the file download). I couldn't find a workaround for this issue. I monitored memory usage while the download is in progress using a) this approach (fetch with streaming of the file) and b) the browser's native mechanism (programmatically clicking an anchor element). With b), there are no memory issues. I didn't find any way of mimicking the browser's native mechanism. Therefore, help is needed.

See the attached video for a demo of the implemented custom download mechanism:

https://github.com/label-sleuth/label-sleuth/assets/22247838/d84816cb-46c8-4096-9e08-50288e420deb

martinscooper avatar May 15 '23 16:05 martinscooper

The method of the PR that downloads the model can be found here. In summary, once the response is received (i.e. the zip file is ready to start loading and we know the Content-Length), we loop through all the chunks of the ReadableStream, calculating the download percentage and updating the toast notification as we go. Some code was also added to handle the download cancel action.

setModelIsDownloading(true); // show a notification saying the model zip file is being prepared
const { response } = await client.get(downloadURL); // client.get wraps fetch and exposes the raw Response
const contentLength = response.headers.get("Content-Length"); // total size of the zip, used for the progress percentage
let receivedLength = 0;
const reader = response.body.getReader(); // read the zip file from the ReadableStream chunk by chunk
const chunks = [];

while (true) {
    const { done, value } = await reader.read();

    if (done) {
        break;
    }
    chunks.push(value); // every chunk is kept in memory, which is the source of the memory issue
    receivedLength += value.length;
    calculatePercentageAndUpdateToast(contentLength, receivedLength);
}

setModelIsDownloading(false);
download(chunks); // opens the folder selection dialog

What is preventing us from merging this PR is the memory issue. We couldn't determine whether there is a way to avoid storing the full model zip file in memory. There should be one, as the browser's native downloading mechanism doesn't have any memory issues.
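One direction that might be worth evaluating (not something the closed PR attempted, and currently only supported in Chromium-based browsers) is the File System Access API, which lets the response body be streamed straight to a file on disk instead of accumulating chunks in memory. A minimal sketch, where the endpoint URL, token handling, and file name are assumptions:

// Ask the user where to save the file first; showSaveFilePicker must be called
// from a user gesture, e.g. the download button's click handler.
const fileHandle = await window.showSaveFilePicker({ suggestedName: "model.zip" });
const writable = await fileHandle.createWritable();

// fetch lets us attach the authentication header, which the anchor-element
// approach cannot do.
const response = await fetch(downloadURL, {
  headers: { Authorization: `Bearer ${token}` }, // hypothetical auth header
});

// Pipe the body to disk chunk by chunk; nothing is buffered in memory, and the
// file is closed automatically when the stream finishes.
await response.body.pipeTo(writable);

Progress could still be reported by piping the body through a TransformStream that counts bytes against Content-Length before writing to disk. Whether the limited browser support makes this acceptable is an open question.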

martinscooper avatar May 15 '23 16:05 martinscooper