
`q.site.upload` is slow

Open bminixhofer opened this issue 2 years ago • 9 comments

Wave SDK Version, OS

0.17.0, Linux

Actual behavior

We are dealing with some large files (predictions, model weights, etc.) of a couple hundred MB each, which we want to make available for download.

I'm running Wave locally with the following code:

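```python
import time

# `main` is imported so that `wave run` can serve this module.
from h2o_wave import Q, main, app, ui


@app("/demo")
async def serve(q: Q):
    # Time how long q.site.upload takes to push the file to waved.
    start = time.time()

    (url,) = await q.site.upload(["data.dump"])

    upload_time = time.time() - start
    print(f"upload time: {upload_time}")

    # Redirect the browser to the uploaded file to start the download.
    q.page['meta'] = ui.meta_card(box='')
    q.page["meta"].redirect = f"http://localhost:10101/{url}"

    await q.page.save()
```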

I get:

```
$ head -c 100MB < /dev/urandom > data.dump
$ wave run app
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [28044] using statreload
INFO:     Started server process [28046]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     127.0.0.1:37576 - "POST / HTTP/1.1" 200 OK
upload time: 33.335158348083496
```

That is, it takes about 33 seconds before the download of a 100 MB file starts.
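For a sense of scale: GNU `head` interprets the `MB` suffix as 1000 × 1000 bytes, so `data.dump` is 100,000,000 bytes, and the measured time implies an effective throughput of only about 3 MB/s. A minimal helper for that arithmetic:

```python
def throughput_mb_per_s(size_bytes: int, seconds: float) -> float:
    """Effective transfer rate in MB/s (1 MB = 1,000,000 bytes, matching GNU head's "MB")."""
    return size_bytes / seconds / 1_000_000


# 100 MB file (head -c 100MB) uploaded in ~33.3 s, as measured above.
rate = throughput_mb_per_s(100_000_000, 33.335158348083496)
print(f"{rate:.2f} MB/s")
```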

Expected behavior

Since the file only has to be copied to a location on the same machine, I'd expect the download to start almost immediately.
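To check that expectation on a given machine, one could time a plain local copy of the same file as a baseline. This is a minimal sketch using `shutil.copyfile` and `time.perf_counter` (a monotonic clock, better suited to interval timing than `time.time`); the function name and destination path are hypothetical:

```python
import shutil
import time


def timed_copy(src: str, dst: str) -> float:
    """Copy src to dst on the local filesystem and return the elapsed seconds."""
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    return time.perf_counter() - start
```

For example, `timed_copy("data.dump", "data.copy.dump")` gives a rough lower bound for what a same-machine handoff could cost.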

Steps To Reproduce

  1. Start waved locally.
  2. Create a 100 MB test file: `head -c 100MB < /dev/urandom > data.dump`
  3. Save the code shown above as `app.py`.
  4. Run `wave run app`.

bminixhofer • Sep 06 '21

@srini-x I can't seem to assign you; please assign yourself.

bminixhofer • Sep 09 '21

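It looks like the httpx async client is much slower than the synchronous client. To compare them, here is a minimal Go server that accepts multipart uploads, followed by a synchronous and an asynchronous httpx client posting the same `data.dump` file to it: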

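```go
package main

import (
	"io"
	"log"
	"mime/multipart"
	"net/http"
	"os"
	"path/filepath"
)

// upload saves every part of the multipart field "files" to the current directory.
func upload(w http.ResponseWriter, r *http.Request) {
	// Keep up to 32 MB in memory; larger parts are spilled to temporary files.
	if err := r.ParseMultipartForm(32 << 20); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	files, ok := r.MultipartForm.File["files"]
	if !ok {
		http.Error(w, "no files", http.StatusBadRequest)
		return
	}
	for _, file := range files {
		if err := saveFile(file); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
	}
}

// saveFile copies one uploaded part to a local file with the same base name.
// Using a helper makes the deferred Close calls run once per file instead of
// piling up until the handler returns.
func saveFile(fh *multipart.FileHeader) error {
	src, err := fh.Open()
	if err != nil {
		return err
	}
	defer src.Close()
	dst, err := os.OpenFile(filepath.Base(fh.Filename), os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0666)
	if err != nil {
		return err
	}
	defer dst.Close()
	_, err = io.Copy(dst, src)
	return err
}

func main() {
	http.HandleFunc("/", upload)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```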

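Synchronous client (`upload.py`):

```python
import os
import time

import httpx

files = ['data.dump']
print('uploading...')
start = time.time()
# Open every file, post them as one multipart form, then close the handles.
handles = [open(f, 'rb') for f in files]
try:
    res = httpx.post('http://localhost:8080/',
                     files=[('files', (os.path.basename(f), h)) for f, h in zip(files, handles)])
finally:
    for h in handles:
        h.close()
print(f'upload time: {time.time() - start}, status: {res.status_code} {res.text}')
```

```
(venv) elp@studio py % python upload.py
uploading...
upload time: 8.068389892578125, status: 200
```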
Asynchronous client (`upload.py`):

```python
import asyncio
import os
import time

import httpx

files = ['data.dump']


async def main():
    async with httpx.AsyncClient() as client:
        print('uploading...')
        start = time.time()
        # Same multipart upload as above, but through the async client.
        handles = [open(f, 'rb') for f in files]
        try:
            res = await client.post('http://localhost:8080/',
                                    files=[('files', (os.path.basename(f), h)) for f, h in zip(files, handles)])
        finally:
            for h in handles:
                h.close()
        print(f'upload time: {time.time() - start}, status: {res.status_code} {res.text}')


asyncio.run(main())
```

```
(venv) elp@studio py % python upload.py
uploading...
upload time: 39.59935522079468, status: 200
```
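Given these numbers, one possible client-side workaround (a sketch, not what the Wave SDK does) is to keep the faster synchronous client but run it in a worker thread with `asyncio.to_thread` (Python 3.9+), so that an async handler is not blocked while the upload runs. The helper name below is hypothetical:

```python
import asyncio
import time
from typing import Any, Callable, Tuple


async def run_blocking_timed(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Run a blocking callable in a worker thread; return (result, elapsed seconds).

    The event loop keeps serving other coroutines while fn runs in the thread.
    """
    start = time.perf_counter()
    result = await asyncio.to_thread(fn, *args)
    return result, time.perf_counter() - start
```

For example, `res, secs = await run_blocking_timed(sync_upload, ['data.dump'])`, where `sync_upload(paths)` is a hypothetical function wrapping the `httpx.post(...)` call from the synchronous client. This only helps if the slowdown is specific to the async client, as the benchmark suggests.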

lo5 • Sep 10 '21

Maybe related: https://github.com/encode/httpx/issues/838

lo5 • Sep 10 '21

> Maybe related: encode/httpx#838

Looks like they are not eager to improve speed soon: https://github.com/encode/httpx/issues/838#issuecomment-598919546

psinger • Sep 10 '21

FYI: a workaround (an alternative feature) that might help circumvent this issue: https://github.com/h2oai/wave/blob/master/website/docs/files.md#serving-files-directly-from-the-wave-server

Will be out in the next release.

lo5 • Sep 10 '21

@lo5 Looks good. Will this also work in h2o-cloud?

psinger • Sep 10 '21

@psinger Filed https://github.com/h2oai/h2o-ai-cloud/issues/1920

lo5 • Sep 10 '21

Closed: https://github.com/encode/httpx/pull/1948 https://github.com/encode/httpx/issues/838

lo5 • Feb 27 '22

> Looks like they are not eager to improve speed soon

We could move the files within the filesystem if the Wave app and the Wave server are located on the same machine (which is very common anyway) and avoid pushing them through HTTP. WDYT, @lo5?

mturoci • Aug 12 '22
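A rough sketch of the same-machine idea from the last comment, assuming the app can write directly into a directory that `waved` serves and knows the URL prefix for it. `publish_locally`, `public_dir` and `url_prefix` are hypothetical names, not part of the Wave API. A hard link is near-instant when both paths are on the same filesystem; otherwise the helper falls back to a copy:

```python
import os
import shutil


def publish_locally(src: str, public_dir: str, url_prefix: str) -> str:
    """Place src in public_dir without going through HTTP and return its URL.

    Hypothetical helper: assumes waved serves public_dir under url_prefix.
    """
    os.makedirs(public_dir, exist_ok=True)
    name = os.path.basename(src)
    dst = os.path.join(public_dir, name)
    if os.path.exists(dst):
        os.remove(dst)  # replace a previously published version
    try:
        os.link(src, dst)  # same filesystem: no file data is copied
    except OSError:
        shutil.copyfile(src, dst)  # different filesystem or no hard-link support
    return f"{url_prefix.rstrip('/')}/{name}"
```

Because a hard link shares its data with `src`, in-place writes to `src` would be visible to downloaders; copying is the safer choice if the source file may still change.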