`q.site.upload` is slow
Wave SDK Version, OS
0.17.0, Linux
Actual behavior
We are dealing with some large files (predictions, model weights, etc.) of a couple hundred MB each, which we want to make available for download.
I'm running Wave locally, and with the following code:
from h2o_wave import Q, main, app, ui
import time


@app("/demo")
async def serve(q: Q):
    start = time.time()
    (url,) = await q.site.upload(["data.dump"])
    upload_time = time.time() - start
    print(f"upload time: {upload_time}")
    q.page['meta'] = ui.meta_card(box='')
    q.page["meta"].redirect = f"http://localhost:10101/{url}"
    await q.page.save()
I get
$ head -c 100MB < /dev/urandom > data.dump
$ wave run app
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: Started reloader process [28044] using statreload
INFO: Started server process [28046]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: 127.0.0.1:37576 - "POST / HTTP/1.1" 200 OK
upload time: 33.335158348083496
That is, it takes about 33 seconds before a 100 MB file starts downloading.
Expected behavior
Since the file only has to be copied to a location on the same machine, I'd expect the download to start almost immediately.
Steps To Reproduce
- Start waved locally.
- head -c 100MB < /dev/urandom > data.dump
- Write app.py:

    from h2o_wave import Q, main, app, ui
    import time


    @app("/demo")
    async def serve(q: Q):
        start = time.time()
        (url,) = await q.site.upload(["data.dump"])
        upload_time = time.time() - start
        print(f"upload time: {upload_time}")
        q.page['meta'] = ui.meta_card(box='')
        q.page["meta"].redirect = f"http://localhost:10101/{url}"
        await q.page.save()

- wave run app
@srini-x I can't seem to assign you, please just assign yourself.
Looks like the httpx async client is slower than the regular client:
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func upload(w http.ResponseWriter, r *http.Request) {
	if err := r.ParseMultipartForm(32 << 20); err != nil { // 32 MB
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	form := r.MultipartForm
	files, ok := form.File["files"]
	if !ok {
		http.Error(w, "no files", http.StatusBadRequest)
		return
	}
	for _, file := range files {
		src, err := file.Open()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer src.Close()
		dst, err := os.OpenFile(file.Filename, os.O_WRONLY|os.O_CREATE, 0666)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer dst.Close()
		io.Copy(dst, src)
	}
}

func main() {
	http.HandleFunc("/", upload)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
import httpx
import time
import os
files = ['data.dump']
print('uploading...')
start = time.time()
res = httpx.post('http://localhost:8080/', files=[('files', (os.path.basename(f), open(f, 'rb'))) for f in files])
print(f'upload time: {time.time() - start}, status: {res.status_code} {res.text}')
(venv) elp@studio py % python upload.py
uploading...
upload time: 8.068389892578125, status: 200
import asyncio
import httpx
import time
import os

files = ['data.dump']


async def main():
    async with httpx.AsyncClient() as client:
        print('uploading...')
        start = time.time()
        res = await client.post('http://localhost:8080/',
                                files=[('files', (os.path.basename(f), open(f, 'rb'))) for f in files])
        print(f'upload time: {time.time() - start}, status: {res.status_code} {res.text}')


asyncio.run(main())
(venv) elp@studio py % python upload.py
uploading...
upload time: 39.59935522079468, status: 200
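Given the gap between the two timings above (about 8 s synchronous vs. about 40 s async), one possible stopgap is to keep the handler async but hand the blocking upload to a worker thread. The following is only a sketch of that idea, not what Wave does: `blocking_upload` and `upload_in_thread` are hypothetical names, and `blocking_upload` is a stand-in for a synchronous call such as the `httpx.post(...)` shown earlier.

```python
import asyncio
import time


def blocking_upload(path: str) -> str:
    # Hypothetical stand-in for a synchronous upload (e.g. the httpx.post call above).
    time.sleep(0.01)
    return f"uploaded:{path}"


async def upload_in_thread(upload, path: str) -> str:
    # Run the blocking callable in the default thread pool executor
    # so the event loop is not blocked while the upload runs.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, upload, path)


async def main() -> str:
    return await upload_in_thread(blocking_upload, "data.dump")


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Whether this actually recovers the synchronous client's throughput would need to be measured against the same 100 MB file.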
Maybe related: https://github.com/encode/httpx/issues/838
Looks like they are not eager to improve speed soon: https://github.com/encode/httpx/issues/838#issuecomment-598919546
FYI: a workaround / alternative feature that might help circumvent this issue: https://github.com/h2oai/wave/blob/master/website/docs/files.md#serving-files-directly-from-the-wave-server
Will be out in the next release.
@lo5 looks good, will this also work in h2o-cloud?
@psinger Filed https://github.com/h2oai/h2o-ai-cloud/issues/1920
Closed: https://github.com/encode/httpx/pull/1948 https://github.com/encode/httpx/issues/838
Looks like they are not eager to improve speed soon:
We could move the files within the file system if the Wave app and the Wave server are located on the same machine (which is very common anyway) and avoid pushing them through HTTP. Wdyt @lo5?
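To make the suggestion concrete, here is a minimal sketch of a local copy that skips HTTP entirely. It assumes the app and the server share a file system; `publish_locally` and the `public` directory are hypothetical and not part of the Wave API, and how the server would be told to serve that directory is left open.

```python
import os
import shutil
import tempfile


def publish_locally(src: str, public_dir: str) -> str:
    """Copy src into public_dir and return the file name it was stored under."""
    os.makedirs(public_dir, exist_ok=True)
    dst = os.path.join(public_dir, os.path.basename(src))
    shutil.copyfile(src, dst)  # plain file-system copy, no HTTP round trip
    return os.path.basename(dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "data.dump")
        with open(src, "wb") as f:
            f.write(os.urandom(1024 * 1024))  # 1 MiB of random bytes as a small stand-in
        # Hypothetical directory that the server would serve files from.
        public_dir = os.path.join(tmp, "public")
        name = publish_locally(src, public_dir)
        print(name, os.path.getsize(os.path.join(public_dir, name)))
```

A copy like this is bounded by disk throughput rather than by the HTTP client, which is the point of the proposal; it does not help when the app and server run on different machines.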