
Send WebAssembly binary over Jupyter WebSocket

Open kylebarron opened this issue 1 year ago • 2 comments

We use Parquet as the internal format for data transfer, and for the reasons linked it is a great fit. But to read Parquet on the client, we need a Wasm-based Parquet reader like my own https://github.com/kylebarron/parquet-wasm. Wasm-based libraries need a sidecar binary .wasm file, which is usually distributed separately.

We currently fetch this file from a CDN, but in environments with a strict outbound firewall the CDN may be blocked. See https://github.com/developmentseed/lonboard/issues/457. To get around this, we serialize the gzipped Wasm content on the anywidget model itself, then decompress it on the client and pass it into parquet-wasm's initializer.
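For illustration, the Python side of that round trip might look roughly like the sketch below. It uses a fake payload in place of the real parquet-wasm binary, and the helper names `pack_wasm` and `unpack_wasm` are hypothetical, not lonboard's actual API. In the real flow the decompression happens in the browser, not in Python; it is done here only to show that the bytes survive unchanged.

```python
import gzip

# Hypothetical stand-in for the real .wasm file: the 8-byte Wasm header
# ("\0asm" magic + version 1) followed by padding.
FAKE_WASM = b"\x00asm\x01\x00\x00\x00" + bytes(1024)


def pack_wasm(raw: bytes) -> bytes:
    """Gzip the binary so it can be synced to the front end as bytes."""
    return gzip.compress(raw)


def unpack_wasm(packed: bytes) -> bytes:
    """Mirror of the client-side step: decompress before initializing."""
    return gzip.decompress(packed)


packed = pack_wasm(FAKE_WASM)
restored = unpack_wasm(packed)
```

In a widget, `packed` would be the value assigned to a synced bytes trait, so the binary travels over the existing Jupyter WebSocket instead of a separate network request.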

Closes https://github.com/developmentseed/lonboard/issues/457

kylebarron, Apr 10 '24 18:04

One suggestion: you can use some indirection to create a separate model that holds just the static contents, so that it is hoisted out of each model instance:

import ipywidgets
import anywidget
import traitlets


class StaticAsset(ipywidgets.Widget):
    contents = traitlets.Any().tag(sync=True)

asset = StaticAsset(contents=b"hello, world")
    
class Widget(anywidget.AnyWidget):
    _esm = """
    // Resolve a widget reference ("IPY_MODEL_<id>") to that model's contents.
    async function load_asset(model, name) {
        let model_id = model.get(name).slice("IPY_MODEL_".length);
        let asset_model = await model.widget_manager.get_model(model_id);
        return asset_model.get("contents");
    }
    async function render({ model, el }) {
        let asset = await load_asset(model, "asset");
        el.innerText = new TextDecoder().decode(asset);
    }
    export default { render };
    """
    asset = traitlets.Any(asset).tag(sync=True, **ipywidgets.widget_serialization)
    

Widget()

Each widget instance will then hold only a reference of the form IPY_MODEL_xxxx and reuse the shared asset in the front end. I am working on making anywidget hoist _esm and _css assets this way to avoid duplication in the HTML.
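To make the reference format concrete: ipywidgets' `widget_serialization` sends a widget-valued trait as the string `IPY_MODEL_` followed by the model id, which is why the JavaScript slices off that prefix. Below is a minimal Python sketch of the same encoding. The helper names and the model id are made up for illustration.

```python
PREFIX = "IPY_MODEL_"


def to_reference(model_id: str) -> str:
    """Encode a model id the way a widget reference appears on the wire."""
    return PREFIX + model_id


def from_reference(reference: str) -> str:
    """Python equivalent of `reference.slice("IPY_MODEL_".length)` in the JS."""
    if not reference.startswith(PREFIX):
        raise ValueError(f"not a widget reference: {reference!r}")
    return reference[len(PREFIX):]


asset_bytes = b"hello, world" * 1000  # pretend this is a large static asset
model_id = "abc123"                   # made-up id, real ones are generated
ref = to_reference(model_id)

# Each widget instance carries only the short reference, not the asset itself.
per_widget_payload = len(ref.encode())
```

The larger the asset, the more this saves: every additional widget adds only the few bytes of the reference string.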

manzt, Apr 10 '24 18:04

Note that this is only a way to get the static assets into the front end; derived objects (e.g., an initialized parquet module) would still need to be cached somewhere. A global is probably the easiest option for now, but it would be more elegant if anywidget could provide a more idiomatic API.

Concretely,

import ipywidgets
import anywidget
import traitlets


class StaticAsset(ipywidgets.Widget):
    contents = traitlets.Any().tag(sync=True)

asset = StaticAsset(contents=b"hello, world")
    
class Widget(anywidget.AnyWidget):
    _esm = """
    // Resolve a widget reference ("IPY_MODEL_<id>") to that model's contents.
    async function load_asset(model, name) {
        let model_id = model.get(name).slice("IPY_MODEL_".length);
        let asset_model = await model.widget_manager.get_model(model_id);
        return asset_model.get("contents");
    }

    async function initialize({ model }) {
        if (!globalThis._TREVORS_DECODED_ASSET) {
            // Decode once and cache the result globally for all other widgets.
            let asset = await load_asset(model, "asset");
            globalThis._TREVORS_DECODED_ASSET = new TextDecoder().decode(asset);
        }
    }

    async function render({ model, el }) {
        el.innerText = globalThis._TREVORS_DECODED_ASSET;
    }
    export default { initialize, render };
    """
    asset = traitlets.Any(asset).tag(sync=True, **ipywidgets.widget_serialization)
    

Widget()
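One caveat with the global check above: if several widgets initialize concurrently, each may see the global as unset and start its own load before the first one finishes. That is harmless for decoding text but could matter for something like Wasm initialization. A common remedy is to cache the in-flight load itself instead of its result. The sketch below shows that pattern in Python with asyncio, as an analogy only; it is not anywidget API, and `load_asset`, `get_asset`, and the cache are hypothetical names.

```python
import asyncio

_cache: dict = {}   # asset name -> asyncio.Task for the shared load
load_count = 0      # counts how many times the expensive load really runs


async def load_asset(name: str) -> str:
    """Hypothetical stand-in for fetching and decoding the asset."""
    global load_count
    load_count += 1
    await asyncio.sleep(0.01)  # simulate the asynchronous round trip
    return f"decoded:{name}"


def get_asset(name: str) -> asyncio.Task:
    """Return the shared in-flight (or already finished) load for `name`."""
    task = _cache.get(name)
    if task is None:
        # Store the task before anyone awaits it, so concurrent callers reuse it.
        task = asyncio.ensure_future(load_asset(name))
        _cache[name] = task
    return task


async def initialize(name: str) -> str:
    return await get_asset(name)


async def main() -> list:
    # Three "widgets" initializing at the same time.
    return await asyncio.gather(*(initialize("asset") for _ in range(3)))


results = asyncio.run(main())
```

In JavaScript the equivalent would be to store the promise in the global and have both `initialize` and `render` await it.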

manzt, Apr 10 '24 18:04