
Multi-tier Execution

Open domoritz opened this issue 1 year ago • 2 comments

Right now, Mosaic executes queries either in the browser or via remote requests. If the network connection has high latency, queries over Mosaic’s indexes can become too slow for analysis at the speed of thought. To overcome this issue, design and develop a hybrid/multi-tier execution model for Mosaic in which queries over Mosaic indexes can run locally even if the indexes have to be computed remotely because the data is too large. An extension of this project could automatically determine the most efficient distributed query plan, similar to MotherDuck and VegaPlus.
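
To make the idea concrete, here is a rough sketch of the two tiers: the index (here, a binned count table) is built remotely once, cached on the client, and every later range query is answered from the local copy. Everything in it is hypothetical and not part of Mosaic's API: `RemoteTier`, `TieredExecutor`, `buildIndex`, and `countInRange` are illustrative names, the "remote" side is an in-memory stand-in, and the calls are synchronous only to keep the example short (a real implementation would be asynchronous).

```typescript
// Hypothetical sketch; none of these names come from Mosaic.
type IndexRow = { bin: number; count: number };

// Stand-in for the server tier: holds the large base table and
// computes a small binned-count index over it.
class RemoteTier {
  roundTrips = 0;
  constructor(private tables: Map<string, number[]>) {}

  buildIndex(table: string, binWidth: number): IndexRow[] {
    this.roundTrips++; // each call models one network round trip
    const values = this.tables.get(table);
    if (!values) throw new Error(`unknown table: ${table}`);
    const counts = new Map<number, number>();
    for (const v of values) {
      const bin = Math.floor(v / binWidth) * binWidth;
      counts.set(bin, (counts.get(bin) || 0) + 1);
    }
    return Array.from(counts.entries())
      .map(([bin, count]) => ({ bin, count }))
      .sort((a, b) => a.bin - b.bin);
  }
}

// Client tier: fetches an index once, then answers queries locally.
class TieredExecutor {
  private localIndexes = new Map<string, IndexRow[]>();
  constructor(private remote: RemoteTier) {}

  // Counts rows whose bin start falls in [lo, hi).
  countInRange(table: string, binWidth: number, lo: number, hi: number): number {
    const key = `${table}:${binWidth}`;
    let index = this.localIndexes.get(key);
    if (!index) {
      // Index not cached yet: compute it remotely and keep it locally.
      index = this.remote.buildIndex(table, binWidth);
      this.localIndexes.set(key, index);
    }
    return index
      .filter((r) => r.bin >= lo && r.bin < hi)
      .reduce((sum, r) => sum + r.count, 0);
  }
}
```

In this sketch only the first query over a given table and bin width pays the network cost; subsequent brushes over the same index never touch `RemoteTier`, which is the property that matters under high latency.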

domoritz • May 23 '24 14:05

I think this would be very useful in conjunction with #398 about multi-table support. Using a star schema, we would probably have some of our dimension tables running client-side, but the big data living server-side.
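
As a rough illustration of that split, a query could be routed by looking up where each referenced table lives: it runs client-side only if every table it touches is held in the browser. The catalog, the table names (`dim_region`, `dim_product`, `fact_sales`), and `planTier` below are all made up for the example and are not an existing Mosaic feature.

```typescript
// Hypothetical sketch; not an existing Mosaic API.
type Placement = "client" | "server";

// Example catalog: small dimension tables in the browser,
// the large fact table on the server.
const catalog = new Map<string, Placement>([
  ["dim_region", "client"],
  ["dim_product", "client"],
  ["fact_sales", "server"],
]);

// Picks the tier for a query from the tables it references.
// Any server-side table forces the whole query to the server.
function planTier(tables: string[], placements: Map<string, Placement>): Placement {
  for (const table of tables) {
    const placement = placements.get(table);
    if (placement === undefined) throw new Error(`unknown table: ${table}`);
    if (placement === "server") return "server";
  }
  return "client";
}
```

A smarter planner could push only the fact-table part of a join to the server and finish the join locally, but that goes beyond this sketch.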

derekperkins • May 29 '24 00:05

As an extension to this and with multi-db support in #399, I would hope that there is an ability to use different engines on the backend and the frontend. We're looking at using StarRocks on the backend, which supports multi-tiered storage from object store -> hot SSD, and hopefully from there to DuckDB WASM browser-side.

derekperkins • May 29 '24 01:05