rexie
Faster bulk insertion
I'm using rexie in a project to insert a large number of entries into a database, on the order of 300k. As this takes its sweet time, especially on mobile devices, I'm looking into ways to speed it up. I've seen that you can get a significant speedup by not subscribing to success events in `add` or `put` calls.
I have done a quick and dirty test of the potential speedup in rexie, and it may be as large as 50% or more.
Benchmark implementation outline
I'm not sure how to properly benchmark Rust in a browser. I have simply added two simple tests and verified that the correct data was present afterwards.
```rust
// Add to tests/web.rs
async fn setup(test_name: &str) -> Rexie {
    Rexie::delete(test_name).await.unwrap();

    Rexie::builder(test_name)
        .version(1)
        .add_object_store(ObjectStore::new("nums").key_path("id").auto_increment(true))
        .build()
        .await
        .unwrap()
}
```
```rust
#[derive(Serialize)]
struct EmptyStruct {}

const ITERATIONS: u32 = 10_000;

#[wasm_bindgen_test]
async fn bench_simple() {
    let test_name = "bench_simple";
    let rexie = setup(test_name).await;

    let transaction = rexie
        .transaction(&["nums"], TransactionMode::ReadWrite)
        .unwrap();
    let nums = transaction.store("nums").unwrap();

    let obj = EmptyStruct {};
    let obj = serde_wasm_bindgen::to_value(&obj).unwrap();

    web_sys::console::time_with_label(test_name);
    for _ in 0..ITERATIONS {
        nums.add(&obj, None).await.unwrap();
    }
    transaction.commit().await.unwrap();
    web_sys::console::time_end_with_label(test_name);
}

#[wasm_bindgen_test]
async fn bench_bulk() {
    // Same as bench_simple, but uses add_bulk, which skips the success callback.
    let test_name = "bench_bulk";
    let rexie = setup(test_name).await;

    let transaction = rexie
        .transaction(&["nums"], TransactionMode::ReadWrite)
        .unwrap();
    let nums = transaction.store("nums").unwrap();

    let obj = EmptyStruct {};
    let obj = serde_wasm_bindgen::to_value(&obj).unwrap();

    web_sys::console::time_with_label(test_name);
    for _ in 0..ITERATIONS {
        nums.add_bulk(&obj, None).await.unwrap();
    }
    transaction.commit().await.unwrap();
    web_sys::console::time_end_with_label(test_name);
}
```
```rust
// src/transaction/store.rs, in `impl Store`
pub async fn add_bulk(&self, value: &JsValue, key: Option<&JsValue>) -> Result<JsValue> {
    let request = match key {
        Some(key) => self.idb_store.add_with_key(value, key),
        None => self.idb_store.add(value),
    }
    .map_err(Error::IndexedDbRequestError)?;

    wait_request_bulk(request, Error::IndexedDbRequestError).await
}
```
```rust
// src/request.rs
pub async fn wait_request_bulk<R: DbRequest + Unpin>(
    request: R,
    map_err: fn(JsValue) -> Error,
) -> crate::Result<JsValue> {
    RequestFuture::new_bulk(request, map_err).await
}

impl<R> RequestFuture<R>
where
    R: DbRequest + Unpin,
{
    pub fn new_bulk(request: R, map_err: fn(JsValue) -> Error) -> Self {
        let (sender, receiver) = mpsc::unbounded_channel();
        // Pre-load the channel so the future resolves without waiting for
        // the browser's success event.
        sender.send(Ok(JsValue::null())).unwrap();

        let success_closure = get_success_closure(sender.clone());
        let error_closure = get_error_closure(sender);

        // Deliberately not subscribing to success events:
        // request.on_success(Some(get_callback(&success_closure)));
        request.on_error(Some(get_callback(&error_closure)));

        Self {
            _inner: request,
            _success_closure: success_closure,
            _error_closure: error_closure,
            receiver,
            map_err,
        }
    }
}
```
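The load-bearing trick in `new_bulk` is that the channel is pre-loaded with a dummy `Ok` before the request has done anything, so the returned future resolves on its first poll instead of waiting for the browser's success event. A minimal plain-std sketch of that idea, using `std::sync::mpsc` in place of the async channel (`preloaded_result` is a hypothetical name for illustration, not rexie API):

```rust
use std::sync::mpsc;

// Sketch: pre-load a channel with a dummy success so the receiving side
// resolves immediately, without any success callback ever firing.
fn preloaded_result() -> Result<(), String> {
    let (tx, rx) = mpsc::channel::<Result<(), String>>();
    tx.send(Ok(())).unwrap(); // pre-filled "success", sent before any work runs
    rx.recv().unwrap() // resolves instantly; no callback needed
}

fn main() {
    assert_eq!(preloaded_result(), Ok(()));
    println!("resolved without a success callback");
}
```

The error closure is still attached, so a failing request can still push an `Err` into the same channel; only the happy path skips the callback.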
Numbers
Chrome:
bench_bulk: 1153.651123046875 ms
bench_simple: 2531.212158203125 ms
Firefox:
bench_bulk: 443.04ms
bench_simple: 1957.06ms
The implementation outlined above is obviously hacky and dirty, and I'm not sure what the best API for it would be. It would be something I'd quite like to see, however.
Hi. Another option is to not `await` the future returned by `add` in a loop. Instead, store the futures in a `Vec` and use something like `join_all` to await all the futures at once. Will that help?
It's a good idea I hadn't considered. Using `join_all` from the `futures` crate like this:
```rust
futures::future::join_all(
    (0..ITERATIONS)
        .map(|_| nums.add(&obj, None))
        .collect::<Vec<_>>(),
)
.await
.into_iter()
.collect::<Result<Vec<_>>>()
.unwrap();
```
I get these timings on Chrome:
bench_bulk: 657.696044921875 ms
bench_join_all: 992.5927734375 ms
bench_simple: 1650.171142578125 ms
And these on Firefox:
bench_bulk: 275.3ms
bench_join_all: 656.92ms
bench_simple: 991.52ms
Timings are better than the previous numbers because I realised I wasn't running in release mode.
So, using `join_all` is much better than awaiting each element separately, but for this test it is still significantly slower than skipping those awaits altogether.
The way I understand this is that you're not waiting for the value to be inserted in the database before yielding. You can get the same effect in the current code by not `.await`ing the future returned by `add`. It will return immediately without waiting for success. Intuitively, returning immediately can be faster than returning `null` from an async call.
You'll get a compiler warning for not awaiting the future, but you can ignore it, because the browser runtime automatically schedules the task (a Rust runtime needs a future to be polled in order to move forward, as opposed to browser runtimes).
Hmm, I've tried to do it like this, and it finishes pretty much instantly and leaves the database empty, so it doesn't seem to work:
```rust
(0..ITERATIONS).for_each(|_| {
    nums.add(&obj, None);
});
```
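One plausible explanation for the empty database is that Rust futures are lazy: an async value does nothing until it is polled, so a future that is created and then immediately dropped never drives its work to completion. A plain-std demonstration of that laziness, polling a future by hand (the `noop_waker` and `lazy_future_demo` helpers are mine for illustration, not rexie API):

```rust
use std::cell::Cell;
use std::future::Future;
use std::rc::Rc;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A do-nothing waker, just enough to poll a future manually.
fn noop_waker() -> Waker {
    fn raw() -> RawWaker {
        fn clone(_: *const ()) -> RawWaker {
            raw()
        }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    unsafe { Waker::from_raw(raw()) }
}

// Returns (ran_before_poll, ran_after_poll) for a side-effecting async block.
fn lazy_future_demo() -> (bool, bool) {
    let ran = Rc::new(Cell::new(false));
    let flag = ran.clone();
    let mut fut = Box::pin(async move {
        flag.set(true);
    });

    // Merely creating the future runs nothing.
    let before = ran.get();

    // Polling is what actually executes the body.
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    assert_eq!(fut.as_mut().poll(&mut cx), Poll::Ready(()));

    let after = ran.get();
    (before, after)
}

fn main() {
    assert_eq!(lazy_future_demo(), (false, true));
    println!("the async block ran only once it was polled");
}
```

In a browser, `.await`ing hands the future to the wasm executor, which polls it; an unawaited future is simply dropped, which would be consistent with the empty database seen above.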
I would also like to see this improvement implemented, preferably with a function that takes an iterator. Something like this:
```rust
pub async fn add_iter<I>(iter: I) -> Result<JsValue>
where
    I: IntoIterator<Item = (JsValue, Option<JsValue>)>,
{
    // Loop over the elements in the iterator
}
```
Then it could iterate over the provided data and attach an `on_success` callback only to the operation on the last element produced by the iterator. It would be almost like calling `add_bulk` on all the data except the last element, which would use `add` so that its single `on_success` callback triggers resolution of the future.
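That resolution protocol can be sketched with a plain channel: every request gets an error path, but only the last one gets a success path. This relies on IndexedDB completing requests within a transaction in order, so success of the last request implies all earlier ones finished. A sketch under that assumption (`run_batch` and its boolean "outcomes" are hypothetical stand-ins, not rexie API):

```rust
use std::sync::mpsc;

// Simulate a batch of in-order requests. Any failure resolves the batch
// with an error; otherwise only the final request reports success.
fn run_batch(outcomes: &[bool]) -> Result<(), String> {
    let (tx, rx) = mpsc::channel::<Result<(), String>>();
    let last = outcomes.len() - 1;
    for (i, &ok) in outcomes.iter().enumerate() {
        if !ok {
            // Error callback attached to every request.
            tx.send(Err(format!("request {i} failed"))).unwrap();
            break;
        }
        if i == last {
            // Success callback attached only to the last request.
            tx.send(Ok(())).unwrap();
        }
    }
    // The "future" resolves on the first message received.
    rx.recv().unwrap()
}

fn main() {
    assert_eq!(run_batch(&[true, true, true]), Ok(()));
    assert!(run_batch(&[true, false, true]).is_err());
    println!("batch resolves on first error or final success");
}
```

Compared with the pre-loaded-channel hack, this variant still waits for real completion of the batch, just with one success subscription instead of N.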
You can't really undo the effect of setting a bunch of `on_success` callbacks, so the best approach is to not set them in the first place.
@FlippingBinary and @ShaddyDC: I'll look into it over the weekend. In any case, I think the API needs some restructuring for more ease of use.
@devashishdxt I would love to be a part of that discussion, if you're open to it. This crate has quickly become important to a research project I'm working on. I don't want to write a PR without knowing in advance you'd be receptive to the approach, so I'll wait to hear more about your thoughts on restructuring.
@FlippingBinary I've experimented with a proc-macro-based API in `deli`. I'm still not entirely convinced by the proc-macro approach, as I think it is too opinionated. It'd be great if you could take a look and provide feedback. For experimentation, you are most welcome to create a PR on either `rexie` or `deli`. Thanks for your interest!
Unfortunately, `deli` would have the same performance issue on bulk inserts because it uses `idb`, which sets the `on_success` callback on every request. It looks like you own all the IndexedDB crates, lol. I'll have to put some thought into which one I'd rather contribute to. I know it wouldn't be `deli`, because I have to prioritize the performance boost described above, but I do like the usage of macros for describing the models. Have you considered integrating some features of `deli` into `rexie`? I'm not sure if it's your intention, but `rexie` seems to be most appropriate for a "batteries included" type of crate.
Solving the bulk-insert part is not that difficult. You can collect all the futures in a `Vec` and join them in one go. I think that'll be much faster. I can try adding a function that does this internally.
That's already what I do in my project; I hope I can get better performance out of it! According to the StackOverflow answer, the JS runtime causes the performance problem when the `on_success` callback is set on a large number of add/put operations. Both `rexie` and `idb` set it on every request; that's how the future is resolved. Instead of setting it on every request, it should be set only on the last request in a series. Then it should return an error if any of the error handlers is triggered, or success when the single success handler fires.
You can see what I'm describing in Dexie's `mutate` function: https://github.com/dexie/Dexie.js/blob/cb1793a40f9a7cf02010f0cd134831e5523e007f/src/dbcore/dbcore-indexeddb.ts#L115
I've made some changes in the latest version of the `idb` crate which should enable faster bulk insertion. I'll port those changes to `rexie` soon too.
@devashishdxt thank you for your work on these crates. Will Deli get the bulk insertion performance improvements applied from idb?
Not yet; it will after I upgrade the dependencies and release a new version. I'll release a new version of `deli` soon.
I've created a PR for this: #46. Unfortunately, this PR only supports bulk insertion and updates on a single object store. If you want to perform bulk operations across object stores, I'd suggest you use the `idb` crate.