
Faster bulk insertion

Open ShaddyDC opened this issue 2 years ago • 14 comments

I'm using rexie in a project to insert a large number of entries into a database, on the order of 300k. As this takes its sweet time, especially on mobile devices, I'm looking into ways to speed it up. I've seen that you can get a significant speedup by not subscribing to success events in add or put calls.

I have done a quick and dirty test on the potential speedup in rexie, and it may be as large as 50% or more.

Benchmark implementation outline

I'm not sure how to properly benchmark Rust in a browser, so I have simply added two tests and verified that the correct data was present afterwards.

// Add to tests/web.rs
use rexie::{ObjectStore, Rexie, TransactionMode};
use serde::Serialize;
use wasm_bindgen_test::wasm_bindgen_test;

async fn setup(test_name: &str) -> Rexie {
  Rexie::delete(test_name).await.unwrap();

  let rexie = Rexie::builder(test_name)
      .version(1)
      .add_object_store(ObjectStore::new("nums").key_path("id").auto_increment(true))
      .build()
      .await
      .unwrap();

  rexie
}

#[derive(Serialize)]
struct EmptyStruct {}

const ITERATIONS: u32 = 10_000;

#[wasm_bindgen_test]
async fn bench_simple() {
  let test_name = "bench_simple";
  let rexie = setup(test_name).await;

  let transaction = rexie
      .transaction(&["nums"], TransactionMode::ReadWrite)
      .unwrap();

  let nums = transaction.store("nums").unwrap();

  let obj = EmptyStruct {};
  let obj = serde_wasm_bindgen::to_value(&obj).unwrap();

  web_sys::console::time_with_label(test_name);

  for _ in 0..ITERATIONS {
      nums.add(&obj, None).await.unwrap();
  }

  transaction.commit().await.unwrap();

  web_sys::console::time_end_with_label(test_name);
}

#[wasm_bindgen_test]
async fn bench_bulk() {
  let test_name = "bench_bulk";
  let rexie = setup(test_name).await;

  let transaction = rexie
      .transaction(&["nums"], TransactionMode::ReadWrite)
      .unwrap();

  let nums = transaction.store("nums").unwrap();

  let obj = EmptyStruct {};
  let obj = serde_wasm_bindgen::to_value(&obj).unwrap();

  web_sys::console::time_with_label(test_name);

  for _ in 0..ITERATIONS {
      nums.add_bulk(&obj, None).await.unwrap();
  }

  transaction.commit().await.unwrap();

  web_sys::console::time_end_with_label(test_name);
}
// src/transaction/store.rs impl Store

    /// Like `add`, but resolves via `wait_request_bulk`, which does not wait
    /// for the request's success event.
    pub async fn add_bulk(&self, value: &JsValue, key: Option<&JsValue>) -> Result<JsValue> {
        let request = match key {
            Some(key) => self.idb_store.add_with_key(value, key),
            None => self.idb_store.add(value),
        }
        .map_err(Error::IndexedDbRequestError)?;

        wait_request_bulk(request, Error::IndexedDbRequestError).await
    }
// src/request.rs

pub async fn wait_request_bulk<R: DbRequest + Unpin>(
    request: R,
    map_err: fn(JsValue) -> Error,
) -> crate::Result<JsValue> {
    RequestFuture::new_bulk(request, map_err).await
}

impl<R> RequestFuture<R>
where
    R: DbRequest + Unpin,
{
    pub fn new_bulk(request: R, map_err: fn(JsValue) -> Error) -> Self {
        let (sender, receiver) = mpsc::unbounded_channel();

        // Queue a dummy success value up front so that the future resolves
        // immediately instead of waiting for the request's success event.
        sender.send(Ok(JsValue::null())).unwrap();

        let success_closure = get_success_closure(sender.clone());
        let error_closure = get_error_closure(sender);

        // Deliberately skip registering the success callback; only errors are
        // still observed. This is where the speedup comes from.
        // request.on_success(Some(get_callback(&success_closure)));
        request.on_error(Some(get_callback(&error_closure)));

        Self {
            _inner: request,
            _success_closure: success_closure,
            _error_closure: error_closure,
            receiver,
            map_err,
        }
    }
}

Numbers

Chrome:

bench_bulk: 1153.651123046875 ms
bench_simple: 2531.212158203125 ms

Firefox:

bench_bulk: 443.04ms
bench_simple: 1957.06ms

The implementation outlined above is obviously hacky and dirty, and I'm not sure what the best API for it would be. It would be something I'd quite like to see, however.

ShaddyDC · Dec 10 '22

Hi. Another option is to not await the future returned by add inside the loop. Instead, store the futures in a Vec and use something like join_all to await them all at once. Will that help?

devashishdxt · Jan 31 '23

It's a good idea I hadn't considered. Using join_all from the futures crate like this:

futures::future::join_all(
    (0..ITERATIONS)
        .map(|_| nums.add(&obj, None))
        .collect::<Vec<_>>(),
)
.await
.into_iter()
.collect::<Result<Vec<_>>>()
.unwrap();

I get these timings on Chrome:

bench_bulk: 657.696044921875 ms
bench_join_all: 992.5927734375 ms
bench_simple: 1650.171142578125 ms

And these on Firefox:

bench_bulk: 275.3ms
bench_join_all: 656.92ms
bench_simple: 991.52ms

Timings are better than the previous numbers because I realised I wasn't running in release mode.

So, using join_all is much better than awaiting each element separately, but it is, for this test, still significantly slower than skipping those awaits altogether.

ShaddyDC · Feb 01 '23

The way I understand this is that you're not waiting for the value to be inserted into the database before yielding. You can get the same effect in the current code by not .awaiting the future returned by add. It will return immediately without waiting for success. Intuitively, returning immediately can be faster than returning null from an async call.

You'll get a compiler warning for not awaiting the future, but you can ignore it because the browser runtime automatically schedules the task (a Rust runtime needs a future to be polled in order to make progress, as opposed to browser runtimes).

devashishdxt · Feb 02 '23

Hmm, I've tried to do it like this, and it finishes pretty much instantly and leaves the database empty, so it doesn't seem to work.

 (0..ITERATIONS).for_each(|_| {
     // The future returned by add is dropped here without ever being polled,
     // so the request never actually runs.
     nums.add(&obj, None);
 });

ShaddyDC · Feb 02 '23

I would also like to see this improvement implemented, preferably with a function that takes an iterator. Something like this:

pub async fn add_iter<I>(&self, iter: I) -> Result<JsValue>
where
    I: IntoIterator<Item = (JsValue, Option<JsValue>)>,
{
    // Loop over the elements in the iterator
}

Then it could iterate over the provided data and attach an on_success callback only to the operation on the last element produced by the iterator. It would be almost like calling add_bulk on all of the data except the last element, which would use add so that its single on_success callback triggers resolution of the Future.

You can't really undo the effect of setting a bunch of on_success callbacks, so the best approach is to not set them in the first place.
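
For illustration, here is a minimal, self-contained sketch of this approach written directly against web_sys rather than rexie's internals. The function name add_all_sketch, the oneshot channel, and the listed web_sys features are assumptions made for the example, not an existing API: every request gets the shared error callback, but only the last request gets a success callback.

// Sketch only, not rexie's API. Assumes the web_sys features "IdbObjectStore",
// "IdbRequest" and "Event", plus the futures and wasm-bindgen crates.
use std::{cell::RefCell, rc::Rc};

use futures::channel::oneshot;
use wasm_bindgen::{closure::Closure, JsCast, JsValue};
use web_sys::{Event, IdbObjectStore};

pub async fn add_all_sketch(store: &IdbObjectStore, values: &[JsValue]) -> Result<(), JsValue> {
    if values.is_empty() {
        return Ok(());
    }

    let (tx, rx) = oneshot::channel::<Result<(), JsValue>>();
    // Shared so that either the single success callback or the error callback
    // can resolve the future, whichever fires first.
    let tx = Rc::new(RefCell::new(Some(tx)));

    // One shared error callback, registered on every request, so that any
    // failure rejects the future.
    let tx_err = Rc::clone(&tx);
    let on_error = Closure::<dyn FnMut(Event)>::new(move |event: Event| {
        if let Some(tx) = tx_err.borrow_mut().take() {
            let _ = tx.send(Err(JsValue::from(event)));
        }
    });

    // The success callback is registered on the last request only (see loop).
    let tx_ok = Rc::clone(&tx);
    let on_success = Closure::<dyn FnMut(Event)>::new(move |_: Event| {
        if let Some(tx) = tx_ok.borrow_mut().take() {
            let _ = tx.send(Ok(()));
        }
    });

    let last = values.len() - 1;
    for (i, value) in values.iter().enumerate() {
        let request = store.add(value)?;
        request.set_onerror(Some(on_error.as_ref().unchecked_ref()));

        // Requests within one IndexedDB transaction complete in order, so when
        // the last request succeeds, all earlier ones have succeeded as well.
        if i == last {
            request.set_onsuccess(Some(on_success.as_ref().unchecked_ref()));
        }
    }

    let result = rx.await.unwrap_or(Err(JsValue::from_str("callbacks dropped")));
    // Keep the closures alive until the callbacks can no longer fire.
    drop(on_error);
    drop(on_success);
    result
}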

FlippingBinary · Oct 20 '23

@FlippingBinary and @ShaddyDC: I'll look into it over the weekend. In any case, I think the API needs some restructuring for more ease of use.

devashishdxt · Oct 20 '23

@devashishdxt I would love to be a part of that discussion, if you're open to it. This crate has quickly become important to a research project I'm working on. I don't want to write a PR without knowing in advance you'd be receptive to the approach, so I'll wait to hear more about your thoughts on restructuring.

FlippingBinary · Oct 20 '23

@FlippingBinary I've experimented with a proc-macro-based API in deli. I'm still not entirely convinced by the proc-macro approach, as I think it is too opinionated. It'd be great if you could take a look and provide feedback. For experimentation, you are most welcome to create a PR on either rexie or deli. Thanks for your interest!

devashishdxt · Oct 21 '23

Unfortunately, deli would have the same performance issue on bulk inserts because it uses idb, which sets the on_success callback on every request. It looks like you own all the IndexedDB crates, lol. I'll have to put some thought into which one I'd rather contribute to. I know it wouldn't be deli because I have to prioritize the performance boost described above, but I do like the usage of macros for describing the models. Have you considered integrating some features of deli with rexie? I'm not sure if it's your intention, but rexie seems to be most appropriate for a "batteries included" type of crate.

FlippingBinary · Oct 21 '23

Solving the bulk-insert part is not that difficult. You can collect all the futures in a Vec and join them in one go; I think that'll be much faster. I can try adding a function that does this internally.

devashishdxt · Oct 21 '23

That's already what I do in my project. I hope I can get better performance out of it! According to the StackOverflow answer, the JS runtime causes the performance problem when the on_success callback is set on a large number of add/put operations. Both rexie and idb set it on every request. It's how the Future is resolved. Instead of setting it on every request, it needs to set it on only the last request in a series. Then it should return an error if any of the error handlers are triggered, or return success if the only success handler fires.

You can see what I'm describing in Dexie's mutate function: https://github.com/dexie/Dexie.js/blob/cb1793a40f9a7cf02010f0cd134831e5523e007f/src/dbcore/dbcore-indexeddb.ts#L115

FlippingBinary · Oct 21 '23

I've made some changes in the latest version of the idb crate which should enable faster bulk insertion. I'll soon port those changes to rexie too.

devashishdxt · Jan 18 '24

@devashishdxt thank you for your work on these crates. Will Deli get the bulk insertion performance improvements applied from idb?

Innominus · Jan 19 '24

> @devashishdxt thank you for your work on these crates. Will Deli get the bulk insertion performance improvements applied from idb?

Not yet; it will once I upgrade the dependencies and release a new version. I'll release a new version of deli soon.

devashishdxt · Jan 19 '24

I've created a PR for this: #46. Unfortunately, this PR only supports bulk insertion and updates on a single object store. If you want to perform bulk operations across object stores, I'd suggest using the idb crate.

devashishdxt · Jul 04 '24