Encountering some sort of race condition when rapidly inserting
Issue
I'm running into a problem where the connection gets interrupted when I rapidly call an insert reducer.
I've created separate repos for the server and the client.
System
- MacBook Pro M4
- Rust 1.85.1
- SpacetimeDB 1.0.1
Server repository:
https://github.com/flyaruu/superhero-server
Client repository:
https://github.com/flyaruu/superhero-client
To reproduce
- Clone the server repo and build it:

```shell
spacetime build
```

- Start SpacetimeDB:

```shell
spacetime start --in-memory --enable-tracy
```

- In a new shell, publish the SpacetimeDB module:

```shell
spacetime publish --project-path . superhero-server
```

- Tail the SpacetimeDB logs:

```shell
spacetime logs superhero-server -f
```
- Clone the client repo (into the same parent folder as the server)
- Enter the client folder, then build and start the Docker Compose setup containing the three databases:

```shell
docker compose build
docker compose up
```

- Now the three databases (only one of which we will be using) should be up and running
- OK, let's run the client:

```shell
cargo run
```
The logs seem to indicate something went wrong after the first insert:
```
2025-04-09T20:37:24.962599Z INFO: src/lib.rs:21: Client connected!
2025-04-09T20:37:24.974388Z DEBUG: src/lib.rs:37: About to insert.....!
2025-04-09T20:37:24.974459Z INFO: src/lib.rs:38: About to insert Chewbacca - 1 - 0
2025-04-09T20:37:24.974770Z INFO: src/lib.rs:40: Hero added: Chewbacca id: 1 space_id: 1
2025-04-09T20:37:24.975253Z INFO: src/lib.rs:27: Disconnected client!!
```
The client seems oblivious, and just keeps going:
```
Pool created
Hero: Chewbacca inserted. Id: 1
Hero: Sailor Saturn inserted. Id: 51
Hero: Biscuit inserted. Id: 101
...
```
Adding a 1 ms sleep statement in the loop resolves the issue. If I add a very small sleep (a few microseconds), I get only a few rows, maybe 5. So I'm pretty sure there is some kind of race condition.
Hm, I tried some other things:
- Using the async run seems to make no difference.
- Running in release mode yields a few more inserts; I've seen between 3 and 8 inserts before the same thing happens.
It seems that you have unintentionally constructed something like a DoS attack here. You're pushing requests as fast as possible, without waiting for responses. The server is sending responses, but your client isn't processing them in time, so its TCP queue is filling up. All the while, it keeps sending requests. SpacetimeDB eventually decides that your client is a bad actor and disconnects it. This disconnect is abrupt and forceful, and does not include sending a closing frame, so the client SDK doesn't notice immediately.
Your 1ms sleep ("rate limiting") is giving your client time to process the responses, so that it no longer allows a huge TCP queue to pile up, so the host doesn't get mad and doesn't forcibly disconnect it. Among the wide variety of more useful things you could do are:
- Send a fixed and small number of requests to start with, e.g. 1000, and then delay each subsequent request until you get a response for a previous request. This might look like:
```rust
ctx.reducers.on_add_hero(|ctx, _hero| {
    ctx.db.add_hero(get_next_hero()).unwrap();
});
for _ in 0..1000 {
    ctx.db.add_hero(get_next_hero()).unwrap();
}
```

- Batch up your inserts, so that you make fewer, larger reducer calls.
- Not try to run artificially large workloads against Maincloud.
OK, I figured it out; it wasn't quite that.
The dataset is pretty small to begin with (100 rows), so hardly a DoS, but the process just quits too quickly, closing the connection.
If I add a decent wait at the end (before the main function exits), it works.
Is there a way to do a graceful shutdown and wait for all the queued messages to flush?
Or should we consider a custom drop implementation for the DbConnection? It would be nice to know for sure that all messages have been flushed.
I think this isn't a big deal any more, so I'll close it to make space for more important things.