
Encountering some sort of race condition when rapidly inserting

flyaruu opened this issue 8 months ago · 4 comments

Issue

I'm running into a problem where the connection gets interrupted when I call an insert reducer.

I've created separate repos for the server and the client.

System

  • MacBook Pro M4
  • Rust 1.85.1
  • SpacetimeDB 1.0.1

Server repository:

https://github.com/flyaruu/superhero-server

Client repository:

https://github.com/flyaruu/superhero-client

To reproduce

  • Clone the server repo.
  • Build the server repo:

spacetime build

  • Start SpacetimeDB:

spacetime start --in-memory --enable-tracy

  • In a new shell, publish the SpacetimeDB module:

spacetime publish --project-path . superhero-server

  • Tail the SpacetimeDB logs:

spacetime logs superhero-server -f

  • Clone the client repo (in the same parent folder as the server).
  • Enter the client folder, then build and start the Docker Compose setup containing the three databases:

docker compose build
docker compose up

  • Now the three databases (only one of which we will be using) should be up and running.
  • Ok, let's run the client:

cargo run

The logs seem to indicate something went wrong after the first insert:

2025-04-09T20:37:24.962599Z  INFO: src/lib.rs:21: Client connected!
2025-04-09T20:37:24.974388Z DEBUG: src/lib.rs:37: About to insert.....!
2025-04-09T20:37:24.974459Z  INFO: src/lib.rs:38: About to insert Chewbacca - 1 - 0
2025-04-09T20:37:24.974770Z  INFO: src/lib.rs:40: Hero added: Chewbacca id: 1 space_id: 1
2025-04-09T20:37:24.975253Z  INFO: src/lib.rs:27: Disconnected client!!

The client seems oblivious and just keeps going:

Pool created
Hero: Chewbacca inserted. Id: 1
Hero: Sailor Saturn inserted. Id: 51
Hero: Biscuit inserted. Id: 101
...
...etc

Adding a 1 ms sleep statement in the loop resolves the issue. If I add a very small sleep (a few microseconds), I get a few rows, maybe 5. So I'm pretty sure there is some kind of race condition.
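For reference, the workaround is just a pause inside the insert loop. A minimal sketch, assuming a loop along the lines of the client's (`heroes` and the `add_hero` reducer are stand-ins for the real names in superhero-client):

    use std::{thread, time::Duration};

    // Hypothetical insert loop; `heroes` and `add_hero` are stand-ins.
    for hero in heroes {
        ctx.reducers.add_hero(hero).unwrap();
        // A 1 ms pause per iteration is enough to avoid the disconnect.
        thread::sleep(Duration::from_millis(1));
    }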

flyaruu avatar Apr 09 '25 21:04 flyaruu

Hm, tried some other things.

  • Using the async run seems to make no difference.
  • Running in release mode yields a few more inserts; I've seen between 3 and 8 before the same thing happens.

flyaruu avatar Apr 10 '25 11:04 flyaruu

It seems that you have unintentionally constructed something like a DoS attack here. You're pushing requests as fast as possible, without waiting for responses. The server is sending responses, but your client isn't processing them in time, so its TCP queue is filling up. All that time, it's continuing to send requests. SpacetimeDB eventually notices that your client is a bad actor and disconnects it. This disconnect is abrupt, forceful, and does not include sending a closing frame, so the client SDK doesn't notice immediately.

Your 1 ms sleep ("rate limiting") gives your client time to process the responses, so it no longer lets a huge TCP queue pile up, and the host doesn't get mad and forcibly disconnect it. Among the more useful things you could do:

  • Send a fixed and small number of requests to start with, e.g. 1000, and then delay each subsequent request until you get a response for a previous request. This might look like:
    // Each time a previous call gets its response, send the next request,
    // so at most ~1000 calls are ever in flight.
    ctx.reducers.on_add_hero(|ctx, _hero| {
        ctx.reducers.add_hero(get_next_hero()).unwrap();
    });
    // Prime the pipeline with a fixed, small number of requests.
    for _ in 0..1000 {
        ctx.reducers.add_hero(get_next_hero()).unwrap();
    }
    
  • Batch up your inserts, so that you make fewer, larger reducer calls (see the sketch after this list).
  • Avoid running artificially large workloads against Maincloud.
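To illustrate the batching option: a sketch of what a server-side batched reducer might look like, assuming a module shaped roughly like superhero-server (the table, fields, and reducer name here are illustrative, not the actual module's):

    use spacetimedb::{reducer, table, ReducerContext, Table};

    #[table(name = hero)]
    pub struct Hero {
        #[primary_key]
        #[auto_inc]
        id: u64,
        name: String,
    }

    // Illustrative batched reducer: one reducer call inserts many rows,
    // so the client makes far fewer requests for the same workload.
    #[reducer]
    fn add_heroes(ctx: &ReducerContext, names: Vec<String>) {
        for name in names {
            // id 0 is replaced by the auto_inc sequence on insert.
            ctx.db.hero().insert(Hero { id: 0, name });
        }
    }

On the client side you would then chunk the dataset, e.g. one add_heroes call per 100 names, instead of one add_hero call per row.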

gefjon avatar Apr 10 '25 17:04 gefjon

Ok, figured it out, it wasn't quite that.

The dataset is pretty small to begin with (100 rows), so it's hardly a DoS, but the process just quits too quickly, closing the connection.

If I do a decent wait at the end (before the main function exits), it works.

Is there a way to do a graceful shutdown and wait for all the queued messages to flush?

Or should we consider a custom Drop implementation for the DbConnection? It would be nice to know for sure that all messages have been flushed.
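For concreteness, the pattern I'm after would be something like this, assuming the threaded runtime (whether disconnect actually flushes queued outgoing messages is exactly what I'm unsure about):

    // Run the connection on a background thread, then disconnect and join
    // so main doesn't exit while the message loop is still working.
    let handle = conn.run_threaded();  // std::thread::JoinHandle<()>
    do_all_the_inserts(&conn);         // hypothetical: issue the reducer calls
    conn.disconnect().unwrap();        // request a clean close of the WebSocket
    handle.join().unwrap();            // wait for the message loop to finish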

flyaruu avatar Apr 11 '25 12:04 flyaruu

I think this isn't a big deal any more. I'll close it to make some space for more important things.

flyaruu avatar Apr 14 '25 07:04 flyaruu