Incorrect ordering of geofence events - Tile38 for historical analyses.

Open maxschub opened this issue 1 year ago • 11 comments

Hi,

I have a use case where I want to use my tile38 pipeline to also geofence historical data. In particular, there are about 100 geofences and 1 object that is moving in space with >100k locations a few seconds apart. When I send the locations, sorted by time, into tile38, the events are sent to a Kafka topic and polled by a consumer. When analysing the events in Kafka, they are no longer ordered in time, so the sequence of entering and exiting a geofence is not logical.

Reproduction

  • Set up 100 geofences (non-overlapping) with DETECT inside.
  • Send 10k+ locations for the same "agent", e.g. await tile38.set('agents', agent_id).fields({"ts":ts }).point(lat, lon).exec()
  • SUBSCRIBE to the geofences.
  • Compare the "ts" of each event against the previous event: they are not monotonically increasing.
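
The last reproduction step could be checked on the consumer side with something like the sketch below. It assumes each event has already been parsed from JSON and carries the custom "ts" field under `fields`, plus the name of the hook that produced it under `hook`; the function name and the sample data are hypothetical.

```javascript
// Report every event whose `ts` is smaller than the `ts` of the event
// received just before it. Equal timestamps are not treated as violations.
// Assumed event shape (hypothetical): { hook, detect, fields: { ts } }.
function findOutOfOrder(events) {
  const violations = [];
  for (let i = 1; i < events.length; i++) {
    const prev = events[i - 1].fields.ts;
    const curr = events[i].fields.ts;
    if (curr < prev) {
      violations.push({ index: i, prev, curr, hook: events[i].hook });
    }
  }
  return violations;
}

// Made-up sample: the third event arrives after one with a later `ts`.
const sample = [
  { hook: "fence1", detect: "inside", fields: { ts: 100 } },
  { hook: "fence2", detect: "inside", fields: { ts: 130 } },
  { hook: "fence1", detect: "outside", fields: { ts: 120 } },
];
console.log(findOutOfOrder(sample));
```

Recording the hook name with each violation makes it easier to tell whether the disorder is between different geofences or within a single one.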

When I add a time.sleep(0.001) between each SET, tile38 seems to complete the geofencing operations in time before the next SET, due to its concurrent nature. However, for the same agent I would expect the events to happen in the same order as the SET operations.

Operating System

  • OS: Mac OS
  • CPU: Apple M1
  • Version: Ventura
  • Container: Docker

maxschub avatar Nov 02 '22 12:11 maxschub

Hey,

tile38 sends the events when the underlying spatial query resolves, meaning that if you have a hook whose query takes 50ms and the next event in line resolves in 10ms, then the faster one sends its event before the other, if I read the source correctly. @tidwall might correct me here.

How do you send the SETs? Maybe we can replicate the issue. And how realistic is a scenario where a single agent sends events at almost the same time?

iwpnd avatar Nov 02 '22 13:11 iwpnd

Hi,

Here's a high level view of how Tile38 sends events.

Each SET command runs atomically.

For example, for the SET command:

> SET fleet truck POINT 33 -112

The server will effectively do:

server-lock          # lock server for one write command at a time
update-collection    # update the point in the fleet collection
evaluate-geofences   # see if any geofences are affected
queue-events         # write geofence events, if any, to the queue.db
signal-webhooks      # send webhook signals that new events are waiting
server-unlock        # unlock the server for the next write command

Each geofence webhook, on the other hand, runs asynchronously.

> SETHOOK geofence1 kafka://topic1 INTERSECTS fleet BOUNDS ...
server-lock
update-geofence-webhook          # insert/update geofence1
start-geofence-webhook-manager   # runs in a background thread
server-unlock

Each webhook has its own background manager that is responsible for processing its own events.

All the manager does is wait for a signal that there are new events, and when there are, it sends them, in order, over its own network connection to its assigned endpoint.

for-loop 
  wait-for-signal   # wait for signal that there are pending events
  send-events       # send queued events over network 

In short:

Server-wide, all events are queued in order with monotonically increasing timestamps.
Per each geofence webhook, all events are sent over its own network connection, in order.
There's no guarantee of order between different webhooks, even if they share the same endpoint, because each webhook sends events independently in the background.
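
These three points can be illustrated with a toy model. The sketch below is not Tile38's implementation: it only mimics a server that queues events in order under a lock, and per-hook managers that each drain their own queue independently. All names (`simulate`, `triggers`, `drainOrder`) are hypothetical.

```javascript
// Toy model: events get a server-wide sequence number when queued, and each
// hook's manager later sends its own events in order, independently of the
// other hooks.
function simulate(sets, hooks, drainOrder) {
  let seq = 0;
  const queues = new Map(hooks.map((h) => [h, []]));
  // "Under the server lock": every SET appends its events in arrival order.
  for (const set of sets) {
    for (const hook of set.triggers) {
      queues.get(hook).push({ hook, seq: ++seq, ts: set.ts });
    }
  }
  // Managers run independently; `drainOrder` stands in for whichever manager
  // happens to reach the shared endpoint first.
  const delivered = [];
  for (const hook of drainOrder) {
    delivered.push(...queues.get(hook).splice(0));
  }
  return delivered;
}

// True when the `seq` values in `events` never decrease.
function isOrdered(events) {
  return events.every((e, i) => i === 0 || events[i - 1].seq <= e.seq);
}

const delivered = simulate(
  [
    { ts: 1, triggers: ["fence1"] },
    { ts: 2, triggers: ["fence2"] },
    { ts: 3, triggers: ["fence1"] },
    { ts: 4, triggers: ["fence2"] },
  ],
  ["fence1", "fence2"],
  ["fence2", "fence1"] // fence2's manager happens to send first
);

console.log("merged stream ordered:", isOrdered(delivered));
for (const hook of ["fence1", "fence2"]) {
  const perHook = delivered.filter((e) => e.hook === hook);
  console.log(hook, "ordered:", isOrdered(perHook));
}
```

In this model the merged stream at the shared endpoint can be out of order even though each hook's own events are not, so a consumer that needs a global order has to re-sort on a timestamp or check ordering per hook.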

tidwall avatar Nov 02 '22 15:11 tidwall

When analysing the events in Kafka, they are not ordered in time anymore, thus entering and exiting a geofence are not logical.

The enter/exit ordering for a specific geofence should always be in order.

If this is not happening then I would like to look into it further, and reproduce on my side.

tidwall avatar Nov 02 '22 15:11 tidwall

@tidwall , have you been able to find a clue?

I am noticing a similar problem even with only 1 or 2 fences.

12:16:18 - send 3 SETs
13:12:55 - send 16 SETs
14:15:40 - receive event from one of the SETs in the above 3 SETs
14:17:03 - receive event from one of the SETs in the 16 SETs

The hooks are set up to detect inside and outside. I am sending the SET commands plus a GC at the end each time with Promise.all, as described under auto-pipelining in https://github.com/redis/node-redis

The same set of SET commands is sent to 2 instances of tile38, 1 with 1 fence and another with 2 fences. The delays are larger in the instance with 2 fences.

Both instances of tile38 are running on Azure Container Apps with 2 CPUs and 4 GB of memory.

eduardotang avatar Nov 15 '22 07:11 eduardotang

Please correct me if I am wrong, but I am pretty sure Promise.all does not guarantee that commands resolve in order. It only attempts to send both at the same time, but due to transit, both can resolve at different points in time.

await Promise.all([
   tile38.set('agents', agent_id).fields({"ts":ts }).point(lat, lon).exec(),
   tile38.set('agents', agent_id).fields({"ts":ts + 1 }).point(lat + 1, lon + 1).exec(),
   tile38.gc()
])

The GC might resolve earlier than the SET and the second SET might resolve earlier than the first.
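
If the order should not depend on how the client schedules concurrent calls, one option is to await each command before issuing the next. The sketch below assumes a client object with an async `sendCommand(args)` method, as used elsewhere in this thread; the fake client is hypothetical and deliberately finishes concurrent calls in reverse order to exaggerate the effect. It does not model how node-redis actually behaves.

```javascript
// Send commands strictly one after another: the next command is only issued
// once the previous reply has arrived.
async function sendInOrder(client, commands) {
  const replies = [];
  for (const cmd of commands) {
    replies.push(await client.sendCommand(cmd));
  }
  return replies;
}

// Hypothetical stand-in for a real client. Each call takes less time than
// the one before it, so calls started together complete in reverse order.
function makeFakeClient(log) {
  let calls = 0;
  return {
    sendCommand(args) {
      const delay = Math.max(0, 30 - 10 * calls++);
      return new Promise((resolve) =>
        setTimeout(() => {
          log.push(args[2]); // record the object id on completion
          resolve("OK");
        }, delay)
      );
    },
  };
}

async function demo() {
  const cmds = ["a", "b", "c"].map((id) => ["SET", "agents", id, "POINT", 33, -112]);

  const concurrentLog = [];
  const concurrentClient = makeFakeClient(concurrentLog);
  await Promise.all(cmds.map((c) => concurrentClient.sendCommand(c)));

  const sequentialLog = [];
  await sendInOrder(makeFakeClient(sequentialLog), cmds);

  return { concurrentLog, sequentialLog };
}

demo().then((result) => console.log(result));
```

The trade-off is one full round trip per command, which is much slower for large batches than sending them together.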

the same set of SET commands are sent to 2 instances of tile38, 1 with 1 fence, another with 2 fences the delay are larger in the instance with 2 fences

What do you mean here? Can you please provide a minimum viable reproducible code example?

iwpnd avatar Nov 16 '22 06:11 iwpnd

Your code snippet is what I did, but I actually haven't noticed an ordering problem yet within 1 batch (1 call to Promise.all).

The problem was among different batches, i.e. I got the inside/outside event from some SETs in a batch with an earlier timestamp later than the event from another batch with a later timestamp (see below: the event for a SET in batch A was triggered after batch C). Apart from the ordering, the delays also differ: batch C - 38 mins, batch A - 2h42mins, batch B - 1h37mins.

12:13:19 - await Promise.all (multiple SETs) A
13:19:17 - await Promise.all (multiple SETs) B
14:01:25 - await Promise.all (multiple SETs) C

14:39:38 - receive event from above (1 SET from the batch at 14:01:25) C
14:55:06 - receive event from above (1 SET from the batch at 12:13:19) A
14:56:43 - receive event from above (1 SET from the batch at 13:19:17) B

As for the 2 instances, I was just trying to reproduce the problem with different numbers of fences, and that is where I noticed that the delay is different.

eduardotang avatar Nov 16 '22 09:11 eduardotang

below is the code snippet

async function cmd_to_tile38(client, cmds) {
    await Promise.all ( cmds.map((c) => client.sendCommand(c)) )
}

async function process_points(.....) {
    const commands = []
    for ( const p of pts) {
      commands.push ( [
           'SET', 'key', 'id' , 'POINT', p.lat, p.lon,
      ])
    }
    await Promise.all( tile38clients.map( (t) => cmd_to_tile38( t, commands) ))
}

eduardotang avatar Nov 16 '22 09:11 eduardotang

Please correct me if I am wrong, but I am pretty sure Promise.all does not guarantee that commands resolve in order. It only attempts to send both at the same time, but due to transit, both can resolve at different points in time.

await Promise.all([
   tile38.set('agents', agent_id).fields({"ts":ts }).point(lat, lon).exec(),
   tile38.set('agents', agent_id).fields({"ts":ts + 1 }).point(lat + 1, lon + 1).exec(),
   tile38.gc()
])

The GC might resolve earlier than the SET and the second SET might resolve earlier than the first.

the same set of SET commands are sent to 2 instances of tile38, 1 with 1 fence, another with 2 fences the delay are larger in the instance with 2 fences

What do you mean here? Can you please provide a minimum viable reproducible code example?

And actually, when there are just a few IDs (meaning fewer lat/lon points), it did work properly, so I guess the auto-pipelining should have no problem.

eduardotang avatar Nov 16 '22 10:11 eduardotang

I'm sorry I lost you. I cannot seem to properly understand what you do and what you expect to happen.

If you send 10 SET in a Promise.all you cannot expect Tile38 to send 10 inside events in the order of the timestamps in your SET. It will return 10 inside events in the order it receives the SET commands.

You are stacking promises here to the max, and I'm not surprised that this causes issues with bigger batches. If you're doing async maps, consider using bluebird.

I sense that this is more your application's problem than Tile38's.

iwpnd avatar Nov 16 '22 11:11 iwpnd

I just realized that auto-pipelining from https://github.com/redis/node-redis is not the same as Redis pipelining anyway. I switched to https://github.com/luin/ioredis, which supports Redis pipelining, and the problem seems resolved. thx
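
A minimal sketch of what the pipelined version might look like, assuming `redis` is an ioredis client connected to Tile38 (for example `new Redis(9851, "localhost")`) and that its pipeline object accepts arbitrary commands through `call`. The function name and parameters are hypothetical, and this is not the exact code I ended up with.

```javascript
// Queue one Tile38 SET per point on a single pipeline, in array order, and
// flush them together. The client is passed in so that the function does not
// depend on how the connection was created.
function setPointsPipelined(redis, key, id, points) {
  const pipeline = redis.pipeline();
  for (const p of points) {
    // `call` is used because Tile38's SET takes different arguments than
    // Redis's own SET command.
    pipeline.call("SET", key, id, "POINT", p.lat, p.lon);
  }
  // Resolves once the replies for all queued commands have arrived.
  return pipeline.exec();
}

// Example usage (requires the ioredis package and a running Tile38 server):
//   const Redis = require("ioredis");
//   const redis = new Redis(9851, "localhost");
//   await setPointsPipelined(redis, "fleet", "truck", [{ lat: 33, lon: -112 }]);
```

The commands are queued in the order of the `points` array, so the points should be sorted by time before calling it.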

eduardotang avatar Nov 19 '22 01:11 eduardotang

Glad you found a solution @eduardotang 🙏

iwpnd avatar Nov 21 '22 15:11 iwpnd