EDMarketConnector icon indicating copy to clipboard operation
EDMarketConnector copied to clipboard

Change EDDN 'replay' to using an sqlite3 DB & otherwise improve

Open Athanasius opened this issue 2 years ago • 2 comments

  1. The current replay.json has to be entirely rewritten if any message is added or removed. That's just crap. Either it churns to disk or risks losing data until a flush().
  2. The current code explicitly never queues station data messages. The premise was "this is ephemeral data and another Cmdr will be along soon enough", but that's misguided.
  3. 'Replay' was only ever triggered by being docked. This is fine for a strict replay scenario, but not where initial sending of a message failed and it should be retried. Cmdrs out in the black, without 'delay sending until docked' would have long delays in anything else being sent.

So, a partial rewrite. At this time most of the functionality is working:

  1. There's a new class EDDNSender to hold the queue and actual transmission code.
  2. EDDNSender.add_message() will add the given message to the queue (sqlite3 DB).
  3. If configuration says so then EDDNSender.send_message_by_id(...) will immediately be called to attempt sending.
  4. If send is successful the message is removed from the queue.
  5. All the code that constructs messages is on a path that checks if the message should be sent, skipping all processing if not. There's an extra paranoia check in EDDN.send_message() which everything now goes through.
  6. The one thing left to write is a queue worker to attempt to flush the queue by sending each message.

Athanasius avatar Oct 05 '22 16:10 Athanasius

I do want to write at least some tests, given EDDNSender should be amenable to that.

Athanasius avatar Oct 05 '22 16:10 Athanasius

It doesn't need to be a thread worker, it can be a tk self.parent.after(...), as with the legacy code. This neatly side-steps issues with multi-threaded sqlite.

For the thread worker:

  1. As it's a thread it won't be able to directly update the EDMC status line safely. We can either decide that's not necessary (it causes "Sending data to EDDN..." with the "..." getting replaced by a countdown) or poke things around using tk events.
  2. It needs to be properly triggered when "now docked" is detected. On-foot might complicate this a little?
  3. It needs to also be triggered periodically but only for "failure last time" messages, not the delayed ones. Either it will need to know if we're still docked (again, Odyssey on-foot/taxi/in-person multicrew complicates this), or each message needs to be tagged as "delayed due to not docked" so that only the others will be attempted in the timer.

Athanasius avatar Oct 05 '22 16:10 Athanasius

OK, done with this for today. Next time:

  • [x] Check that all CAPI (commodity, shipyard, outftting, fcmaterials_capi) messages get correct header fields set, particulatly gameversion and gamebuild.
  • [x] Check that all Journal messages get correct header fields set.
  • [x] Check that Sqlite queued messages are 100% correct.
  • [x] Check that any actual legacy messages are still converted to have the correct header.
  • [x] Check that messages are correct when sent either immediately or via a retry/now docked.

Athanasius avatar Nov 22 '22 17:11 Athanasius

OK, I think this might finally be all set for review and wider testing.

At times I feel like I've split things into their own function a little too much, particularly the EDDN.send_message() not just being a direct all to EDDNSender.send_message(), but overall I think that separation makes sense.

Attempts to use a threaded worker just ran into issues with python sqlite3 not allowing cross-thread use of a database connection/cursor, and then when you use separate ones in the thread it can all too easily hit "database is locked". That can be either the worker hitting it, or the 'immediate send' code hitting it due to the worker currently running. So, I went back to the tkinter after() way of scheduling things.

Getting that scheduling to work in all cases then involved a little complication:

  1. reschedule boolean to the called function to indicate if it should re-schedule when done. This is so it can also be triggered when we detect we just docked, without ending up with multiple parallel calls/schedulings.
  2. But it also uses the after() scheduling after processing a single row, because a) we don't want to hammer the EDDN Gateway, so need a delay, and b) We can't just time.sleep() because this is now main thread and would lock up the UI if we did that. That then means that we need to not do the 'slow' rescheduling if we already did the 'fast' rescheduling in order to process the next row after only a small delay. But we do then want to do the 'slow' rescheduling if there wasn't actually another row to process.

There are also some details around getting the header fields set correctly. gameversion and gamebuild are by default what we saw in Fileheader, but not for CAPI-sourced data. They instead get a CAPI-<endpoint> string to make it clear. There's EDDN.standard_header() which returns the standard one, with the Fileheader strings set, unless you pass custom ones into it. For the CAPI cases this means those strings. In the catchall in EDDN.send_message() we then only use standard_header() if the whole header hasn't been set yet in the message. This guarantees we WILL set a proper header, but allows for special cases that need to do it differently (such as CAPI messages).

Currently the 'slow' scheduling is set to 5 minutes. This seems reasonable. There was no equivalent in the old code, as that had a scheme where any new message could trigger an attempt to send the replaylog.

The 'fast' rescheduling is set to 400ms, which matches the delay the old scheme used so as to not hammer the EDDN Gateway.

Athanasius avatar Nov 23 '22 14:11 Athanasius

I've just tested this code, with and without "Delay sending until docked" active against the live EDDN Gateway, and it all worked fine. A variety of schemas, CAPI sourced and Journal sourced. Queued messages sent on docking. Any message not delayed due to "Delay sending..." was sent immediately (outside of the timed queue run).

(Well, when I didn't run into the Relay "drop duplicates" functionality, having to go to a "not updated for 9 hours" station to be sure of things.)

Athanasius avatar Nov 24 '22 17:11 Athanasius