
No workflows found upon server restart when using SQLite persistence (--db-filename)

Open • aidanSoles opened this issue 5 months ago • 1 comment

NOTE: Below is an issue I've been hitting while using the Temporal dev server (with SQLite persistence) for local development. This could be PEBKAC, but I wanted to open this issue in case it isn't.

Background

I'm using the Temporal dev server with SQLite persistence (temporal server start-dev --db-filename ...) to test a feature for a tool that's backed by Temporal. For my local environment, the stack consists of:

  • A FastAPI process that fronts the Temporal server via the Python SDK.
  • The Temporal dev server with SQLite persistence (temporal server start-dev --db-filename ...).
  • A worker that, once again, uses the Python SDK.

I wanted to create a SQLite DB to use for a test, so I did the following:

  1. Wrote a test script that made 512 requests with real test data to the FastAPI server, with a short sleep between requests (resulting in 512 workflow runs).
  2. Started FastAPI, the Temporal dev server (with SQLite persistence), and the worker process.
  3. Ran the test script from step 1.
  4. Let Temporal/the worker process finish running all the workflows.
  5. Stopped the FastAPI server, the Temporal dev server, and the worker.
  6. Committed the resulting .db file from the test run (via Git LFS).
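The load script from step 1 isn't included in the issue. As a rough sketch, assuming the FastAPI app accepts JSON over a POST route, a standard-library-only version could look like this. The URL, route, and payload shape are hypothetical placeholders, not the real app's API.

```python
import json
import time
import urllib.request
from typing import Callable

# Hypothetical endpoint: the issue does not name the FastAPI route.
ENDPOINT = "http://localhost:8000/run"


def post_json(payload: dict) -> int:
    """POST one JSON payload to the FastAPI server and return the HTTP status."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def run_load(send: Callable[[dict], int], payloads: list[dict], delay_s: float = 0.1) -> list[int]:
    """Send each payload in order with a short sleep between requests; return the statuses."""
    statuses = []
    for i, payload in enumerate(payloads):
        statuses.append(send(payload))
        if i < len(payloads) - 1:  # no sleep after the last request
            time.sleep(delay_s)
    return statuses
```

For example, run_load(post_json, payloads) with a list of 512 payload dicts built from the test data. Passing the sender in as a function keeps the pacing logic testable without a running server.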

Expected Behavior

I expected that when I started the server up again with the temporal server start-dev --db-filename ... command, my workflow history would still be there.

Actual Behavior

When I started the server up again with the temporal server start-dev --db-filename ... command, my workflow history was completely gone.

Steps to Reproduce the Problem

  1. Open the SQLite DB and inspect the count in the history_tree table. DISCLAIMER: I am not sure if this is the right way to prove that the event history was deleted.
  2. Run the Temporal dev server while passing the existing SQLite DB to the --db-filename argument.
  3. Open the SQLite DB and inspect the count in the history_tree table again. It will have dropped to almost zero.
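Because starting the dev server modifies the DB file in place, one way to keep the before/after comparison repeatable is to copy the file before step 2 and count history_tree rows in both copies. A minimal sketch using Python's built-in sqlite3 module; the file names in the usage note are hypothetical, and, per the disclaimer in step 1, history_tree row counts may not be the right measure of event history.

```python
import sqlite3
from pathlib import Path


def _open_existing(path: str) -> sqlite3.Connection:
    """Open a SQLite file, refusing to silently create an empty one."""
    if not Path(path).is_file():
        raise FileNotFoundError(path)
    return sqlite3.connect(path)


def backup_db(src_path: str, dest_path: str) -> None:
    """Copy src_path to dest_path using SQLite's online backup API."""
    src = _open_existing(src_path)
    dest = sqlite3.connect(dest_path)  # created if it does not exist
    try:
        src.backup(dest)
    finally:
        dest.close()
        src.close()


def count_rows(db_path: str, table: str = "history_tree") -> int:
    """Same as `select count(*) from <table>;` in the sqlite3 shell."""
    conn = _open_existing(db_path)
    try:
        (n,) = conn.execute(f'SELECT count(*) FROM "{table}"').fetchone()
        return n
    finally:
        conn.close()
```

Usage, assuming the file is named test.db: call backup_db("test.db", "test.before.db") before starting the server, then compare count_rows("test.before.db") with count_rows("test.db") after stopping it.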

I ran the above process and the output is as follows:

$ sqlite3 test.db
SQLite version 3.43.2 2023-10-10 13:08:14
Enter ".help" for usage hints.
sqlite> select count(*) from history_tree;
526
sqlite> ^D

$ temporal server start-dev --log-level=error --db-filename test.db # Running with error-level logging to reduce noise.
CLI 1.3.0 (Server 1.27.1, UI 2.36.0)

Server:  localhost:7233
UI:      http://localhost:8233
Metrics: http://localhost:51209/metrics
^CStopping server...

# aidansoles @ MC-JNQ2XR2RK7 in ~/dev/bowser/tests/test_dbs on git:ads/cron-worker-0 x [13:34:10]
$ sqlite3 test.db
SQLite version 3.43.2 2023-10-10 13:08:14
Enter ".help" for usage hints.
sqlite> select count(*) from history_tree;
4

As you can see from the above, the count went from 526 to 4 just by starting the server.
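history_tree is only one of the tables in the file. To see every table whose row count changed across a server start, here is a sketch that counts all user tables in two copies of the DB and reports the differences. Table names are read from sqlite_master, so nothing about Temporal's schema is assumed.

```python
import sqlite3
from pathlib import Path


def table_counts(db_path: str) -> dict[str, int]:
    """Row count for every user table in a SQLite file."""
    if not Path(db_path).is_file():
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        return {n: conn.execute(f'SELECT count(*) FROM "{n}"').fetchone()[0] for n in names}
    finally:
        conn.close()


def diff_counts(before: dict[str, int], after: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {table: (before, after)} for tables whose row count changed."""
    tables = sorted(before.keys() | after.keys())
    return {
        t: (before.get(t, 0), after.get(t, 0))
        for t in tables
        if before.get(t, 0) != after.get(t, 0)
    }
```

For example, diff_counts(table_counts("test.before.db"), table_counts("test.db")), where test.before.db is a hypothetical copy of the committed file taken before the server was started.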

Specifications

  • Version:
$ temporal --version
temporal version 1.3.0 (Server 1.27.1, UI 2.36.0)
  • Platform:
$ uname -a
Darwin MC-JNQ2XR2RK7 24.5.0 Darwin Kernel Version 24.5.0: Tue Apr 22 19:54:29 PDT 2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T6030 arm64

aidanSoles • Jun 13 '25 20:06

Hey @aidanSoles, seems to work for me. Could you run it with --log-level debug and send in your logs, either here on the Github issue, or in your support ticket?

kevinawoo • Jun 13 '25 23:06

I cannot reproduce this with the latest CLI (1.4.0), but there are no known differences since 1.3.0. The current theory is that retention timers fired for the workflows stored in the DB once the server started, deleting them. I would also recommend inspecting the system with the temporal CLI's workflow list command (temporal workflow list) or the UI, rather than looking at the internal DB state.
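Following the suggestion to inspect workflows through the API instead of the internal tables, the same check can be done from the Python SDK the stack already uses. A minimal sketch, assuming the temporalio package is installed, the dev server is at localhost:7233 (as printed in the output above), and the workflows live in the default namespace. It tallies the visible executions by status.

```python
import asyncio
from collections import Counter
from typing import Any, AsyncIterable


async def tally_statuses(executions: AsyncIterable[Any]) -> Counter:
    """Count workflow executions by status name (e.g. COMPLETED, RUNNING)."""
    counts: Counter = Counter()
    async for wf in executions:
        status = getattr(wf, "status", None)
        counts[status.name if status is not None else "UNKNOWN"] += 1
    return counts


async def main(target: str = "localhost:7233", namespace: str = "default") -> None:
    # Imported here so tally_statuses can be used without temporalio installed.
    from temporalio.client import Client

    client = await Client.connect(target, namespace=namespace)
    counts = await tally_statuses(client.list_workflows())
    for status, n in sorted(counts.items()):
        print(f"{status}: {n}")
```

Run it with asyncio.run(main()) before and after restarting the server; temporal workflow list shows the same executions from the CLI.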

Please reopen if you manage to provide a reliable reproduction.

bergundy • Jul 10 '25 22:07