edgedb-cli
edgedb-cli copied to clipboard
Nine to Ten Minute Wait After uvicorn app.main:fast_api Started Before Successful Responses
- EdgeDB CLI 2.0.5+f78cf34
- OS Version SLES12 SUSE Linux Enterprise Server 12 SP5 700G 36 core machine
A uvicorn process was started and httpx requests to it failed with a ReadTimeout After about ten minutes an httpx request was successful. It took one minute after uvicorn was started before the .../bin/postgres process started . It then then took nine more minutes before the other postgress processes started allowing, I assume for httpx requests to be successful. I doubt this behavior is normal.
o Are there system setups related to systemd or postgres that need to be done or modified? o Are there any logs or a verbose mode to allow for debugging either of these delay issues? o Is this a known issue?
<Executed on a very lightly loaded corporate SLES12 700G 36 core machine>
$ uvicorn app.main:fast_api --host 'localhost' --port 5000 --reload --log-level 'debug'
$ httpx -m GET http://localhost:5000/users
ReadTimeout: timed out
After about 10 minutes:
$ httpx -m GET http://localhost:5000/users
TTP/1.1 200 OK
date: Fri, 28 Oct 2022 21:57:40 GMT
server: uvicorn
content-length: 2
content-type: application/json
[]
$ ps -eaF | egrep -i 'edge|postg|corn' | grep -v grep
nshelley 6457 26335 41 26659 24444 29 14:43 pts/111 00:06:06 ...fastapi-crudOCT29/venv/bin/python3.10 ...fastapi-crudOCT29/venv/bin/uvicorn app.main:fast_api --host localhost --port 5000 --reload --log-level debug
nshelley 6458 6457 0 12009 10640 23 14:43 pts/111 00:00:00 ...fastapi-crudOCT29/venv/bin/python3.10 -c from multiprocessing.resource_tracker import main;main(4)
nshelley 6459 6457 0 52339 40156 21 14:43 pts/111 00:00:01 ...fastapi-crudOCT29/venv/bin/python3.10 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe_handle=7) --multiprocessing-fork
nshelley 6669 31114 0 63867 233684 5 14:44 ? 00:00:03 /~/.local/share/edgedb/portable/2.5/bin/python3 -I /~/.local/share/edgedb/portable/2.5/bin/edgedb-server.py --data-dir ~/.local/share/edgedb/data/fastapi_crudOCT29 --runstate-dir /run/user/12050573/edgedb-fasta httpx -m POST http://localhost:5000/events \
--json '{pi_crudOCT29 --port 10704 --compiler-pool-mode=on_demand --admin-ui=enabled --auto-shutdown-after=600
nshelley 13397 6669 0 283969 524036 15 14:53 ? 00:00:00 /~/.local/share/edgedb/portable/2.5/bin/postgres -D ~/.local/share/edgedb/data/fastapi_crudOCT29 -c listen_addresses= -c unix_socket_permissions=0700 -c unix_socket_directories=/run/user/12050573/edgedb-fastapi_crudOCT29 -c max_connections=7725 -c max_locks_per_transaction=256 -c log_min_messages=NOTICE -c log_statement=none -c log_line_prefix=
nshelley 13399 13397 0 284483 25364 29 14:53 ? 00:00:00 postgres: checkpointer
nshelley 13400 13397 0 284461 26324 5 14:53 ? 00:00:00 postgres: background writer
nshelley 13401 13397 0 283969 9268 14 14:53 ? 00:00:00 postgres: walwriter
nshelley 13402 13397 0 285523 8172 12 14:53 ? 00:00:00 postgres: autovacuum launcher
nshelley 13403 13397 0 16603 5272 2 14:53 ? 00:00:00 postgres: stats collector
nshelley 13404 13397 0 285492 6304 22 14:53 ? 00:00:00 postgres: logical replication launcher
nshelley 13434 13397 0 288543 58248 7 14:53 ? 00:00:00 postgres: postgres V2f147ded60___edgedbsys__ [local] idle
nshelley 13452 6669 0 53070 192872 13 14:53 ? 00:00:01 /~/.local/share/edgedb/portable/2.5/bin/python3 -I -m edb.server.compiler_pool.worker --sockname /run/user/12050573/edgedb-fastapi_crudOCT29/_5erf7mc/ipc --version-serial 0
nshelley 17452 13397 0 293628 95984 0 14:57 ? 00:00:00 postgres: postgres V2f147ded60_edgedb [local] idle
nshelley 17459 13397 0 285741 25360 17 14:57 ? 00:00:00 postgres: postgres V2f147ded60_edgedb [local] idle
==== A few days previously a similar delay was noticed when doing an edgedb project init:
$ cd quickstart
$ time edgedb project init
No edgedb.toml found in .../work/EdgeDB/quickstart29OCT or above
Do you want to initialize a new project? [Y/n]
Y
Specify the name of EdgeDB instance to use with this project [default: quickstart29OCT]:
quickstart29OCT
Checking EdgeDB versions...
Specify the version of EdgeDB to use with this project [default: 2.5]:
2.5
Project directory \u2502 .../work/EdgeDB/quickstart29OCT
Project config \u2502 .../work/EdgeDB/quickstart29OCT/edgedb.toml
Schema dir (empty) \u2502 .../work/EdgeDB/quickstart29OCT/dbschema
Installation method \u2502 portable package
Version \u2502 2.5+57f7823
Instance name \u2502 quickstart29OCT
Version 2.5+57f7823 is already installed
Initializing EdgeDB instance...
[systemctl] Created symlink from ~/.config/systemd/user/default.target.wants/[email protected] to .../.config/systemd/user/[email protected].
[systemctl] Created symlink from ~/.config/systemd/user/default.target.wants/[email protected] to .../.config/systemd/user/[email protected].
Applying migrations...
Everything is up to date. Revision initial
Project initialized.
To connect to quickstart29OCT, run edgedb
real 9m28.811s
user 0m28.603s
sys 0m6.358s
A uvicorn process was started and httpx requests to it failed with a ReadTimeout
Does it start the EdgeDB server somehow? Or is it because of socket activation? If you restart uvicorn after it is already fast, is it fast immediately or still timeouted?
o Are there system setups related to systemd or postgres that need to be done or modified?
I'm not aware of any with these consequences (there are a lot of finetuning, but things should not be so slow).
o Are there any logs or a verbose mode to allow for debugging either of these delay issues?
You can run RUST_LOG=info edgedb project init or RUST_LOG=debug edgedb project init to get some more information about that command. Although, I think it's mostly server/postgres issue.
You can add --log-level=info to the ExecStart line in systemd file at ~/.config/systemd/user/[email protected]. That would put more logs in the journal. Read them with edgedb instance logs (or journalctl)
o Is this a known issue?
No.
Generally we don't have timeouts of the scale of 10 min anywhere I think (I mean all our timeouts are much shorter). But when query is running in edgedb we allow that query to proceed whatever time postgres allows it to be running. So this look much like something weird is going on with the postgres. Maybe your machine starts swapping due to memory limits, or fsyncs on the filesystem are slow because of some other load. But these are just wild guesses.
This is a simpler example using just the edgedb command that shows the nine minute wait issue. Initially only the systemd process was running. The edgedb command was run and failed to connect after a 30 second wait. It started an edgedb instance start process.
After a ten minute wait the edgedb command was run again and it connected. Notice how it took nine minutes for the postgres processes to get started from the time the edgedb instance start process started. Apparently the edgedb instance start process execed edgedb-server.py as it has the same PID and start time. The edgedb-server.py took nine minutes before it started the .../edgedb/portable/2.6/bin/postgres process.
The load average is 0.5 to 2.5 on this 700G 36 core machine SLES12 SUSE Linux Enterprise Server 12 SP5
$ ps -eaF | egrep -i 'edge|postg|corn|systemd' | grep nshelley | grep -v grep | tee ps0
nshelley 31114 1 0 12132 11368 2 Oct24 ? 00:03:44 /usr/lib/systemd/systemd --user
$ RUST_LOG=debug edgedb
...
[2022-11-16T18:15:46Z DEBUG edgedb::version_check] Cached version Some(Semver(Version { major: 2, minor: 2, patch: 0, build: BuildMetadata("57f6cad") }))
...
Connecting to an EdgeDB instance at localhost:10707...
...
edgedb error: ClientConnectionFailedError: cannot establish connection for 30s: timed out
$ ps -eaF | egrep -i 'edge|postg|corn|systemd' | grep nshelley | grep -v grep | tee ps1
nshelley 5459 31114 0 23879 10064 4 10:15 ? 00:00:00 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/bin/edgedb instance start quickstart8NOV_1776 --managed-by=systemd
nshelley 31114 1 0 12132 11368 6 Oct24 ? 00:03:45 /usr/lib/systemd/systemd --user
$ sleep 600; RUST_LOG=debug edgedb
...
edgedb>
$ ps -eaF | egrep -i 'edge|postg|corn|systemd' | grep nshelley | grep -v grep | tee ps2
nshelley 5459 31114 0 64113 235192 24 10:15 ? 00:00:03 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/python3 -I /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/edgedb-server.py --data-dir /nfs/site/home/nshelley/.local/share/edgedb/data/quickstart8NOV_1776 --runstate-dir /run/user/12050573/edgedb-quickstart8NOV_1776 --port 10707 --compiler-pool-mode=on_demand --admin-ui=enabled --auto-shutdown-after=600
nshelley 27687 5459 0 284181 524048 5 10:24 ? 00:00:00 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/postgres -D /nfs/site/home/nshelley/.local/share/edgedb/data/quickstart8NOV_1776 -c listen_addresses= -c unix_socket_permissions=0700 -c unix_socket_directories=/run/user/12050573/edgedb-quickstart8NOV_1776 -c max_connections=7725 -c max_locks_per_transaction=256 -c log_min_messages=NOTICE -c log_statement=none -c log_line_prefix=
nshelley 27780 27687 0 284695 25680 5 10:24 ? 00:00:00 postgres: checkpointer
nshelley 27781 27687 0 284673 22504 0 10:24 ? 00:00:00 postgres: background writer
nshelley 27782 27687 0 284181 9200 20 10:24 ? 00:00:00 postgres: walwriter
nshelley 27783 27687 0 285735 8128 17 10:24 ? 00:00:00 postgres: autovacuum launcher
nshelley 27784 27687 0 16801 5064 2 10:24 ? 00:00:00 postgres: stats collector
nshelley 27785 27687 0 285705 6320 19 10:24 ? 00:00:00 postgres: logical replication launcher
nshelley 27859 27687 0 288757 59052 1 10:24 ? 00:00:00 postgres: postgres V2f147ded60___edgedbsys__ [local] idle
nshelley 27865 5459 0 55406 201984 12 10:24 ? 00:00:01 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/python3 -I -m edb.server.compiler_pool.worker --sockname /run/user/12050573/edgedb-quickstart8NOV_1776/2z_5su8z/ipc --version-serial 0
nshelley 30327 16844 0 23776 8812 13 10:27 pts/28 00:00:00 edgedb
nshelley 31114 1 0 12132 11368 2 Oct24 ? 00:03:45 /usr/lib/systemd/systemd --user
Hm, 10 minutes is our --auto-shutdown-timeout. And edgedb instance start waits for grabbing lock file when another instance is running.
So I guess what happens is:
- Some instance is already running (but I'm not sure why you can't connect to it)
edgedb instance startwaits to grab the lock file- An instance from (1) get's no new connections so it automatically shuts down after 600 seconds
- Lock grab works and instance is started sucessfully
If you're doing something special, like sharing home directory between users, or maybe you can have two systemd --user daemons running at the same time (i.e. two sessions, does systemd work like that?), it's probably it.
Otherwise, I'll take a look if we can have some race condition somewhere in our startup code.
Edit: you can also confirm conflicting log by looking for Process is already running in:
edgedb instance logs -I quickstart8NOV_1776
>Some instance is already running
Wouldn't that instance have shown up in the ps -eaF ... command output shown above?
>haring home directory between users Home directories are not shared between users. Other users have no idea of edgedb. I'm the only one who is investigating it.
>you can have twosystemd --userdaemons running
Only one shows up in the ps -eaF ... command output shown above.
$ edgedb instance logs -I quickstart8NOV_1776Hint: You are currently not seeing messages from other users and the system.
Users in the 'systemd-journal' group can see all messages. Pass -q to
turn off this notice.
No journal files were opened due to insufficient permissions.
edgedb error: logs failed: exit status: 1 (command-line: "journalctl" "--user-unit" "edgedb-server@quickstart8NOV_1776.service")
Okay, I'll try to investigate more.
$ edgedb instance logs -I quickstart8NOV_1776Hint: You are currently not seeing messages from other users and the system. Users in the 'systemd-journal' group can see all messages. Pass -q to turn off this notice. No journal files were opened due to insufficient permissions. edgedb error: logs failed: exit status: 1 (command-line: "journalctl" "--user-unit" "edgedb-server@quickstart8NOV_1776.service")
Is there any combination of flags that allows you to see logs of your user's units? I can add -q to suppress warnings probably. But it's unclear why No journal files were opened due to insufficient permissions is there. Can you run journalctl with sudo? (although not sure about what flags are right for this). Or can you ask admin of the server how you can list logs of your user's services? It's possible that either journalctl was configured in the wrong way or there should be some flag to the command to read the logs.
Please try running the following, and show the output:
systemctl --user is-active "edgedb-server@quickstart8NOV_1776"
Here is what might be happening:
- You run
edgedb instance start, it can't detect thatsystemdif fine (basically the command above), presumably because of some permission error. So starts its own server by daemonizing. - Then you try to connect to the instance and actually functioning systemd tries to run and locks on the server lock started by (1).
If you can't figure to how to list logs, try also running it manually in foreground mode (before trying to connect):
edgedb instance start --foreground -I quickstart8NOV_1776
Or pretend that it's running from the systemd:
edgedb instance start --managed-by=systemd -I quickstart8NOV_1776
Either of the two should show you some logs from EdgeDB. Hopefully we'll see something interesting there.
sudo access for /usr/bin/journalctl was requested a while back. Still waiting on IT .
The last two commands mentioned were run. They both quickly started up the postgres processes.
edgedb was then run and as previously it failed to connect . After 9 minutes or so the postgress processes started and then running edgedb was successful.
$ ps -eaF | egrep -i 'edge|postg|corn' | grep -v grep
$
$ edgedb instance start --foreground -I quickstart8NOV_1776
[2022-12-07T15:19:50Z WARN edgedb::version_check] Newer version of edgedb tool exists 2.2.6+b869667 (current 2.2.0+57f6cad)
INFO 747 2022-12-07T07:19:53.530 edb.server: starting EdgeDB server 2.6+ba1661b, built on 2022-11-03T02:50Z from revision f92efd40a (2022-11-03T00:02Z) for x86_64-unknown-linux-gnu
INFO 747 2022-12-07T07:19:53.530 edb.server: instance name: 'quickstart8NOV_1776'
INFO 747 2022-12-07T07:19:54.705 edb.server: Using 7723 max backend connections based on total memory.
INFO 792 2022-12-07T07:19:55.085 postgres: starting PostgreSQL 14.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 10.2.1 20210130 (Red Hat 10.2.1-11), 64-bit
INFO 792 2022-12-07T07:19:55.086 postgres: listening on Unix socket "/run/user/12050573/edgedb-quickstart8NOV_1776/.s.PGSQL.5432"
INFO 792 2022-12-07T07:19:56.314 postgres: database system was shut down at 2022-12-06 13:26:11 PST
CRITICAL 792 2022-12-07T07:19:56.314 postgres: the database system is starting up
INFO 792 2022-12-07T07:19:56.364 postgres: database system is ready to accept connections
INFO 747 2022-12-07T07:19:58.790 edb.server: introspecting extensions for database '__edgedbsys__'
INFO 747 2022-12-07T07:19:58.790 edb.server: introspecting extensions for database 'edgedb'
INFO 747 2022-12-07T07:19:58.813 edb.server: Connection established to backend database: __edgedbsys__
INFO 747 2022-12-07T07:19:58.914 edb.server: Connection established to backend database: edgedb
INFO 747 2022-12-07T07:20:01.314 edb.server: started 1 compiler worker process
WARNING 747 2022-12-07T07:20:01.327 edb.server: could not create listen socket for '127.0.0.1:10707': [Errno 98] error while attempting to bind on address ('127.0.0.1', 10707): address already in use
WARNING 747 2022-12-07T07:20:01.327 edb.server: could not create listen socket for '::1:10707': [Errno 99] error while attempting to bind on address ('::1', 10707, 0, 0): cannot assign requested address
INFO 747 2022-12-07T07:20:01.327 edb.server: Serving admin on /run/user/12050573/edgedb-quickstart8NOV_1776/.s.EDGEDB.admin.10707
The postgres processes started up almost immediateley!
$ ps -eaF | egrep -i 'edge|postg|corn' | grep -v grep
nshelley 658 16844 0 24459 12488 0 07:19 pts/28 00:00:00 edgedb instance start --foreground -I quickstart8NOV_1776
nshelley 747 658 14 122026 233816 12 07:19 pts/28 00:00:02 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/python3 -I /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/edgedb-server.py --data-dir /nfs/site/home/nshelley/.local/share/edgedb/data/quickstart8NOV_1776 --runstate-dir /run/user/12050573/edgedb-quickstart8NOV_1776 --port 10707 --compiler-pool-mode=on_demand --admin-ui=enabled
nshelley 792 747 1 284623 524380 16 07:19 pts/28 00:00:00 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/postgres -D /nfs/site/home/nshelley/.local/share/edgedb/data/quickstart8NOV_1776 -c listen_addresses= -c unix_socket_permissions=0700 -c unix_socket_directories=/run/user/12050573/edgedb-quickstart8NOV_1776 -c max_connections=7725 -c max_locks_per_transaction=256 -c log_min_messages=NOTICE -c log_statement=none -c log_line_prefix=
nshelley 817 792 0 284623 3620 17 07:19 ? 00:00:00 postgres: checkpointer
nshelley 818 792 0 284623 3620 26 07:19 ? 00:00:00 postgres: background writer
nshelley 820 792 0 284623 3620 3 07:19 ? 00:00:00 postgres: walwriter
nshelley 821 792 0 286176 8132 6 07:19 ? 00:00:00 postgres: autovacuum launcher
nshelley 822 792 0 17242 5080 16 07:19 ? 00:00:00 postgres: stats collector
nshelley 823 792 0 286145 6408 8 07:19 ? 00:00:00 postgres: logical replication launcher
nshelley 842 792 0 289131 58660 16 07:19 ? 00:00:00 postgres: postgres V2f147ded60___edgedbsys__ [local] idle
nshelley 847 792 0 286482 26412 0 07:19 ? 00:00:00 postgres: postgres V2f147ded60___edgedbsys__ [local] idle
nshelley 850 792 0 286486 29084 3 07:19 ? 00:00:00 postgres: postgres V2f147ded60_edgedb [local] idle
nshelley 853 747 17 55150 202260 14 07:19 pts/28 00:00:01 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/python3 -I -m edb.server.compiler_pool.worker --sockname /run/user/12050573/edgedb-quickstart8NOV_1776/4061jhdo/ipc --version-serial 0
$
Edgedb was killed with a Ctrl-C:
INFO 747 2022-12-07T07:21:58.856 edb.server: Connection discarded to backend database: __edgedbsys__
^C[2022-12-07T15:23:48Z WARN edgedb::process] Got interrupt. Waiting for the edgedb process to exit.
INFO 792 2022-12-07T07:23:48.220 postgres: received fast shutdown request
WARNING 747 2022-12-07T07:23:48.220 edb.server: Compiler worker process[853] exited unexpectedly; start a new one now.
ERROR 747 2022-12-07T07:23:48.232 edb.server: Connection to the system database is closed.
INFO 792 2022-12-07T07:23:48.233 postgres: aborting any active transactions
INFO 792 2022-12-07T07:23:48.233 postgres: terminating connection due to administrator command
INFO 792 2022-12-07T07:23:48.233 postgres: terminating connection due to administrator command
INFO 792 2022-12-07T07:23:48.233 postgres: background worker "logical replication launcher" (PID 823) exited with exit code 1
INFO 747 2022-12-07T07:23:48.233 edb.server: Received signal: Signals.SIGINT.
INFO 747 2022-12-07T07:23:48.233 edb.server: Shutting down.
INFO 792 2022-12-07T07:23:48.234 postgres: the database system is shutting down
INFO 792 2022-12-07T07:23:48.284 postgres: shutting down
INFO 792 2022-12-07T07:23:48.510 postgres: database system is shut down
Traceback (most recent call last):
File "/nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/lib/python3.10/site-packages/edb/server/compiler_pool/worker.py", line 308, in <module>
worker_proc.main(get_handler)
File "/nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/lib/python3.10/site-packages/edb/server/compiler_pool/worker_proc.py", line 117, in main
run_worker(args.sockname, args.version_serial, get_handler)
File "/nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/lib/python3.10/site-packages/edb/server/compiler_pool/worker_proc.py", line 77, in run_worker
worker(sockname, version_serial, get_handler)
File "/nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/lib/python3.10/site-packages/edb/server/compiler_pool/worker_proc.py", line 42, in worker
con = amsg.WorkerConnection(sockname, version_serial)
File "/nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/lib/python3.10/site-packages/edb/server/compiler_pool/amsg.py", line 152, in __init__
self._sock.connect(sockname)
ConnectionRefusedError: [Errno 111] Connection refused
[2022-12-07T15:23:52Z WARN edgedb::interrupt] Exiting due to interrupt
==
$ ps -eaF | egrep -i 'edge|postg|corn' | grep -v grep
$
$ edgedb instance start --managed-by=systemd -I quickstart8NOV_1776
[2022-12-07T15:27:37Z WARN edgedb::version_check] Newer version of edgedb tool exists 2.2.6+b869667 (current 2.2.0+57f6cad)
INFO 7747 2022-12-07T07:27:38.675 edb.server: starting EdgeDB server 2.6+ba1661b, built on 2022-11-03T02:50Z from revision f92efd40a (2022-11-03T00:02Z) for x86_64-unknown-linux-gnu
INFO 7747 2022-12-07T07:27:38.675 edb.server: instance name: 'quickstart8NOV_1776'
INFO 7747 2022-12-07T07:27:39.836 edb.server: Using 7723 max backend connections based on total memory.
INFO 7766 2022-12-07T07:27:40.155 postgres: starting PostgreSQL 14.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 10.2.1 20210130 (Red Hat 10.2.1-11), 64-bit
INFO 7766 2022-12-07T07:27:40.155 postgres: listening on Unix socket "/run/user/12050573/edgedb-quickstart8NOV_1776/.s.PGSQL.5432"
INFO 7766 2022-12-07T07:27:40.210 postgres: database system was shut down at 2022-12-07 07:23:48 PST
INFO 7766 2022-12-07T07:27:40.223 postgres: database system is ready to accept connections
INFO 7747 2022-12-07T07:27:42.632 edb.server: introspecting extensions for database '__edgedbsys__'
INFO 7747 2022-12-07T07:27:42.632 edb.server: introspecting extensions for database 'edgedb'
INFO 7747 2022-12-07T07:27:42.658 edb.server: Connection established to backend database: __edgedbsys__
INFO 7747 2022-12-07T07:27:42.693 edb.server: Connection established to backend database: edgedb
INFO 7747 2022-12-07T07:27:44.882 edb.server: started 1 compiler worker process
WARNING 7747 2022-12-07T07:27:44.886 edb.server: could not create listen socket for '127.0.0.1:10707': [Errno 98] error while attempting to bind on address ('127.0.0.1', 10707): address already in use
WARNING 7747 2022-12-07T07:27:44.886 edb.server: could not create listen socket for '::1:10707': [Errno 99] error while attempting to bind on address ('::1', 10707, 0, 0): cannot assign requested address
INFO 7747 2022-12-07T07:27:44.886 edb.server: Serving admin on /run/user/12050573/edgedb-quickstart8NOV_1776/.s.EDGEDB.admin.10707
INFO 7747 2022-12-07T07:29:42.686 edb.server: Connection discarded to backend database: __edgedbsys__
INFO 7747 2022-12-07T07:31:42.715 edb.server: Connection discarded to backend database: edgedb
^CWARNING 7747 2022-12-07T07:36:17.167 edb.server: Compiler worker process[7901] exited unexpectedly; start a new one now.
INFO 7766 2022-12-07T07:36:17.179 postgres: received fast shutdown request
ERROR 7747 2022-12-07T07:36:17.179 edb.server: Connection to the system database is closed.
INFO 7766 2022-12-07T07:36:17.180 postgres: aborting any active transactions
INFO 7766 2022-12-07T07:36:17.180 postgres: terminating connection due to administrator command
INFO 7766 2022-12-07T07:36:17.180 postgres: background worker "logical replication launcher" (PID 7858) exited with exit code 1
INFO 7747 2022-12-07T07:36:17.180 edb.server: Received signal: Signals.SIGINT.
INFO 7747 2022-12-07T07:36:17.180 edb.server: Shutting down.
INFO 7766 2022-12-07T07:36:17.181 postgres: the database system is shutting down
INFO 7766 2022-12-07T07:36:17.239 postgres: shutting down
INFO 7766 2022-12-07T07:36:17.348 postgres: database system is shut down
Traceback (most recent call last):
File "/nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/lib/python3.10/site-packages/edb/server/compiler_pool/worker.py", line 308, in <module>
worker_proc.main(get_handler)
File "/nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/lib/python3.10/site-packages/edb/server/compiler_pool/worker_proc.py", line 117, in main
run_worker(args.sockname, args.version_serial, get_handler)
File "/nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/lib/python3.10/site-packages/edb/server/compiler_pool/worker_proc.py", line 77, in run_worker
worker(sockname, version_serial, get_handler)
File "/nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/lib/python3.10/site-packages/edb/server/compiler_pool/worker_proc.py", line 42, in worker
con = amsg.WorkerConnection(sockname, version_serial)
File "/nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/lib/python3.10/site-packages/edb/server/compiler_pool/amsg.py", line 152, in __init__
self._sock.connect(sockname)
ConnectionRefusedError: [Errno 111] Connection refused
$
After starting and before Ctrl-C was sent
$ ps -eaF | egrep -i 'edge|postg|corn' | grep -v grep
nshelley 7747 16844 19 121038 231784 34 07:27 pts/28 00:00:02 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/python3 -I /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/edgedb-server.py --data-dir /nfs/site/home/nshelley/.local/share/edgedb/data/quickstart8NOV_1776 --runstate-dir /run/user/12050573/edgedb-quickstart8NOV_1776 --port 10707 --compiler-pool-mode=on_demand --admin-ui=enabled --auto-shutdown-after=600
nshelley 7766 7747 2 284623 524380 33 07:27 pts/28 00:00:00 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/postgres -D /nfs/site/home/nshelley/.local/share/edgedb/data/quickstart8NOV_1776 -c listen_addresses= -c unix_socket_permissions=0700 -c unix_socket_directories=/run/user/12050573/edgedb-quickstart8NOV_1776 -c max_connections=7725 -c max_locks_per_transaction=256 -c log_min_messages=NOTICE -c log_statement=none -c log_line_prefix=
nshelley 7853 7766 0 284623 3620 18 07:27 ? 00:00:00 postgres: checkpointer
nshelley 7854 7766 0 284623 3620 24 07:27 ? 00:00:00 postgres: background writer
nshelley 7855 7766 0 284623 3620 28 07:27 ? 00:00:00 postgres: walwriter
nshelley 7856 7766 0 286176 8132 29 07:27 ? 00:00:00 postgres: autovacuum launcher
nshelley 7857 7766 0 17242 5080 20 07:27 ? 00:00:00 postgres: stats collector
nshelley 7858 7766 0 286120 6408 21 07:27 ? 00:00:00 postgres: logical replication launcher
nshelley 7880 7766 0 289131 58420 20 07:27 ? 00:00:00 postgres: postgres V2f147ded60___edgedbsys__ [local] idle
nshelley 7899 7766 0 286482 26336 21 07:27 ? 00:00:00 postgres: postgres V2f147ded60___edgedbsys__ [local] idle
nshelley 7900 7766 0 286486 28908 27 07:27 ? 00:00:00 postgres: postgres V2f147ded60_edgedb [local] idle
nshelley 7901 7747 17 55152 202260 19 07:27 pts/28 00:00:01 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/python3 -I -m edb.server.compiler_pool.worker --sockname /run/user/12050573/edgedb-quickstart8NOV_1776/4_aoat0w/ipc --version-serial 0
$
After Ctrl-C was sent:
$ ps -eaF | egrep -i 'edge|postg|corn' | grep -v grep
$
==
$ RUST_LOG=debug edgedb
[2022-12-07T15:39:47Z DEBUG edgedb::cli] Options: Options {
conn_options: ConnectionOptions {
instance: None,
dsn: None,
credentials_file: None,
host: None,
port: None,
unix_path: None,
user: None,
database: None,
password: false,
no_password: false,
password_from_stdin: false,
tls_ca_file: None,
tls_verify_hostname: false,
no_tls_verify_hostname: false,
tls_security: None,
wait_until_available: None,
admin: false,
connect_timeout: None,
},
cloud_options: CloudOptions {
cloud_base_url: None,
cloud_access_token: None,
},
subcommand: None,
interactive: true,
debug_print_frames: false,
debug_print_descriptors: false,
debug_print_codecs: false,
output_format: None,
no_cli_update_check: false,
}
[2022-12-07T15:39:47Z DEBUG edgedb::version_check] Cached version Some(Semver(Version { major: 2, minor: 2, patch: 6, build: BuildMetadata("b869667") }))
[2022-12-07T15:39:47Z WARN edgedb::version_check] Newer version of edgedb tool exists 2.2.6+b869667 (current 2.2.0+57f6cad)
[2022-12-07T15:39:47Z INFO edgedb_client::builder] Connecting via TCP localhost:10707
[2022-12-07T15:39:47Z DEBUG rustls::client::hs] No cached session for DnsName(DnsName(DnsName("localhost")))
[2022-12-07T15:39:47Z DEBUG rustls::client::hs] Not resuming any session
[2022-12-07T15:39:47Z DEBUG rustyline] PathInfo("/nfs/site/home/nshelley/.local/share/edgedb/edgeql.history", SystemTime { tv_sec: 1666726143, tv_nsec: 579535000 }, 1)
Connecting to an EdgeDB instance at localhost:10707...
[2022-12-07T15:39:57Z DEBUG edgedb_client::builder] Temporary connection error: ClientConnectionFailedTemporarilyError: timed out
[2022-12-07T15:39:57Z DEBUG rustls::client::hs] No cached session for DnsName(DnsName(DnsName("localhost")))
[2022-12-07T15:39:57Z DEBUG rustls::client::hs] Not resuming any session
[2022-12-07T15:40:07Z DEBUG edgedb_client::builder] Temporary connection error: ClientConnectionFailedTemporarilyError: timed out
[2022-12-07T15:40:07Z DEBUG rustls::client::hs] No cached session for DnsName(DnsName(DnsName("localhost")))
[2022-12-07T15:40:07Z DEBUG rustls::client::hs] Not resuming any session
[2022-12-07T15:40:17Z DEBUG edgedb_client::builder] Temporary connection error: ClientConnectionFailedTemporarilyError: timed out
edgedb error: ClientConnectionFailedError: cannot establish connection for 30s: timed out
$ sleep 600; RUST_LOG=debug edgedb
[2022-12-07T15:50:28Z DEBUG edgedb::cli] Options: Options {
conn_options: ConnectionOptions {
instance: None,
dsn: None,
credentials_file: None,
host: None,
port: None,
unix_path: None,
user: None,
database: None,
password: false,
no_password: false,
password_from_stdin: false,
tls_ca_file: None,
tls_verify_hostname: false,
no_tls_verify_hostname: false,
tls_security: None,
wait_until_available: None,
admin: false,
connect_timeout: None,
},
cloud_options: CloudOptions {
cloud_base_url: None,
cloud_access_token: None,
},
subcommand: None,
interactive: true,
debug_print_frames: false,
debug_print_descriptors: false,
debug_print_codecs: false,
output_format: None,
no_cli_update_check: false,
}
[2022-12-07T15:50:28Z DEBUG edgedb::version_check] Cached version Some(Semver(Version { major: 2, minor: 2, patch: 6, build: BuildMetadata("b869667") }))
[2022-12-07T15:50:28Z WARN edgedb::version_check] Newer version of edgedb tool exists 2.2.6+b869667 (current 2.2.0+57f6cad)
[2022-12-07T15:50:28Z INFO edgedb_client::builder] Connecting via TCP localhost:10707
[2022-12-07T15:50:28Z DEBUG rustls::client::hs] No cached session for DnsName(DnsName(DnsName("localhost")))
[2022-12-07T15:50:28Z DEBUG rustls::client::hs] Not resuming any session
[2022-12-07T15:50:28Z DEBUG rustyline] PathInfo("/nfs/site/home/nshelley/.local/share/edgedb/edgeql.history", SystemTime { tv_sec: 1666726143, tv_nsec: 579535000 }, 1)
[2022-12-07T15:50:28Z DEBUG rustls::client::hs] Using ciphersuite TLS13_AES_256_GCM_SHA384
$ ps -eaF | egrep -i 'edge|postg|corn' | grep -v grep
nshelley 25241 31114 0 23879 10196 17 07:39 ? 00:00:00 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/bin/edgedb instance start quickstart8NOV_1776 --managed-by=systemd
$ ps -eaF | egrep -i 'edge|postg|corn' | grep -v grep
nshelley 25241 31114 0 63891 234588 18 07:39 ? 00:00:02 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/python3 -I /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/edgedb-server.py --data-dir /nfs/site/home/nshelley/.local/share/edgedb/data/quickstart8NOV_1776 --runstate-dir /run/user/12050573/edgedb-quickstart8NOV_1776 --port 10707 --compiler-pool-mode=on_demand --admin-ui=enabled --auto-shutdown-after=600
nshelley 26370 25241 0 284181 524048 12 07:48 ? 00:00:00 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/postgres -D /nfs/site/home/nshelley/.local/share/edgedb/data/quickstart8NOV_1776 -c listen_addresses= -c unix_socket_permissions=0700 -c unix_socket_directories=/run/user/12050573/edgedb-quickstart8NOV_1776 -c max_connections=7725 -c max_locks_per_transaction=256 -c log_min_messages=NOTICE -c log_statement=none -c log_line_prefix=
nshelley 26664 26370 0 284695 25316 3 07:48 ? 00:00:00 postgres: checkpointer
nshelley 26665 26370 0 284673 22504 27 07:48 ? 00:00:00 postgres: background writer
nshelley 26666 26370 0 284181 9200 34 07:48 ? 00:00:00 postgres: walwriter
nshelley 26667 26370 0 285735 8128 12 07:48 ? 00:00:00 postgres: autovacuum launcher
nshelley 26668 26370 0 16801 5064 3 07:48 ? 00:00:00 postgres: stats collector
nshelley 26669 26370 0 285705 6320 7 07:48 ? 00:00:00 postgres: logical replication launcher
nshelley 27229 26370 0 288757 58528 17 07:48 ? 00:00:00 postgres: postgres V2f147ded60___edgedbsys__ [local] idle
nshelley 27232 25241 0 55406 201988 6 07:48 ? 00:00:01 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/python3 -I -m edb.server.compiler_pool.worker --sockname /run/user/12050573/edgedb-quickstart8NOV_1776/gzh_c57k/ipc --version-serial 0
nshelley 34571 16844 0 23776 8580 5 07:50 pts/28 00:00:00 edgedb
Hm, I could catch this issue today. Although, I'm not sure how to reproduce it. And it looks like my hypothesis is right:
$ sudo lslocks --output-all
COMMAND PID TYPE SIZE MODE M START END PATH BLOCKER
cron 11304 FLOCK 6B WRITE 0 0 0 /run/crond.pid
edgedb 8308 FLOCK 7B WRITE* 0 0 0 /run/user/1000/edgedb-quickstart8NOV_1776/service.lock 20108
(unknown) 20108 FLOCK WRITE 0 0 0 /run/user/1000...
The process that took lock is dead:
$ kill 20108
-bash: kill: (20108) - No such process
On the other hand, I see some lingering postgres process:
$ systemd-cgls
Control group /:
-.slice
├─user.slice
│ └─user-1000.slice
[... snip ...]
│ └─session-9540.scope
│ ├─20115 /home/admin/.local/share/edgedb/portable/2.5/bin/postgres -D /h...
│ ├─20117 postgres: checkpointer
│ ├─20118 postgres: background writer
│ ├─20119 postgres: walwriter
│ ├─20120 postgres: autovacuum launcher
│ ├─20121 postgres: stats collector
│ └─20122 postgres: logical replication launcher
Which both:
- Should not hold that lock (Fixed in edgedb/edgedb#4797)
- Should not be there in the first place (i.e. killing EdgeDB should always stop the postgres)
But this file descriptor somehow slips into postgres:
$ lsof | grep service.lock
edgedb 8308 admin 7u REG 0,40 7 16966830 /run/user/1000/edgedb-quickstart8NOV_1776/service.lock
edgedb 8308 8309 edgedb admin 7u REG 0,40 7 16966830 /run/user/1000/edgedb-quickstart8NOV_1776/service.lock
postgres 20115 admin 3u REG 0,40 7 16966830 /run/user/1000/edgedb-quickstart8NOV_1776/service.lock
postgres 20117 admin 3u REG 0,40 7 16966830 /run/user/1000/edgedb-quickstart8NOV_1776/service.lock
postgres 20118 admin 3u REG 0,40 7 16966830 /run/user/1000/edgedb-quickstart8NOV_1776/service.lock
postgres 20119 admin 3u REG 0,40 7 16966830 /run/user/1000/edgedb-quickstart8NOV_1776/service.lock
postgres 20120 admin 3u REG 0,40 7 16966830 /run/user/1000/edgedb-quickstart8NOV_1776/service.lock
postgres 20121 admin 3u REG 0,40 7 16966830 /run/user/1000/edgedb-quickstart8NOV_1776/service.lock
postgres 20122 admin 3u REG 0,40 7 16966830 /run/user/1000/edgedb-quickstart8NOV_1776/service.lock
I also see the postgres in ps -eF, but you don't. So can you try:
$ lslocks
And if there is a lock named something-/service.lock, try:
$ lsof | grep service.lock
And then also ps of that pids that you see in lsof if there are any. So we can find out which process still keeping that lock.
I now have privileges for systemctl
$ systemctl --user is-active "edgedb-server@quickstart8NOV_1776"
inactive
$ ps -eaF | egrep -i 'edge|postg|corn' | grep -v grep
nshelley 32781 31114 0 23883 10000 24 13:57 ? 00:00:00 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/bin/edgedb instance start quickstart8NOV_1776 --managed-by=systemd
As previously, this command fails:
$ RUST_LOG=debug edgedb
...
[2022-12-13T21:59:24Z DEBUG edgedb::version_check] Cached version Some(Semver(Version { major: 2, minor: 2, patch: 6, build: BuildMetadata("b869667") }))
...
edgedb error: ClientConnectionFailedError: cannot establish connection for 30s: timed out
$ systemctl --user is-active "edgedb-server@quickstart8NOV_1776"
activating
$ ps -eaF | egrep -i 'edge|postg|corn' | grep -v grep
nshelley 33279 31114 0 23880 9948 12 13:58 ? 00:00:00 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/bin/edgedb instance start quickstart8NOV_1776 --managed-by=systemd
$ lsof | grep service.lock
After about 10 minutes these processes showed up
$ ps -eaF | egrep -i 'edge|postg|corn' | grep -v grep
nshelley 2819 33279 0 284181 524048 2 14:07 ? 00:00:00 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/postgres -D /nfs/site/home/nshelley/.local/share/edgedb/data/quickstart8NOV_1776 -c listen_addresses= -c unix_socket_permissions=0700 -c unix_socket_directories=/run/user/12050573/edgedb-quickstart8NOV_1776 -c max_connections=7725 -c max_locks_per_transaction=256 -c log_min_messages=NOTICE -c log_statement=none -c log_line_prefix=
nshelley 2821 2819 0 284181 3596 27 14:07 ? 00:00:00 postgres: checkpointer
nshelley 2822 2819 0 284673 22444 29 14:07 ? 00:00:00 postgres: background writer
nshelley 2823 2819 0 284181 9140 30 14:07 ? 00:00:00 postgres: walwriter
nshelley 2824 2819 0 285735 8128 17 14:07 ? 00:00:00 postgres: autovacuum launcher
nshelley 2825 2819 0 16801 5064 14 14:07 ? 00:00:00 postgres: stats collector
nshelley 2826 2819 0 285705 6320 25 14:07 ? 00:00:00 postgres: logical replication launcher
nshelley 2836 2819 0 288757 59320 26 14:07 ? 00:00:00 postgres: postgres V2f147ded60___edgedbsys__ [local] idle
nshelley 2846 2819 0 292507 93900 15 14:07 ? 00:00:00 postgres: postgres V2f147ded60_edgedb [local] idle
nshelley 2848 33279 0 55406 201980 25 14:07 ? 00:00:01 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/python3 -I -m edb.server.compiler_pool.worker --sockname /run/user/12050573/edgedb-quickstart8NOV_1776/lmcwim93/ipc --version-serial 0
nshelley 6384 16844 0 23777 8624 1 14:10 pts/28 00:00:00 edgedb
nshelley 6426 2819 0 285953 25480 7 14:10 ? 00:00:00 postgres: postgres V2f147ded60_edgedb [local] idle
nshelley 6427 2819 0 285950 25480 5 14:10 ? 00:00:00 postgres: postgres V2f147ded60_edgedb [local] idle
nshelley 33279 31114 0 63160 232620 25 13:58 ? 00:00:02 /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/python3 -I /nfs/site/disks/dts_cltgkit_disk06/users/nshelley/.local/share/edgedb/portable/2.6/bin/edgedb-server.py --data-dir /nfs/site/home/nshelley/.local/share/edgedb/data/quickstart8NOV_1776 --runstate-dir /run/user/12050573/edgedb-quickstart8NOV_1776 --port 10707 --compiler-pool-mode=on_demand --admin-ui=enabled --auto-shutdown-after=600
$ systemctl --user is-active "edgedb-server@quickstart8NOV_1776"
active
$ lsof | grep service.lock
postgres 2819 nshelley 6u REG 0,135 7 2567520710 /run/user/12050573/edgedb-quickstart8NOV_1776/service.lock
postgres 2821 nshelley 6u REG 0,135 7 2567520710 /run/user/12050573/edgedb-quickstart8NOV_1776/service.lock
postgres 2822 nshelley 6u REG 0,135 7 2567520710 /run/user/12050573/edgedb-quickstart8NOV_1776/service.lock
postgres 2823 nshelley 6u REG 0,135 7 2567520710 /run/user/12050573/edgedb-quickstart8NOV_1776/service.lock
postgres 2824 nshelley 6u REG 0,135 7 2567520710 /run/user/12050573/edgedb-quickstart8NOV_1776/service.lock
postgres 2825 nshelley 6u REG 0,135 7 2567520710 /run/user/12050573/edgedb-quickstart8NOV_1776/service.lock
postgres 2826 nshelley 6u REG 0,135 7 2567520710 /run/user/12050573/edgedb-quickstart8NOV_1776/service.lock
postgres 2836 nshelley 6u REG 0,135 7 2567520710 /run/user/12050573/edgedb-quickstart8NOV_1776/service.lock
postgres 2846 nshelley 6u REG 0,135 7 2567520710 /run/user/12050573/edgedb-quickstart8NOV_1776/service.lock
postgres 6426 nshelley 6u REG 0,135 7 2567520710 /run/user/12050573/edgedb-quickstart8NOV_1776/service.lock
postgres 6427 nshelley 6u REG 0,135 7 2567520710 /run/user/12050573/edgedb-quickstart8NOV_1776/service.lock
python3 33279 nshelley 6uW REG 0,135 7 2567520710 /run/user/12050573/edgedb-quickstart8NOV_1776/service.lock
This command now successfully connects:
$ RUST_LOG=debug edgedb
...
[2022-12-13T22:10:55Z DEBUG edgedb::version_check] Cached version Some(Semver(Version { major: 2, minor: 2, patch: 6, build: BuildMetadata("b869667") }))
...