PG_JOB_CACHE_DIR disallowed as mount (tmpfs backed)
Hi!
I'm investigating excessive load in a particular setup. We see a large number of small writes (38 bytes each) on a ZFS-backed filesystem, and they look like they cause write amplification there. The writes appear to go to PG_JOB_CACHE_DIR.
I thought I'd mount tmpfs on that path, or at least something with much weaker recovery guarantees, but then I ran into this:
2025-11-26 16:39:32 UTC [45]: [1-1] 69272d44.2d 0 FATAL: could not remove file "base/pgsql_job_cache": Device or resource busy
2025-11-26 16:39:32 UTC [45]: [2-1] 69272d44.2d 0 LOG: database system is shut down
I traced that back to the Citus startup code, where rmdir/mkdir failures are fatal. This effectively makes it impossible to put the cache on a different filesystem.
My initial minimal patch would look like this:
diff --git a/src/backend/distributed/utils/directory.c b/src/backend/distributed/utils/directory.c
index 6701bf8fb..fcf2745f2 100644
--- a/src/backend/distributed/utils/directory.c
+++ b/src/backend/distributed/utils/directory.c
@@ -32,7 +32,13 @@ CitusCreateDirectory(StringInfo directoryName)
int makeOK = MakePGDirectory(directoryName->data);
if (makeOK != 0)
{
- ereport(ERROR, (errcode_for_file_access(),
+ /*
+ * Don't raise an ERROR here. If we do, we cannot use a (bind)
+ * mount to move the job path to another filesystem (type).
+ * (Postgres treats ERRORs as fatal and aborts the current task.
+ * That also applies to the initialize task.)
+ */
+ ereport((errno == EEXIST ? WARNING : ERROR), (errcode_for_file_access(),
errmsg("could not create directory \"%s\": %m",
directoryName->data)));
}
@@ -147,7 +153,13 @@ CitusRemoveDirectory(const char *filename)
if (removed != 0 && errno != ENOENT)
{
- ereport(ERROR, (errcode_for_file_access(),
+ /*
+ * Don't raise an ERROR here. If we do, we cannot use a (bind)
+ * mount to move the job path to another filesystem (type).
+ * (Postgres treats ERRORs as fatal and aborts the current task.
+ * That also applies to the initialize task.)
+ */
+ ereport(WARNING, (errcode_for_file_access(),
errmsg("could not remove file \"%s\": %m", filename)));
}
Thoughts:
- This affects every call to CitusCreateDirectory, but that function is only used for PG_JOB_CACHE_DIR, so that is not a problem.
- The same goes for CitusRemoveDirectory.
- I don't like that CitusRemoveDirectory and CitusCreateDirectory have different function signatures (const char * vs. StringInfo); that's a bit smelly.
- For a better fix, we could check whether the path is a mount point and then skip everything, and/or replace the CitusRemoveDirectory + CitusCreateDirectory pair with a CitusClearDirectory that does everything except remove the parent directory.
- I'd also consider making PG_JOB_CACHE_DIR a configuration option, but the fixes above would still be needed.
Let me know what you think / how you would solve this.
Cheers,
Walter Doekes
OSSO B.V.
> This effectively makes putting the cache on a different filesystem impossible.
Ok. Nothing is impossible. But it is kludgy: https://git.osso.nl/pub/docker/patroni-citus/-/commit/ed0eb2e2a3fae5cf0e99497ba7960b38de240960