dolt icon indicating copy to clipboard operation
dolt copied to clipboard

Unexpectedly Large Database Disk Usage (24GB) vs Small Backup Size (19MB) - Space Mostly in /.dolt/temptf

Open lihh1992 opened this issue 2 months ago • 8 comments

Database Name: insight_metabase
Version: 1.75.0
After the application ran for about 10 days, the database file size was approximately 24G. After executing the dolt gc command, the database file size was reduced to about 14G.

#dolt version
dolt version 1.75.0
cd /db/dolt/insight_metabase

/db/dolt/insight_metabase#du -h -d 1
24G         ./.dolt
24G         .

After executing the dolt gc command:

/db/dolt/insight_metabase#dolt gc
/db/dolt/insight_metabase#du -h -d 1
14G         ./.dolt
14G         .

However, our actual data volume is not that large. When I used dolt backup, the backup file was very small, only about 19MB.

/db/dolt/insight_metabase#mkdir /db/dolt_bak/insight_metabase
/db/dolt/insight_metabase#dolt backup add local-backup file:///db/dolt_bak/insight_metabase
/db/dolt/insight_metabase#dolt backup sync local-backup
/db/dolt/insight_metabase#cd /db/dolt_bak/insight_metabase
/db/dolt_bak/insight_metabase# du -h -d 1
4.0K    ./oldgen
19M     .

Our database has only one branch and one commit.

/db/dolt/insight_metabase# dolt branch
* main

/db/dolt/insight_metabase# dolt log
commit g9efkgekkdj74773dclc3t6m35c3m4p1 (HEAD -> main)
Author: Dolt System Account <[email protected]>
Date:  Wed Oct 22 02:42:03 +0000 2025

        Initialize data repository

After investigation, we found that the folder /db/dolt/insight_metabase/.dolt/temptf occupies almost most of the space.

Our questions are:

  1. Why is the disk space usage so large?
  2. How can we release the occupied disk space?
  3. What can we do to reduce the rate of disk space usage?

The configuration file config.yaml we are using is as follows (the remote_url_template in remote_url_template is partially omitted):

data_dir: /db/dolt
cfg_dir: /etc/dolt/config
log_level: info
log_format: text
behavior:
  read_only: false
  autocommit: true
  disable_client_multi_statements: false
  dolt_transaction_commit: false
  event_scheduler: "ON"
  auto_gc_behavior:
    enable: true
    archive_level: 1
listener:
  host: 0.0.0.0
  port: 3306
  max_connections: 1000
  back_log: 50
  max_connections_timeout_millis: 60000
  read_timeout_millis: 28800000
  write_timeout_millis: 28800000
  tls_key: null
  tls_cert: null
  require_secure_transport: null
  allow_cleartext_passwords: null

max_logged_query_len: 0

# Advanced Configuration
metrics:
  labels: {}
  host: null
  port: -1

system_variables: {}

user_session_vars: []

jwks: []
cluster:
  standby_remotes:
    - name: standby
      remote_url_template: http://......:50051/{database}
  bootstrap_role: primary
  bootstrap_epoch: 1
  remotesapi:
    port: 50051

lihh1992 avatar Oct 22 '25 07:10 lihh1992

This is a weird one. I'm not sure why temptf would get so large. This is an @reltuk question.

timsehn avatar Oct 22 '25 17:10 timsehn

@lihh1992 Thank you for the great bug report. I can confirm that I can reproduce and that this is not behaving as expected. The temptf files are created as part of cluster replication and should be cleaned up after the replication action is completed.

I can confirm that I can reproduce this behavior locally. It seems to be an issue that was introduced in 1.75.0.

I am investigating currently and will hopefully have a root cause and PR out for a fix soon. In the meantime, you can clean up the temptf files by stopping and starting the server. If you run the server with the environment variable DOLT_ARCHIVE_PULL_STREAMER=0 set, Dolt will not leave these files behind after a successful push.

I will keep you posted on my investigation and the progress on the root cause fix.

reltuk avatar Oct 22 '25 22:10 reltuk

The following PR resolves the issue. I am working on getting it through CI and included in the next Dolt release:

https://github.com/dolthub/dolt/pull/9987

reltuk avatar Oct 23 '25 00:10 reltuk

The root cause fix was just released in https://github.com/dolthub/dolt/releases/tag/v1.76.2.

Thank you again for the bug report. Closing for now, but feel free to reopen if you need anything further here.

reltuk avatar Oct 23 '25 21:10 reltuk

@reltuk After the issue occurred, I rebuilt the database using backup data. The reconstructed database size was less than 100 MB, but now the disk usage of the database file has exceeded 5 GB.

Around 10:00 this morning, I upgraded Dolt to version 1.76.2.

/db/dolt/insight_metabase/.dolt/temptf# dolt version
dolt version 1.76.2
/db/dolt/insight_metabase# du -h -d 1
5.0G    ./.dolt
5.0G    .

However, the size of the replica database is only 27 MB:

/db/dolt/insight_metabase# du -h -d 1
27M     ./.dolt
27M     .

Most of the space is still occupied by the temptf directory:

/db/dolt/insight_metabase/.dolt/temptf# find . -type f | wc -l
435211

After running dolt gc, there was no significant reduction in disk usage.

My questions are:

  1. Why did dolt gc not clean up the data in temptf?
  2. How can I free up the occupied disk space?
  3. Will the disk usage continue to grow rapidly?

lihh1992 avatar Oct 24 '25 10:10 lihh1992

@lihh1992

Thank you for the update. I cannot reproduce the lingering temptf files when running with cluster replication on 1.76.2. Can you verify that the version of the running server is 1.76.2?

dolt sql -q 'select dolt_version()'

from within the dolt database directory, while the server is running. (The directory we want is the one that has the .dolt/sql-server.info file in it.)

To clean up the files, you can stop the running sql-server process and then just delete them.

$ # After stopping the running sql-server process
$ rm .dolt/temptf/*

Then you can start the sql-server process again.

You can perform a full GC by running call dolt_gc('--full'). This may take longer than a normal GC, but it will collect data which was previously reachable from a branch head but is no longer reachable at all. By default, all data reachable from a branch head at the time of a garbage collection is moved into the old gen and is not revisited in the future by non-full garbage collections.

I will leave this issue re-opened while we continue to investigate whether temptf files are still being incorrectly left behind by 1.76.2.

reltuk avatar Oct 24 '25 18:10 reltuk

The following is the command execution process and its results:

root@dolt-0:/var/lib/dolt# cd /db/dolt
root@dolt-0:/db/dolt# du -h -d 1
1.1M    ./insight_system
372M    ./insight_jobadmin
5.1G    ./insight_metabase
15G     ./.dolt
14G     ./.dolt_dropped_databases
34G     .
root@dolt-0:/db/dolt# cd .dolt
root@dolt-0:/db/dolt/.dolt# ls
sql-server.info  tmp
root@dolt-0:/db/dolt/.dolt# dolt sql -q 'select dolt_version()'
Warning: Unknown global config option 'sqlserver.cluster.dolt_cluster_role'. Use `dolt config --global --unset sqlserver.cluster.dolt_cluster_role` to remove.
Warning: Unknown global config option 'sqlserver.cluster.dolt_cluster_role_epoch'. Use `dolt config --global --unset sqlserver.cluster.dolt_cluster_role_epoch` to remove.
+----------------+
| dolt_version() |
+----------------+
| 1.76.2         |
+----------------+
root@dolt-0:/db/dolt# cd ../insight_metabase/
root@dolt-0:/db/dolt/insight_metabase# dolt sql -q 'select dolt_version()'
Warning: Unknown global config option 'sqlserver.cluster.dolt_cluster_role'. Use `dolt config --global --unset sqlserver.cluster.dolt_cluster_role` to remove.
Warning: Unknown global config option 'sqlserver.cluster.dolt_cluster_role_epoch'. Use `dolt config --global --unset sqlserver.cluster.dolt_cluster_role_epoch` to remove.
+----------------+
| dolt_version() |
+----------------+
| 1.76.2         |
+----------------+

root@dolt-0:/db/dolt/insight_metabase# dolt sql
Warning: Unknown global config option 'sqlserver.cluster.dolt_cluster_role'. Use `dolt config --global --unset sqlserver.cluster.dolt_cluster_role` to remove.
Warning: Unknown global config option 'sqlserver.cluster.dolt_cluster_role_epoch'. Use `dolt config --global --unset sqlserver.cluster.dolt_cluster_role_epoch` to remove.
# Welcome to the DoltSQL shell.
# Statements must be terminated with ';'.
# "exit" or "quit" (or Ctrl-D) to exit. "\help" for help.
insight_metabase/main*> call dolt_gc('--full')
                     -> ;
+--------+
| status |
+--------+
| 0      |
+--------+
1 row in set (0.88 sec)


insight_metabase/main*> exit
Bye
root@dolt-0:/db/dolt/insight_metabase# du -h -d 1
5.0G    ./.dolt
5.0G    .

root@dolt-0:/db/dolt/insight_metabase# cat /etc/dolt/config/config.yaml
data_dir: /db/dolt
cfg_dir: /etc/dolt/config
log_level: info
log_format: text
behavior:
  read_only: false
  autocommit: true
  disable_client_multi_statements: false
  dolt_transaction_commit: false
  event_scheduler: "ON"
  auto_gc_behavior:
    enable: true
    archive_level: 1
listener:
  host: 0.0.0.0
  port: 6033
  max_connections: 1000
  back_log: 50
  max_connections_timeout_millis: 60000
  read_timeout_millis: 28800000
  write_timeout_millis: 28800000
  tls_key: null
  tls_cert: null
  require_secure_transport: null
  allow_cleartext_passwords: null

max_logged_query_len: 0

# Advanced Configuration
metrics:
  labels: {}
  host: null
  port: -1

system_variables: {}

user_session_vars: []

jwks: []
cluster:
  standby_remotes:
    - name: standby
      remote_url_template: http://dolt-1.dolt-headless.gitee.svc.cluster.local:50051/{database}
  bootstrap_role: primary
  bootstrap_epoch: 1
  remotesapi:
    port: 50051

Additionally, I have another question. In the config.yaml file, I have configured the data_dir as /db/dolt. Under the /db/dolt path, the size of the ./.dolt folder is 15GB. Is this normal? I only have the few databases shown in the commands.

lihh1992 avatar Oct 27 '25 03:10 lihh1992

@lihh1992 Thank you for your detailed response.

For the unexpectedly large /db/dolt/insight_metabase/.dolt directory, can you please confirm if the size is largely still the result of temptf files please? Is is possible to remove those files manually while the server is not running and to confirm that they do not come back?

The the presence of a large /db/dolt/.dolt directory, can you confirm whether the running the server thinks there is a database name dolt? Can you provide a listing of the files within that directory to see what they are? Running show databases would list all the reigstered databases with the server.

dolt sql-server has support for running from a top-level container directory, where all the databases are in subdirectories, like so, when running from /db/dolt:

/db/dolt/database_one/.dolt
/db/dolt/database_two/.dolt
/db/dolt/database_three/.dolt

but it also has support for exposing the current directory as a database, as if you run dolt sql-server from /db/dolt/database_one and it exposes the database as database_one with its storage files in /db/dolt/database_one/.dolt.

Depending on how the initial databases are created and how the server is launched, sometimes these two modes are even active at the same time. Which is why show databases and the listing of the files will be helpful.

Thank you for your continued help and thoughtful responses, -- Aaron

reltuk avatar Oct 27 '25 18:10 reltuk