
Very slow upgrade process

Open ibaraki-douji opened this issue 6 months ago • 3 comments

Issue

While upgrading Nextcloud, I saw the upgrade stall for ~30 minutes before it continued. (This already happened on earlier upgrades, but the stall only lasted about 5 minutes.)

I currently have no idea what is causing this. If someone has already had the same issue, I could try changing my config. Otherwise, if someone could suggest commands I could run during the next upgrade to find the cause, that would be appreciated.

Files / Logs

Here are the init logs (check the time difference between lines 6 and 7):

2025-06-15T11:36:13.887214965Z Conf remoteip disabled.
2025-06-15T11:36:13.887277700Z To activate the new configuration, you need to run:
2025-06-15T11:36:13.887292103Z   service apache2 reload
2025-06-15T11:36:13.897353629Z Configuring Redis as session handler
2025-06-15T11:36:14.313096226Z Initializing nextcloud 31.0.6.2 ...
2025-06-15T11:36:14.313114123Z Upgrading nextcloud from 31.0.5.1 ...
2025-06-15T12:01:09.655694259Z => Searching for hook scripts (*.sh) to run, located in the folder "/docker-entrypoint-hooks.d/pre-upgrade"
2025-06-15T12:01:09.659895462Z ==> Skipped: the "pre-upgrade" folder is empty (or does not exist)
2025-06-15T12:01:33.358814697Z Nextcloud or one of the apps require upgrade - only a limited number of commands are available
2025-06-15T12:01:33.358851309Z You may use your browser or the occ upgrade command to do the upgrade
2025-06-15T12:01:33.767487087Z Setting log level to debug
2025-06-15T12:01:34.520048300Z Turned on maintenance mode
2025-06-15T12:01:38.139143538Z Updating database schema
2025-06-15T12:01:38.247622306Z Updated database
2025-06-15T12:01:55.975508497Z Starting code integrity check...
2025-06-15T12:04:32.869696894Z Finished code integrity check
2025-06-15T12:04:33.119839658Z Update successful
2025-06-15T12:04:33.261717198Z Turned off maintenance mode
2025-06-15T12:04:33.262340213Z Resetting log level
2025-06-15T12:04:35.705705010Z The following apps have been disabled:
2025-06-15T12:04:35.707650208Z => Searching for hook scripts (*.sh) to run, located in the folder "/docker-entrypoint-hooks.d/post-upgrade"
2025-06-15T12:04:35.708894322Z ==> Skipped: the "post-upgrade" folder is empty (or does not exist)
2025-06-15T12:04:35.708922769Z Initializing finished
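
Rather than reading the timestamps by eye, the pauses between consecutive log lines can be computed. The following is a minimal Python sketch, assuming each line starts with an RFC 3339 timestamp as in the excerpt above (for example from `docker service logs --timestamps`). The helper names `parse_ts` and `largest_gap` are made up for illustration, and the sample is a subset of the lines above.

```python
from datetime import datetime

def parse_ts(line: str) -> datetime:
    """Parse the leading timestamp of a timestamped Docker log line."""
    stamp = line.split(" ", 1)[0]
    # Docker prints nanoseconds; datetime keeps microseconds, so trim to 6 digits.
    date_part, frac = stamp.rstrip("Z").split(".")
    return datetime.strptime(f"{date_part}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")

def largest_gap(lines):
    """Return (seconds, earlier_line, later_line) for the longest pause."""
    best = (0.0, "", "")
    for prev, cur in zip(lines, lines[1:]):
        delta = (parse_ts(cur) - parse_ts(prev)).total_seconds()
        if delta > best[0]:
            best = (delta, prev, cur)
    return best

# A few lines copied from the log excerpt above.
LOG = """\
2025-06-15T11:36:14.313096226Z Initializing nextcloud 31.0.6.2 ...
2025-06-15T11:36:14.313114123Z Upgrading nextcloud from 31.0.5.1 ...
2025-06-15T12:01:09.655694259Z => Searching for hook scripts (*.sh) to run, located in the folder "/docker-entrypoint-hooks.d/pre-upgrade"
2025-06-15T12:01:55.975508497Z Starting code integrity check...
2025-06-15T12:04:32.869696894Z Finished code integrity check
""".splitlines()

seconds, before, after = largest_gap(LOG)
# Prints "24.9 min between:" followed by the "Upgrading nextcloud" line
# and the pre-upgrade hook line.
print(f"{seconds / 60:.1f} min between:\n  {before}\n  {after}")
```

On this excerpt the longest pause sits between "Upgrading nextcloud from 31.0.5.1 ..." and the pre-upgrade hook search, with the code integrity check (about 2.6 minutes) a distant second.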

Compose/stack file (I'm using Docker Swarm):

services:
  db:
    image: mariadb:10.6
    restart: always
    command: --transaction-isolation=READ-COMMITTED --log-bin=binlog --binlog-format=ROW
    volumes:
      - db1:/var/lib/mysql
    environment:
      - MARIADB_RANDOM_ROOT_PASSWORD=1
      - MYSQL_PASSWORD=xx
      - MYSQL_DATABASE=xx
      - MYSQL_USER=xx
    deploy:
      replicas: 1
      placement:
        constraints: [node.labels.worker == true]
      resources:
        limits:
          cpus: '0.50'
          memory: 250M
        reservations:
          memory: 100M

  redis:
    image: redis
    restart: always
    volumes:
      - redis1:/var/lib/redis/data
    command: redis-server --requirepass xx
    deploy:
      replicas: 1
      placement:
        constraints: [node.labels.worker == true]
      resources:
        limits:
          cpus: '0.50'
          memory: 250M
        reservations:
          memory: 100M

  app:
    image: nextcloud
    restart: always
    networks:
      - default
      - traefik
      - ldap
    links:
      - db
      - redis
    volumes:
      - nextcloud1:/var/www/html
    environment:
      - MYSQL_PASSWORD=xx
      - MYSQL_DATABASE=xx
      - MYSQL_USER=xx
      - MYSQL_HOST=db
      - REDIS_HOST=redis
      - REDIS_HOST_PASSWORD=xx
      - APACHE_DISABLE_REWRITE_IP=1
      - PHP_MEMORY_LIMIT=2048M
    deploy:
      replicas: 1
      placement:
        constraints: [node.labels.worker == true]
      labels:
        - traefik.enable=true
        - .... others traefik labels
      
  cron:
    image: nextcloud
    restart: always
    networks:
      - default
      - ldap
    links:
      - db
      - redis
    volumes:
      - nextcloud1:/var/www/html
    entrypoint: /cron.sh
    environment:
      - MYSQL_PASSWORD=xx
      - MYSQL_DATABASE=xx
      - MYSQL_USER=xx
      - MYSQL_HOST=db
      - REDIS_HOST=redis
      - REDIS_HOST_PASSWORD=xx
    deploy:
      replicas: 1
      placement:
        constraints: [node.labels.worker == true]

networks:
  default:
  traefik:
    external: true
  ldap:
    external: true

# volumes omitted; not really relevant, as they are connected and working

Server config:

{
    "system": {
        "default_phone_region": "FR",
        "htaccess.RewriteBase": "\/",
        "memcache.local": "\\OC\\Memcache\\APCu",
        "apps_paths": [
            {
                "path": "\/var\/www\/html\/apps",
                "url": "\/apps",
                "writable": false
            },
            {
                "path": "\/var\/www\/html\/custom_apps",
                "url": "\/custom_apps",
                "writable": true
            }
        ],
        "upgrade.disable-web": true,
        "instanceid": "***REMOVED SENSITIVE VALUE***",
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "trusted_domains": [
            "cloud.<domain>",
            "cloud.<server1>.<domain>",
            "cloud.<server2>.<domain>",
            "cloud.<server3>.<domain>"
        ],
        "datadirectory": "***REMOVED SENSITIVE VALUE***",
        "dbtype": "mysql",
        "version": "31.0.6.2",
        "overwrite.cli.url": "https:\/\/cloud.<domain>",
        "trusted_proxies": "***REMOVED SENSITIVE VALUE***",
        "forwarded_for_headers": [
            "HTTP_X_FORWARDED_FOR"
        ],
        "overwriteprotocol": "https",
        "dbname": "***REMOVED SENSITIVE VALUE***",
        "dbhost": "***REMOVED SENSITIVE VALUE***",
        "dbport": "",
        "dbtableprefix": "oc_",
        "mysql.utf8mb4": true,
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "installed": true,
        "ldapProviderFactory": "OCA\\User_LDAP\\LDAPProviderFactory",
        "defaultapp": "files,calendar",
        "maintenance": false,
        "filelocking.enabled": true,
        "memcache.locking": "\\OC\\Memcache\\Redis",
        "redis": {
            "host": "***REMOVED SENSITIVE VALUE***",
            "password": "***REMOVED SENSITIVE VALUE***",
            "port": 6379
        },
        "memcache.distributed": "\\OC\\Memcache\\Redis",
        "mail_from_address": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpmode": "smtp",
        "mail_sendmailmode": "smtp",
        "mail_domain": "***REMOVED SENSITIVE VALUE***",
        "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpsecure": "ssl",
        "mail_smtpport": "465",
        "mail_smtpauth": 1,
        "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
        "mail_smtppassword": "***REMOVED SENSITIVE VALUE***",
        "maintenance_window_start": 2,
        "app_install_overwrite": [
            "google_synchronization"
        ],
        "loglevel": 1
    }
}

ibaraki-douji, Jun 15 '25 12:06

Hello. Can you test outside Docker Swarm? I have never had this problem using standalone Compose.

henmohr, Jul 08 '25 17:07

@henmohr Swarm is not the issue. I also had it with Compose, though not for a full 30 minutes.

There are two possible causes (and I think it's both):

  • I'm using distributed storage (CephFS), which is slower than NVMe (because of the network, and because not all hosts have NVMe drives, some have SSDs)
  • On another upgrade, I saw that some rsync processes were running while it was stuck, so it might be related to #1904
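
To confirm the rsync observation on the next upgrade, the process table can be checked while the upgrade appears stuck. This is a rough Python sketch, assuming it runs on the Linux Docker host where container processes are visible under `/proc`; the function name `find_processes` is hypothetical and nothing here is specific to the Nextcloud image.

```python
import os

def find_processes(name: str, proc_root: str = "/proc"):
    """Return (pid, command line) pairs for processes whose executable is `name`."""
    matches = []
    if not os.path.isdir(proc_root):
        return matches  # not a Linux-style /proc; nothing to inspect
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are process IDs
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited or is not readable
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        argv = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if argv and os.path.basename(argv[0]) == name:
            matches.append((int(entry), " ".join(argv)))
    return matches

if __name__ == "__main__":
    for pid, cmd in find_processes("rsync"):
        print(pid, cmd)
```

If rsync shows up for the whole duration of the stall, the printed command line also shows which source and destination paths are being synced.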

ibaraki-douji, Jul 09 '25 15:07

@henmohr and @ibaraki-douji

Here are some suggested changes that might help:

-> Pre-seed key folders only via Docker volumes:

    - Mount only the critical paths, so that the bulk of the application files does not have to be synced:

volumes:
  - nextcloud_config:/var/www/html/config
  - nextcloud_data:/var/www/html/data
  - nextcloud_custom_apps:/var/www/html/custom_apps
  - nextcloud_themes:/var/www/html/themes

-> Temporarily switch /var/www/html to local SSD/NVMe during upgrades:

    - Use a fast local volume for /var/www/html to drastically speed up rsync, then revert to CephFS afterward.

-> Optionally build a custom Nextcloud image that avoids rsync entirely:

    - Embed the application files in the image and mount only the dynamic directories. This avoids the expensive syncing overhead on each restart. Inspired by discussions in the Docker repo issue
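
Before moving volumes around, it may be worth measuring how much slower the CephFS-backed directory really is for the many-small-files pattern that a sync of the application tree produces. The following is a minimal Python sketch under stated assumptions: the function name `time_small_files`, the file count, and the file size are arbitrary illustrative choices, and the test does not reproduce what rsync does exactly.

```python
import os
import sys
import tempfile
import time

def time_small_files(directory: str, count: int = 500, size: int = 4096,
                     sync: bool = False) -> float:
    """Write `count` files of `size` bytes under `directory`; return seconds spent writing.

    Files go into a temporary subdirectory that is removed afterwards.
    Set `sync=True` to fsync each file and include flush-to-storage time.
    """
    payload = b"\0" * size
    with tempfile.TemporaryDirectory(dir=directory) as work:
        start = time.perf_counter()
        for i in range(count):
            with open(os.path.join(work, f"f{i}"), "wb") as fh:
                fh.write(payload)
                if sync:
                    fh.flush()
                    os.fsync(fh.fileno())
        elapsed = time.perf_counter() - start
    return elapsed

if __name__ == "__main__":
    # Pass one or more directories to compare, e.g. a CephFS mount and a local disk.
    for directory in sys.argv[1:]:
        print(f"{directory}: {time_small_files(directory):.2f} s")
```

Running it once against a directory on the CephFS mount and once against a local disk gives a rough ratio. A large difference would support the storage explanation, while similar timings would point back at something else in the upgrade path.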

Abhicodeitout, Aug 19 '25 05:08