
[Bug]: file is locked, existing lock on file: exclusive on big files upload

Open k-popov opened this issue 3 years ago • 48 comments

⚠️ This issue respects the following points: ⚠️

  • [X] This is a bug, not a question or a configuration/webserver/proxy issue.
  • [X] This issue is not already reported on Github (I've searched it).
  • [X] Nextcloud Server is up to date. See Maintenance and Release Schedule for supported versions.
  • [X] I agree to follow Nextcloud's Code of Conduct.

Bug description

Getting the error "zeros_1gb.new.dd" is locked, existing lock on file: exclusive when uploading large files (1 GB+) via the browser.

Setup:

  • Nextcloud 24.0.1 (official Docker image).
  • PHP is run by a static FPM pool.
  • Running in Kubernetes (official Helm chart).
  • Nextcloud system files (/var/www/html) are stored on an RWO volume; 10 pods use this volume on one node.
  • User files are stored in S3-like storage.
  • File locking is handled by Redis (external Redis server).
  • filelocking.debug is set to true.

Logs viewed through Kibana show the following (format: pod_name - log_message; note the MOVE request; the second lock attempt is made by a different pod, if that matters):

app-nextcloud-int-6f8b6d568-v9lng - NOTICE: PHP message: [owncloud][locking][1] acquire shared lock on "files/zeros_1gb.new.dd" on storage "object::user:popov.k"
app-nextcloud-int-6f8b6d568-v9lng - NOTICE: PHP message: [owncloud][locking][1] change lock on "files/zeros_1gb.new.dd" to exclusive on storage "object::user:popov.k"
app-nextcloud-int-6f8b6d568-vbdbt - NOTICE: PHP message: [owncloud][locking][1] acquire shared lock on "files/zeros_1gb.new.dd" on storage "object::user:popov.k"
app-nextcloud-int-6f8b6d568-vbdbt - NOTICE: PHP message: [owncloud][no app in context][1] {"Exception":"OCP\\Lock\\LockedException","Message":"\"files/9faf980fca1fefa8ed8a83420b80642b\"(\"object::user:popov.k::files/zeros_1gb.new.dd\") is locked, existing lock on file: exclusive","Code":0,"Trace":[{"file":"/var/www/html/lib/private/Files/Storage/Common.php","line":758,"function":"acquireLock","class":"OC\\Lock\\MemcacheLockingProvider","type":"->","args":["files/9faf980fca1fefa8ed8a83420b80642b",1,"object::user:popov.k::files/zeros_1gb.new.dd"]},{"file":"/var/www/html/lib/private/Files/Storage/Wrapper/Wrapper.php","line":610,"function":"acquireLock","class":"OC\\Files\\Storage\\Common","type":"->","args":["files/zeros_1gb.new.dd",1,{"__class__":"OC\\Lock\\MemcacheLockingProvider"}]},{"file":"/var/www/html/lib/private/Files/Storage/Wrapper/Wrapper.php","line":610,"function":"acquireLock","class":"OC\\Files\\Storage\\Wrapper\\Wrapper","type":"->","args":["files/zeros_1gb.new.dd",1,{"__class__":"OC\\Lock\\MemcacheLockingProvider"}]},{"file":"/var/www/html/lib/private/Files/View.php","line":1961,"function":"acquireLock","class":"OC\\Files\\Storage\\Wrapper\\Wrapper","type":"->","args":["files/zeros_1gb.new.dd",1,{"__class__":"OC\\Lock\\MemcacheLockingProvider"}]},{"file":"/var/www/html/lib/private/Files/View.php","line":2076,"function":"lockPath","class":"OC\\Files\\View","type":"->","args":["//zeros_1gb.new.dd",1,false]},{"file":"/var/www/html/apps/dav/lib/Connector/Sabre/Node.php","line":406,"function":"lockFile","class":"OC\\Files\\View","type":"->","args":["//zeros_1gb.new.dd",1]},{"file":"/var/www/html/apps/dav/lib/Connector/Sabre/Directory.php","line":161,"function":"acquireLock","class":"OCA\\DAV\\Connector\\Sabre\\Node","type":"->","args":[1]},{"file":"/var/www/html/3rdparty/sabre/dav/lib/DAV/Tree.php","line":307,"function":"createFile","class":"OCA\\DAV\\Connector\\Sabre\\Directory","type":"->","args":["zeros_1gb.new.dd",null]},{"file":"/var/www/html/3rdparty/sabre/dav/lib/DAV/Tree.php","line":133,"function":"copyNode","class":"Sabre\\DAV\\Tree","type":"->","args":[{"__class__":"OCA\\DAV\\Upload\\FutureFile"},{"__class__":"OCA\\DAV\\Files\\FilesHome"},"zeros_1gb.new.dd"]},{"file":"/var/www/html/3rdparty/sabre/dav/lib/DAV/Tree.php","line":163,"function":"copy","class":"Sabre\\DAV\\Tree","type":"->","args":["uploads/popov.k/web-file-upload-ec4426fdee1b17f40277ca0514822f4b-1656342757986/.file","files/popov.k/zeros_1gb.new.dd"]},{"file":"/var/www/html/apps/dav/lib/Upload/ChunkingPlugin.php","line":94,"function":"move","class":"Sabre\\DAV\\Tree","type":"->","args":["uploads/popov.k/web-file-upload-ec4426fdee1b17f40277ca0514822f4b-1656342757986/.file","files/popov.k/zeros_1gb.new.dd"]},{"file":"/var/www/html/apps/dav/lib/Upload/ChunkingPlugin.php","line":76,"function":"performMove","class":"OCA\\DAV\\Upload\\ChunkingPlugin","type":"->","args":["uploads/popov.k/web-file-upload-ec4426fdee1b17f40277ca0514822f4b-1656342757986/.file","files/popov.k/zeros_1gb.new.dd"]},{"file":"/var/www/html/3rdparty/sabre/event/lib/WildcardEmitterTrait.php","line":89,"function":"beforeMove","class":"OCA\\DAV\\Upload\\ChunkingPlugin","type":"->","args":["uploads/popov.k/web-file-upload-ec4426fdee1b17f40277ca0514822f4b-1656342757986/.file","files/popov.k/zeros_1gb.new.dd"]},{"file":"/var/www/html/3rdparty/sabre/dav/lib/DAV/CorePlugin.php","line":603,"function":"emit","class":"Sabre\\DAV\\Server","type":"->","args":["beforeMove",["uploads/popov.k/web-file-upload-ec4426fdee1b17f40277ca0514822f4b-1656342757986/.file",
"files/popov.k/zeros_1gb.new.dd"]]},{"file":"/var/www/html/3rdparty/sabre/event/lib/WildcardEmitterTrait.php","line":89,"function":"httpMove","class":"Sabre\\DAV\\CorePlugin","type":"->","args":[{"__class__":"Sabre\\HTTP\\Request"},{"__class__":"Sabre\\HTTP\\Response"}]},{"file":"/var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php","line":472,"function":"emit","class":"Sabre\\DAV\\Server","type":"->","args":["method:MOVE",[{"__class__":"Sabre\\HTTP\\Request"},{"__class__":"Sabre\\HTTP\\Response"}]]},{"file":"/var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php","line":253,"function":"invokeMethod","class":"Sabre\\DAV\\Server","type":"->","args":[{"__class__":"Sabre\\HTTP\\Request"},{"__class__":"Sabre\\HTTP\\Response"}]},{"file":"/var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php","line":321,"function":"start","class":"Sabre\\DAV\\Server","type":"->","args":[]},{"file":"/var/www/html/apps/dav/lib/Server.php","line":352,"function":"exec","class":"Sabre\\DAV\\Server","type":"->","args":[]},{"file":"/var/www/html/apps/dav/appinfo/v2/remote.php","line":35,"function":"exec","class":"OCA\\DAV\\Server","type":"->","args":[]},{"file":"/var/www/html/remote.php","line":166,"args":["/var/www/html/apps/dav/appinfo/v2/remote.php"],"function":"require_once"}],"File":"/var/www/html/lib/private/Lock/MemcacheLockingProvider.php","Line":80,"CustomMessage":"--"}
app-nextcloud-int-6f8b6d568-v9lng - NOTICE: PHP message: [owncloud][locking][1] change lock on "files/zeros_1gb.new.dd" to shared on storage "object::user:popov.k"
app-nextcloud-int-6f8b6d568-v9lng - NOTICE: PHP message: [owncloud][locking][1] release shared lock on "files/zeros_1gb.new.dd" on storage "object::user:popov.k"

From the browser, the process looks like the following (a rough sketch of this flow is given after the list):

  • the file is uploaded in chunks
  • after the last chunk is uploaded, a "MOVE /remote.php" request is made, which completes with a 423 (Locked) response (I suspect this code makes the request)
  • an error message is displayed in a Nextcloud pop-up
  • the file remains in Nextcloud with a .part suffix
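
For context, a minimal sketch of that chunked-upload flow as a client would drive it over WebDAV, using the dav uploads endpoints visible in the trace above. This is not Nextcloud's own code; the server URL, credentials, file names and chunk size are placeholder assumptions:

# Rough sketch of the chunked upload + final MOVE the web UI performs.
import uuid
import requests

BASE = "https://nextcloud.example.com/remote.php/dav"  # assumed server URL
USER, APP_PASSWORD = "popov.k", "app-password"         # assumed credentials
SOURCE, TARGET = "zeros_1gb.new.dd", "zeros_1gb.new.dd"
CHUNK_SIZE = 10 * 1024 * 1024                          # 10 MiB per chunk
auth = (USER, APP_PASSWORD)

# 1. create the temporary upload collection
upload_dir = f"{BASE}/uploads/{USER}/web-file-upload-{uuid.uuid4().hex}"
requests.request("MKCOL", upload_dir, auth=auth).raise_for_status()

# 2. upload the file as numbered chunks
with open(SOURCE, "rb") as handle:
    index = 1
    while chunk := handle.read(CHUNK_SIZE):
        requests.put(f"{upload_dir}/{index:05d}", data=chunk, auth=auth).raise_for_status()
        index += 1

# 3. the final MOVE assembles the chunks into the target path; this is the
#    request that comes back with 423 (Locked) in this issue
response = requests.request(
    "MOVE",
    f"{upload_dir}/.file",
    auth=auth,
    headers={"Destination": f"{BASE}/files/{USER}/{TARGET}"},
)
print(response.status_code)  # 201/204 on success, 423 when the target is still locked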

When using the desktop client to sync the same file, the server-side logs show the same sequence, but the client (probably) retries this MOVE request, which then succeeds.

Steps to reproduce

  1. Set up Nextcloud as described above
  2. Upload a 1 GB+ file via the browser

Expected behavior

Even large files are uploaded successfully.

Possible solution: retry the lock acquisition on the server side when executing the MOVE request.
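
A client-side sketch of that retry idea, which is roughly what the desktop client appears to do (see above): repeat the MOVE while the server still answers 423. The function and parameter names are made up for illustration; the proposed fix would perform the equivalent retry on the server side instead.

import time
import requests

def move_with_retry(source_url, destination_url, auth, attempts=5, delay=5.0):
    """Retry a WebDAV MOVE a few times while the destination is still locked."""
    response = None
    for attempt in range(attempts):
        response = requests.request(
            "MOVE", source_url, auth=auth,
            headers={"Destination": destination_url},
        )
        if response.status_code != 423:      # anything but "Locked": stop retrying
            return response
        time.sleep(delay * (attempt + 1))    # simple linear backoff
    return response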

Installation method

Official Docker image

Operating system

Debian/Ubuntu

PHP engine version

PHP 8.0

Web server

Nginx

Database engine version

PostgreSQL

Is this bug present after an update or on a fresh install?

Fresh Nextcloud Server install

Are you using the Nextcloud Server Encryption module?

Encryption is Disabled

What user-backends are you using?

  • [ ] Default user-backend (database)
  • [X] LDAP/ Active Directory
  • [ ] SSO - SAML
  • [ ] Other

Configuration report

{
    "system": {
        "htaccess.RewriteBase": "\/",
        "memcache.local": "\\OC\\Memcache\\APCu",
        "apps_paths": [
            {
                "path": "\/var\/www\/html\/apps",
                "url": "\/apps",
                "writable": false
            },
            {
                "path": "\/var\/www\/html\/custom_apps",
                "url": "\/custom_apps",
                "writable": true
            }
        ],
        "log_type": "errorlog",
        "loglevel": 1,
        "logfile": "",
        "log_type_audit": "errorlog",
        "maintenance_window_start": 100,
        "overwriteprotocol": "https",
        "trusted_proxies": "***REMOVED SENSITIVE VALUE***",
        "objectstore": {
            "class": "\\OC\\Files\\ObjectStore\\S3",
            "arguments": {
                "bucket": "nextcloud-int-sbmt-infra",
                "autocreate": false,
                "key": "***REMOVED SENSITIVE VALUE***",
                "secret": "***REMOVED SENSITIVE VALUE***",
                "hostname": "storage.yandexcloud.net",
                "region": "ru-central1",
                "use_ssl": true,
                "use_path_style": true
            }
        },
        "mail_smtpmode": "smtp",
        "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpport": "465",
        "mail_smtpsecure": "ssl",
        "mail_smtpauth": false,
        "mail_smtpauthtype": "LOGIN",
        "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
        "mail_smtppassword": "***REMOVED SENSITIVE VALUE***",
        "mail_from_address": "***REMOVED SENSITIVE VALUE***",
        "mail_domain": "***REMOVED SENSITIVE VALUE***",
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "trusted_domains": [
            "localhost",
            "nextcloud.sbmt.io",
            "nextcloud"
        ],
        "datadirectory": "***REMOVED SENSITIVE VALUE***",
        "dbtype": "pgsql",
        "version": "24.0.1.1",
        "overwrite.cli.url": "https:\/\/localhost",
        "dbname": "***REMOVED SENSITIVE VALUE***",
        "dbhost": "***REMOVED SENSITIVE VALUE***",
        "dbport": "",
        "dbtableprefix": "oc_",
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "installed": true,
        "instanceid": "***REMOVED SENSITIVE VALUE***",
        "memcache.distributed": "\\OC\\Memcache\\Redis",
        "memcache.locking": "\\OC\\Memcache\\Redis",
        "redis": {
            "host": "***REMOVED SENSITIVE VALUE***",
            "port": "6379",
            "password": "***REMOVED SENSITIVE VALUE***"
        },
        "ldapProviderFactory": "OCA\\User_LDAP\\LDAPProviderFactory",
        "app_install_overwrite": [
            "files_external_gdrive",
            "drawio"
        ],
        "maintenance": false,
        "filelocking.debug": {
            "true": "",
            "1": "",
            "false": ""
        }
    }
}

List of activated Apps

Enabled:
  - activity: 2.16.0
  - bruteforcesettings: 2.4.0
  - circles: 24.0.0
  - cloud_federation_api: 1.7.0
  - comments: 1.14.0
  - contacts: 4.1.1
  - contactsinteraction: 1.5.0
  - dashboard: 7.4.0
  - dav: 1.22.0
  - drawio: 1.0.3
  - federatedfilesharing: 1.14.0
  - federation: 1.14.0
  - files: 1.19.0
  - files_external: 1.16.1
  - files_pdfviewer: 2.5.0
  - files_rightclick: 1.3.0
  - files_sharing: 1.16.2
  - files_trashbin: 1.14.0
  - files_versions: 1.17.0
  - files_videoplayer: 1.13.0
  - forms: 2.5.1
  - groupfolders: 12.0.1
  - integration_dropbox: 1.0.4
  - integration_google: 1.0.6
  - lookup_server_connector: 1.12.0
  - nextcloud_announcements: 1.13.0
  - notifications: 2.12.0
  - oauth2: 1.12.0
  - officeonline: 1.1.3
  - password_policy: 1.14.0
  - photos: 1.6.0
  - privacy: 1.8.0
  - provisioning_api: 1.14.0
  - serverinfo: 1.14.0
  - settings: 1.6.0
  - sharebymail: 1.14.0
  - systemtags: 1.14.0
  - text: 3.5.1
  - theming: 1.15.0
  - theming_customcss: 1.11.0
  - twofactor_backupcodes: 1.13.0
  - updatenotification: 1.14.0
  - user_ldap: 1.14.1
  - user_status: 1.4.0
  - viewer: 1.8.0
  - workflowengine: 2.6.0
Disabled:
  - accessibility: 1.8.0
  - admin_audit
  - encryption
  - firstrunwizard: 2.11.0
  - logreader: 2.7.0
  - recommendations: 1.2.0
  - support: 1.6.0
  - survey_client: 1.10.0
  - weather_status: 1.2.0

Nextcloud Signing status

No response

Nextcloud Logs

No response

Additional info

No response

k-popov avatar Jun 28 '22 07:06 k-popov

I think we have the same issue on multiple NC versions (NC 22.2.3), the latest being NC 24.0.2.

Setup:

  • Nextcloud on K8s
  • Redis (6.2.7) as the file locking provider, deployed via Helm
  • Chunk size of 240 MB

It is possible to see that the files are correctly stored on the storage before this happens.

From nextcloud.log in debug mode and with 'filelocking.debug' => true, we see the following at the end of the upload, along with the 423 you mentioned in the browser:

{ "reqId": "uDOo0WwTwNoFWaKL7lpH", "level": 1, "time": "2022-06-28T12:17:56+00:00", "remoteAddr": "111111.11111.11111.1111", "user": "[email protected]", "app": "no app in context", "method": "MOVE", "url": "/remote.php/dav/uploads/[email protected]/web-file-upload-d86cd6c9f67b3659df8b97bce3fdc5a4-1656417981137/.file", "message": ""files/6c1a102c3da4d854a16bff1982a4b470"("home::[email protected]::files/[email protected]/files/3/5GB.file.upload.part") is locked, existing lock on file: exclusive", "userAgent": "Mozilla/5.0 (X11; Linux x86_64; rv:101.0) Gecko/20100101 Firefox/101.0", "version": "24.0.2.1", "exception": { "Exception": "OCP\Lock\LockedException", "Message": ""files/6c1a102c3da4d854a16bff1982a4b470"("home::[email protected]::files/[email protected]/files/3/5GB.file.upload.part") is locked, existing lock on file: exclusive", "Code": 0, "Trace": [ { "file": "/var/www/html/lib/private/Files/Storage/Common.php", "line": 758, "function": "acquireLock", "class": "OC\Lock\MemcacheLockingProvider", "type": "->", "args": [ "files/6c1a102c3da4d854a16bff1982a4b470", 2, "home::[email protected]::files/[email protected]/files/3/5GB.file.upload.part" ] }, { "file": "/var/www/html/lib/private/Files/Storage/Wrapper/Wrapper.php", "line": 610, "function": "acquireLock", "class": "OC\Files\Storage\Common", "type": "->", "args": [ "files/[email protected]/files/3/5GB.file.upload.part", 2, { "class": "OC\Lock\MemcacheLockingProvider" } ] }, { "file": "/var/www/html/lib/private/Files/Storage/Wrapper/Wrapper.php", "line": 610, "function": "acquireLock", "class": "OC\Files\Storage\Wrapper\Wrapper", "type": "->", "args": [ "files/[email protected]/files/3/5GB.file.upload.part", 2, { "class": "OC\Lock\MemcacheLockingProvider" } ] }, { "file": "/var/www/html/lib/private/Files/View.php", "line": 1961, "function": "acquireLock", "class": "OC\Files\Storage\Wrapper\Wrapper", "type": "->", "args": [ "files/[email protected]/files/3/5GB.file.upload.part", 2, { "class": "OC\Lock\MemcacheLockingProvider" } ] }, { "file": "/var/www/html/lib/private/Files/View.php", "line": 2076, "function": "lockPath", "class": "OC\Files\View", "type": "->", "args": [ "/[email protected]/files/3/5GB.file.upload.part", 2, false ] }, { "file": "/var/www/html/apps/dav/lib/Connector/Sabre/Directory.php", "line": 162, "function": "lockFile", "class": "OC\Files\View", "type": "->", "args": [ "/[email protected]/files/3/5GB.file.upload.part", 2 ] }, { "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Tree.php", "line": 307, "function": "createFile", "class": "OCA\DAV\Connector\Sabre\Directory", "type": "->", "args": [ "5GB.file", null ] }, { "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Tree.php", "line": 133, "function": "copyNode", "class": "Sabre\DAV\Tree", "type": "->", "args": [ { "class": "OCA\DAV\Upload\FutureFile" }, { "class": "OCA\DAV\Connector\Sabre\Directory" }, "5GB.file" ] }, { "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Tree.php", "line": 163, "function": "copy", "class": "Sabre\DAV\Tree", "type": "->", "args": [ "uploads/[email protected]/web-file-upload-d86cd6c9f67b3659df8b97bce3fdc5a4-1656417981137/.file", "files/[email protected]/3/5GB.file" ] }, { "file": "/var/www/html/apps/dav/lib/Upload/ChunkingPlugin.php", "line": 94, "function": "move", "class": "Sabre\DAV\Tree", "type": "->", "args": [ "uploads/[email protected]/web-file-upload-d86cd6c9f67b3659df8b97bce3fdc5a4-1656417981137/.file", "files/[email protected]/3/5GB.file" ] }, { "file": "/var/www/html/apps/dav/lib/Upload/ChunkingPlugin.php", "line": 76, 
"function": "performMove", "class": "OCA\DAV\Upload\ChunkingPlugin", "type": "->", "args": [ "uploads/[email protected]/web-file-upload-d86cd6c9f67b3659df8b97bce3fdc5a4-1656417981137/.file", "files/[email protected]/3/5GB.file" ] }, { "file": "/var/www/html/3rdparty/sabre/event/lib/WildcardEmitterTrait.php", "line": 89, "function": "beforeMove", "class": "OCA\DAV\Upload\ChunkingPlugin", "type": "->", "args": [ "uploads/[email protected]/web-file-upload-d86cd6c9f67b3659df8b97bce3fdc5a4-1656417981137/.file", "files/[email protected]/3/5GB.file" ] }, { "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/CorePlugin.php", "line": 603, "function": "emit", "class": "Sabre\DAV\Server", "type": "->", "args": [ "beforeMove", [ "uploads/[email protected]/web-file-upload-d86cd6c9f67b3659df8b97bce3fdc5a4-1656417981137/.file", "files/[email protected]/3/5GB.file" ] ] }, { "file": "/var/www/html/3rdparty/sabre/event/lib/WildcardEmitterTrait.php", "line": 89, "function": "httpMove", "class": "Sabre\DAV\CorePlugin", "type": "->", "args": [ { "class": "Sabre\HTTP\Request" }, { "class": "Sabre\HTTP\Response" } ] }, { "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php", "line": 472, "function": "emit", "class": "Sabre\DAV\Server", "type": "->", "args": [ "method:MOVE", [ { "class": "Sabre\HTTP\Request" }, { "class": "Sabre\HTTP\Response" } ] ] }, { "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php", "line": 253, "function": "invokeMethod", "class": "Sabre\DAV\Server", "type": "->", "args": [ { "class": "Sabre\HTTP\Request" }, { "class": "Sabre\HTTP\Response" } ] }, { "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php", "line": 321, "function": "start", "class": "Sabre\DAV\Server", "type": "->", "args": [] }, { "file": "/var/www/html/apps/dav/lib/Server.php", "line": 352, "function": "exec", "class": "Sabre\DAV\Server", "type": "->", "args": [] }, { "file": "/var/www/html/apps/dav/appinfo/v2/remote.php", "line": 35, "function": "exec", "class": "OCA\DAV\Server", "type": "->", "args": [] }, { "file": "/var/www/html/remote.php", "line": 166, "args": [ "/var/www/html/apps/dav/appinfo/v2/remote.php" ], "function": "require_once" } ], "File": "/var/www/html/lib/private/Lock/MemcacheLockingProvider.php", "Line": 85, "CustomMessage": "--" } }

Other threads can be found with only one workaround, which is disabling file locking or raising the file lock TTL (that did not work for us). Redis in debug mode says nothing at all.
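
For reference, those workarounds map roughly to the following config.php options (documented Nextcloud settings, shown only for completeness; as noted, neither helped here):

  // workaround 1: disable file locking entirely (generally discouraged)
  'filelocking.enabled' => false,
  // workaround 2: keep locking but raise the lock TTL, e.g. to 12 hours
  'filelocking.ttl' => 43200,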

Some other Threads I've found: nextcloud/server#23615 nextcloud/server#9001

Update:

  'part_file_in_storage' => true,
  'filesystem_check_changes' => 1,
  'filelocking.ttl' => 43200,

didn't change anything...

Doriangaensslen avatar Jun 28 '22 12:06 Doriangaensslen

I've also noticed that the desktop client (Linux version) and nextcloudcmd (Ubuntu package nextcloud-desktop-cmd) are somehow able to sync these large files. Aren't these clients using WebDAV? Are they using it differently?

k-popov avatar Sep 01 '22 09:09 k-popov

Still exists on 24.0.6

Doriangaensslen avatar Nov 10 '22 12:11 Doriangaensslen

It turns out that the MOVE command sent by the browser fails every time after exactly 60 seconds (as seen in the browser developer tools).

Our config was wrong and did not use Redis at all. Now that Redis is used for file locking, I only see entries 60 seconds after the upload. The problem still persists. When I refresh the browser window, in most cases the file shows up right away.

I tried:

  • Deactivating the antivirus
  • Scaling down to 1 pod
  • Testing performance on NFS (we use that as the storage backend)
  • Using <memcache.local>\OC\Memcache\Redis</memcache.local> instead of APCu

... no luck so far

Doriangaensslen avatar Nov 11 '22 13:11 Doriangaensslen

The issue with the MOVE command that fails after 60 seconds has to be the Ingress (in our case nginx). We added the following annotations to the Ingress and checked nginx.conf in the nginx pod to verify that all the options were applied.

    nginx.ingress.kubernetes.io/cors-allow-headers: X-Forwarded-For
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
    nginx.ingress.kubernetes.io/proxy-buffer-size: 225m
    nginx.ingress.kubernetes.io/proxy-buffering: "on"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: 60s
    nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
    nginx.ingress.kubernetes.io/proxy-request-buffering: "on"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"

These annotations seem to improve the situation, notably the speed of the uploads, but with the caveat that we could not upload files bigger than 10 GB so far with our config (so surely this config is still not optimal).

However, during the MOVE request from the browser, the pod is busy and uses all its resources for it, and Apache fails to serve status.php, which in our case resulted in a killed pod (failing readiness probes). As a workaround, we raised the timeouts in the probes. It's unclear why the MOVE takes so many resources that the Apache/PHP stack in the Nextcloud container can't handle even a simple curl on status.php; it seems to lock out all other PHP processing during this time.
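
For illustration, the relaxed probe timings were along these lines (standard Kubernetes probe fields; the values here are illustrative rather than our exact ones, and the port is an assumption):

    readinessProbe:
      httpGet:
        path: /status.php    # the endpoint that became unresponsive during MOVE
        port: 80             # assumed container port
      periodSeconds: 30      # probe less often
      timeoutSeconds: 10     # allow slow responses while chunks are assembled
      failureThreshold: 6    # tolerate several slow probes before marking the pod unready
    # similar values can be applied to the livenessProbe to avoid pod restarts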

Doriangaensslen avatar Nov 16 '22 10:11 Doriangaensslen

sudo redis-cli flushall

ondrejlohnisky avatar Jan 18 '23 19:01 ondrejlohnisky

Hi, please update to 24.0.9 or better 25.0.3 and report back if it fixes the issue. Thank you!

My goal is to add a label like e.g. 25-feedback to this ticket for an up-to-date major Nextcloud version on which the bug can be reproduced. However, this is not going to work without your help. So thanks for all your effort!

If you don't manage to reproduce the issue in time and the issue gets closed but you can reproduce the issue afterwards, feel free to create a new bug report with up-to-date information by following this link: https://github.com/nextcloud/server/issues/new?assignees=&labels=bug%2C0.+Needs+triage&template=BUG_REPORT.yml&title=%5BBug%5D%3A+

szaimen avatar Jan 23 '23 14:01 szaimen

Hi, my installation was exactly like the one above, with NC 24.0.5; I was experiencing the problem and stumbled upon this issue.

On 24.0.5 I tried to upload a 2.2 GB file and got the locking error mentioned above. The file was correctly uploaded though (checked: I downloaded it again and it matched the original). I raised the locking TTL to 12 h (as suggested above), but the locking error after uploading still persists.

As suggested I upgraded my NC to 25.0.3 (Helm chart version 3.5.0), but the problem is still the same: upload a 2.2 GB file, lock error but file successfully uploaded.

jorgegv avatar Feb 22 '23 14:02 jorgegv

As suggested I upgraded my NC to 25.0.3 (Helm chart version 3.5.0), but the problem is still the same: upload a 2.2 GB file, lock error but file successfully uploaded.

Sounds like you are running into some kind of timeout. Can you make sure that all of the mentioned values in https://docs.nextcloud.com/server/latest/admin_manual/configuration_files/big_file_upload_configuration.html are correctly applied?

szaimen avatar Feb 22 '23 14:02 szaimen

These are my current values:

php_value upload_max_filesize 5G
php_value post_max_size 5G
php_value max_input_time 3600
php_value max_execution_time 3600

And also:

'filelocking.enabled' => true,
'filelocking.ttl' => 43200

These values seem enough for a 2.2GB upload, right?

It does not seem like a timeout to me... the upload finishes successfully (but shows the locking error), and the file appears in the web interface afterwards. Indeed, if I download the file, the content matches the original byte for byte.

jorgegv avatar Feb 22 '23 15:02 jorgegv

Did you also adjust web server or proxy timeouts?

szaimen avatar Feb 22 '23 15:02 szaimen

My proxy is an nginx Ingress in Kubernetes, but the timeouts that can be configured there are for establishing connections, not for limiting the duration of those connections.

Again, the upload finishes successfully (HTTP-wise), so there do not seem to be any timeout or maximum data size problems in the data path...?

jorgegv avatar Feb 22 '23 15:02 jorgegv

What we discovered on K8s with almost unlimited timeout settings:

  • The file uploads normally and the upload finishes
  • The browser sends MOVE
  • On the running pods you can see in the data directory that the chunks are stitched together
  • When that operation is finished, the browser receives the error and prints the message
  • Press F5 and the file is there
  • Different deployment methods (e.g. Docker) have this problem too

Doriangaensslen avatar Feb 22 '23 16:02 Doriangaensslen

This is exactly my failure scenario.

jorgegv avatar Feb 22 '23 17:02 jorgegv

Some more tests on this front: I installed a temporary VM with:

  • Rocky Linux 8 latest (RHEL clone)
  • PHP 8.1.16 running in FPM mode (this is the same PHP as the one in the Docker image which is used for deployment in K8S)
  • Apache
  • External MariaDB server
  • Vanilla Nextcloud 25.0.3 minimally installed via Webinstaller, with no optimizations (i.e. no opcache, no memcache, stock DB-based locking)
  • (No docker, no kubernetes, just plain old "hosted" installation)

With this setup, the same upload as indicated above (2.2GB) works flawlessly, and shows no lock errors after uploading.

After doing this, I modified the installation above:

  • Added and configured opcache
  • Added and configured local redis server
  • Configured Nextcloud memcache using Redis
  • Configured Nextcloud locking using Redis
  • Fixed all security and performance recommendations in the Admin dashboard (now it shows a green tick)

I retried the same 2.2GB upload, and again it worked flawlessly, with no locking errors shown after upload.

jorgegv avatar Feb 23 '23 10:02 jorgegv

And another new test: another temporary VM with:

  • Rocky Linux 8 latest
  • Docker
  • Nextcloud installed using AIO image

With this setup, the same upload as indicated above (2.2GB) works flawlessly, and shows no lock errors after uploading.

So:

  • Webinstaller/ZIP install works OK
  • AIO install works OK

Clues so far point at some specific problem with the Helm/Kubernetes deployment.

One of my team mates suggested it may have something to do with Redis being deployed in cluster mode in Kubernetes? There are several Pods for it, while in a VM or AIO install only a single Redis instance is deployed... I'll test and report back.

jorgegv avatar Feb 24 '23 10:02 jorgegv

New test: deployed in our K8S staging cluster, adding the Redis configuration architecture: standalone to the Helm chart config. I verified that now only one Redis master pod is deployed instead of the previous four (1 master and 3 slaves).

The error still happens with single node Redis deployment.

jorgegv avatar Mar 01 '23 11:03 jorgegv

  • Rocky Linux 8 latest (RHEL clone)
  • PHP 8.1.16 running in FPM mode (this is the same PHP as the one in the Docker image which is used for deployment in K8S)
  • Apache
  • External MariaDB server
  • Vanilla Nextcloud 25.0.3 minimally installed via Webinstaller, with no optimizations (i.e. no opcache, no memcache, stock DB-based locking)
  • (No docker, no kubernetes, just plain old "hosted" installation)

With this setup, the same upload as indicated above (2.2GB) works flawlessly, and shows no lock errors after uploading.

After doing this, I modified the installation above:

  • Added and configured opcache
  • Added and configured local redis server
  • Configured Nextcloud memcache using Redis
  • Configured Nextcloud locking using Redis
  • Fixed all security and performance recommendations in the Admin dashboard (now it shows a green tick)

I retried the same 2.2GB upload, and again it worked flawlessly, with no locking errors shown after upload.

So basically it looks like the docker image or helm chart might have an issue, right?

szaimen avatar Mar 01 '23 11:03 szaimen

So basically it looks like the docker image or helm chart might have an issue, right?

Well, according to my tests (see previous comments), the Docker image is fine: it installs and runs with no errors on a regular instance (Rocky VM). The error seems to be specific to the K8S deployment.

jorgegv avatar Mar 01 '23 11:03 jorgegv

Error seems to be specific to K8S deployment.

All right. Shall I move over to the helm chart repo then?

szaimen avatar Mar 01 '23 11:03 szaimen

All right. Shall I move over to the helm chart repo then?

That's my current line of investigation, yes.

BTW, @Doriangaensslen said that the problem also happened with the Docker AIO image, but I did not reproduce it there.

jorgegv avatar Mar 01 '23 11:03 jorgegv

Ah, you tried the AIO image instead of the "normal" image? That could still lead to the docker image being the culprit here as AIO is using its own image. Moving to the docker repo then for now.

szaimen avatar Mar 01 '23 11:03 szaimen

I did not know there were 2 docker images, now I know, thanks :-)

jorgegv avatar Mar 01 '23 12:03 jorgegv

New test: deployed Nextcloud Helm chart with Redis disabled, and retried the 2.2GB upload.

An error about locking appears, though different from the previous one, and more terse: ""TEST20_3.bin" is locked"


As with the previous errors, if you refresh, the file has been correctly uploaded and appears in the web UI.

So it seems we can rule out Redis as the source of the problem; it is somewhere else.

jorgegv avatar Mar 01 '23 13:03 jorgegv

That's also what we experienced. We also tried the other caching methods; no luck there.

Doriangaensslen avatar Mar 01 '23 13:03 Doriangaensslen

We noticed restarts because the readiness probe failed while the file was being stitched together, since the pod was busy processing it. But even after resolving that, we still have this problem.

Doriangaensslen avatar Mar 01 '23 15:03 Doriangaensslen

I have the same issue, "file locked: exclusive", using IONOS as the host.

Y3Z0N avatar May 31 '23 02:05 Y3Z0N

I have intermittent issues on a "normal" (non-Docker, non-K8s) setup, also using Redis. It didn't occur in 22.x, but since I installed 25 I get it every now and then, with no reliable way to reproduce it. It often happens during the auto-backup of photos via the iOS app, which is interesting as the files do not exist on the server in the first place.

I am tempted to disable file locking, as this is almost a single-user instance (well, the other user is a family member, and it's really only for auto-backups of photos). Still, it'd be better if this helped detect whatever the issue is here.

Happy to help, as always.

pjft avatar Jul 19 '23 13:07 pjft

I am experiencing essentially the same problem, running Nextcloud 27.0.1 directly on an Apache instance, without any containers.

I have kept the upload and post sizes at their low defaults of 50M, due to resource limitations on the server. Files are stored in S3.


Appearing in the log file...

OCP\Lock\LockedException: "<pathname>" is locked, existing lock on file: 3 shared locks

Similarly, shown as a message popup on the browser page...

"<pathname>" is locked, existing lock on file: 3 shared locks

brainchild0 avatar Sep 14 '23 11:09 brainchild0

Hi there, we have the same problem with the 27.0.1-apache image. We're using a K8S cluster hosted at IONOS. We use an nginx-ingress controller in front of the pod with the following annotations:

    nginx.ingress.kubernetes.io/cors-allow-headers: X-Forwarded-For
    nginx.ingress.kubernetes.io/enable-cors: 'true'
    nginx.ingress.kubernetes.io/proxy-body-size: 0m
    nginx.ingress.kubernetes.io/proxy-buffer-size: 225m
    nginx.ingress.kubernetes.io/proxy-buffering: 'on'
    nginx.ingress.kubernetes.io/proxy-connect-timeout: 60s
    nginx.ingress.kubernetes.io/proxy-read-timeout: 900s
    nginx.ingress.kubernetes.io/proxy-request-buffering: 'on'
    nginx.ingress.kubernetes.io/proxy-send-timeout: '1800'

mueller-tobias avatar Sep 18 '23 05:09 mueller-tobias