weblate
Timeout errors due to file lock, despite the component not being locked
Describe the bug
I have a Weblate instance linked to a git repository. We have uploaded translations for several components (all in monolingual gettext format) so far, and started noticing this issue near the end of the process.
When trying to upload a file via the REST API, we sometimes get a 500 error which is caused by a file lock timeout.
Timeout: The file lock '/app/data/vcs/application/component.lock' could not be acquired.
File "weblate/api/views.py", line 1076, in file
data["fuzzy"],
File "contextlib.py", line 74, in inner
return func(*args, **kwds)
File "weblate/trans/models/translation.py", line 959, in merge_upload
return self.handle_replace(request, fileobj)
File "weblate/trans/models/translation.py", line 900, in handle_replace
with self.component.repository.lock:
File "filelock.py", line 323, in __enter__
self.acquire()
File "filelock.py", line 278, in acquire
raise Timeout(self._lock_file)
However, the component does not seem to be locked (at least according to the UI or the API).
When navigating to the project in the UI and opening Manage -> Repository maintenance, I am greeted by an error:
Attempts at Resolution
- deleting the lock file manually
- deleting and recreating the component
- removing and re-associating the translation before attempting to re-upload the po file
To Reproduce
This has happened randomly a few times so I have no known way to reproduce the error.
Expected behavior
The monolingual PO file uploads successfully
Server configuration and status
* Weblate: 4.3-dev
* Django: 3.1.1
* siphashc: 2.1
* Whoosh: 2.7.4
* translate-toolkit: 3.1.1
* lxml: 4.5.2
* Pillow: 7.2.0
* bleach: 3.2.1
* python-dateutil: 2.8.1
* social-auth-core: 3.3.3
* social-auth-app-django: 4.0.0
* django-crispy-forms: 1.9.2
* oauthlib: 3.1.0
* django-compressor: 2.4
* djangorestframework: 3.11.1
* django-filter: 2.3.0
* django-appconf: 1.0.4
* user-agents: 2.2.0
* filelock: 3.0.12
* setuptools: 40.8.0
* jellyfish: 0.8.2
* openpyxl: 3.0.5
* celery: 4.4.7
* kombu: 4.6.11
* translation-finder: 2.2
* html2text: 2020.1.16
* pycairo: 1.16.2
* pygobject: 3.30.4
* diff-match-patch: 20200713
* requests: 2.24.0
* django-redis: 4.12.1
* hiredis: 1.1.0
* sentry_sdk: 0.17.8
* Cython: 0.29.21
* misaka: 2.1.1
* GitPython: 3.1.8
* borgbackup: 1.1.13
* pyparsing: 2.4.7
* Python: 3.7.3
* Git: 2.20.1
* psycopg2: 2.8.6
* psycopg2-binary: 2.8.6
* phply: 1.2.5
* chardet: 3.0.4
* ruamel.yaml: 0.16.12
* tesserocr: 2.5.1
* akismet: 1.1
* boto3: 1.15.4
* zeep: 3.4.0
* aeidon: 1.7.0
* iniparse: 0.5
* mysqlclient: 2.0.1
* Mercurial: 5.5.1
* git-svn: 2.20.1
* git-review: 1.28.0
* Redis server: 3.0.6
* PostgreSQL server: 10.12
* Database backends: django.db.backends.postgresql
* Cache backends: default:RedisCache, avatar:FileBasedCache
* Email setup: django.core.mail.backends.smtp.EmailBackend: email1.ldc.yougov.local
* OS encoding: filesystem=utf-8, default=utf-8
* Celery:
* Platform: Linux 4.15.0-112-generic (x86_64)
Weblate deploy checks
WARNINGS:
?: (security.W019) You have 'django.middleware.clickjacking.XFrameOptionsMiddleware' in your MIDDLEWARE, but X_FRAME_OPTIONS is not set to 'DENY'. Unless there is a good reason for your site to serve other parts of itself in a frame, you should change it to 'DENY'.
INFOS:
?: (weblate.I028) Backups are not configured, it is highly recommended for production use
HINT: https://docs.weblate.org/en/latest/admin/backup.html
We sometimes see this error even when the file uploads successfully (it appears in the component UI with the phrases translated). Other times the error occurs and the file is not uploaded at all.
This makes me think there is some sort of race condition causing this error?
There is an internal lock to make sure that the underlying Git repository is not accessed concurrently. The exception can happen when another process is working on the repository.
Both the API and the views should report this in a nicer way, but these errors can and will still happen.
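For API clients, one way to cope with such transient lock timeouts is to retry the upload after a short delay. The following is a minimal sketch, not Weblate code: `LockTimeout` and the `upload` callable are hypothetical names standing in for whatever the client uses to perform the request and detect the 500 error described above.

```python
import time


class LockTimeout(Exception):
    """Hypothetical error raised by the client's upload function on a lock timeout."""


def retry_on_lock_timeout(upload, attempts=3, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call upload() until it succeeds or the attempts are exhausted.

    Waits `delay` seconds after the first failure and multiplies the wait
    by `backoff` after each further failure. The last LockTimeout is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return upload()
        except LockTimeout:
            if attempt == attempts:
                raise  # out of attempts, let the caller see the error
            sleep(delay)
            delay *= backoff
```

Since the report above mentions that the file sometimes appears uploaded even though the error was returned, a retry like this is only reasonable if repeating the upload is harmless for the chosen upload method.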
I am seeing it as well when doing a sync
gunicorn stderr | File "/usr/local/lib/python3.11/site-packages/weblate/trans/models/component.py", line 1948, in commit_pending
gunicorn stderr | with self.repository.lock:
gunicorn stderr | File "/usr/local/lib/python3.11/site-packages/weblate/utils/lock.py", line 86, in __enter__
gunicorn stderr | self._enter_implementation()
gunicorn stderr | File "/usr/local/lib/python3.11/site-packages/weblate/utils/lock.py", line 70, in _enter_redis
gunicorn stderr | raise WeblateLockTimeoutError(
gunicorn stderr | weblate.utils.lock.WeblateLockTimeoutError: Lock on lock:repo:147 could not be acquired in 120s
120s is hardcoded in the code (weblate/vcs/apps.py)
lockfile = WeblateLock(
    home, "gitlock", 0, "", "lock:{scope}", "{scope}", timeout=120
)
Maybe it should be configurable?
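As a rough illustration of what "configurable" could mean here, the timeout could be read from a setting, falling back to 120 seconds when it is not set. This is only a sketch: `VCS_LOCK_TIMEOUT` is a hypothetical setting name, not an existing Weblate option, and the `SimpleNamespace` objects stand in for `django.conf.settings`.

```python
from types import SimpleNamespace

DEFAULT_LOCK_TIMEOUT = 120  # seconds; the hardcoded value quoted above


def get_lock_timeout(settings):
    """Return the lock timeout in seconds from a settings-like object.

    VCS_LOCK_TIMEOUT is a hypothetical setting name used for illustration.
    """
    timeout = getattr(settings, "VCS_LOCK_TIMEOUT", DEFAULT_LOCK_TIMEOUT)
    is_number = isinstance(timeout, (int, float)) and not isinstance(timeout, bool)
    if not is_number or timeout <= 0:
        raise ValueError(f"VCS_LOCK_TIMEOUT must be a positive number, got {timeout!r}")
    return timeout


# Stand-ins for django.conf.settings without and with the override.
print(get_lock_timeout(SimpleNamespace()))                      # 120
print(get_lock_timeout(SimpleNamespace(VCS_LOCK_TIMEOUT=300)))  # 300
```

The returned value would then be passed as the `timeout=` argument instead of the literal 120.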