Timeout errors due to file lock, despite the component not being locked

Open superDross opened this issue 4 years ago • 3 comments

Describe the bug

I have a Weblate instance linked to a Git repository. We have uploaded translations for several components so far (all in monolingual gettext format) and started noticing this issue near the end of the process.

When trying to upload a file via the REST API, we sometimes get a 500 error which is caused by a file lock timeout.

Timeout: The file lock '/app/data/vcs/application/component.lock' could not be acquired.
  File "weblate/api/views.py", line 1076, in file
  File "contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "weblate/trans/models/translation.py", line 959, in merge_upload
    return self.handle_replace(request, fileobj)
  File "weblate/trans/models/translation.py", line 900, in handle_replace
    with self.component.repository.lock:
  File "filelock.py", line 323, in __enter__
  File "filelock.py", line 278, in acquire
    raise Timeout(self._lock_file)

However, the component does not seem to be locked (at least according to the UI and the API).

When I navigate to the project in the UI and open Manage -> Repository maintenance, I am greeted by an error: [screenshot: error]

Attempts at Resolution

  • deleting the lock file manually
  • deleting and recreating the component
  • removing and re-associating the translation before attempting to re-upload the po file

To Reproduce

This has happened randomly a few times so I have no known way to reproduce the error.

Expected behavior

The monolingual PO file uploads successfully.

Server configuration and status

* Weblate: 4.3-dev
 * Django: 3.1.1
 * siphashc: 2.1
 * Whoosh: 2.7.4
 * translate-toolkit: 3.1.1
 * lxml: 4.5.2
 * Pillow: 7.2.0
 * bleach: 3.2.1
 * python-dateutil: 2.8.1
 * social-auth-core: 3.3.3
 * social-auth-app-django: 4.0.0
 * django-crispy-forms: 1.9.2
 * oauthlib: 3.1.0
 * django-compressor: 2.4
 * djangorestframework: 3.11.1
 * django-filter: 2.3.0
 * django-appconf: 1.0.4
 * user-agents: 2.2.0
 * filelock: 3.0.12
 * setuptools: 40.8.0
 * jellyfish: 0.8.2
 * openpyxl: 3.0.5
 * celery: 4.4.7
 * kombu: 4.6.11
 * translation-finder: 2.2
 * html2text: 2020.1.16
 * pycairo: 1.16.2
 * pygobject: 3.30.4
 * diff-match-patch: 20200713
 * requests: 2.24.0
 * django-redis: 4.12.1
 * hiredis: 1.1.0
 * sentry_sdk: 0.17.8
 * Cython: 0.29.21
 * misaka: 2.1.1
 * GitPython: 3.1.8
 * borgbackup: 1.1.13
 * pyparsing: 2.4.7
 * Python: 3.7.3
 * Git: 2.20.1
 * psycopg2: 2.8.6
 * psycopg2-binary: 2.8.6
 * phply: 1.2.5
 * chardet: 3.0.4
 * ruamel.yaml: 0.16.12
 * tesserocr: 2.5.1
 * akismet: 1.1
 * boto3: 1.15.4
 * zeep: 3.4.0
 * aeidon: 1.7.0
 * iniparse: 0.5
 * mysqlclient: 2.0.1
 * Mercurial: 5.5.1
 * git-svn: 2.20.1
 * git-review: 1.28.0
 * Redis server: 3.0.6
 * PostgreSQL server: 10.12
 * Database backends: django.db.backends.postgresql
 * Cache backends: default:RedisCache, avatar:FileBasedCache
 * Email setup: django.core.mail.backends.smtp.EmailBackend: email1.ldc.yougov.local
 * OS encoding: filesystem=utf-8, default=utf-8
 * Celery: 
 * Platform: Linux 4.15.0-112-generic (x86_64)

Weblate deploy checks

?: (security.W019) You have 'django.middleware.clickjacking.XFrameOptionsMiddleware' in your MIDDLEWARE, but X_FRAME_OPTIONS is not set to 'DENY'. Unless there is a good reason for your site to serve other parts of itself in a frame, you should change it to 'DENY'.

?: (weblate.I028) Backups are not configured, it is highly recommended for production use
	HINT: https://docs.weblate.org/en/latest/admin/backup.html

superDross avatar Oct 08 '20 11:10 superDross

We sometimes see this error even when the file uploads successfully (it appears in the component UI with the phrases translated). Other times the error occurs and the file does not upload at all.

This makes me think there is some sort of race condition causing this error?

superDross avatar Oct 09 '20 09:10 superDross

There is an internal lock to make sure that the underlying Git repository is not accessed concurrently. The exception can happen when another process is working on it.
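To illustrate how such a timeout can show up while nothing looks locked in the UI, here is a minimal sketch. It uses a plain `threading.Lock` as a stand-in for the repository lock; the function names and durations are made up for illustration and this is not Weblate's actual locking code.

```python
import threading
import time

# Stand-in for the per-repository lock (assumption: not Weblate's real lock class).
repo_lock = threading.Lock()


def long_running_commit(hold_seconds: float, started: threading.Event) -> None:
    """Simulate a background task that holds the repository lock for a while."""
    with repo_lock:
        started.set()  # signal that the lock is now held
        time.sleep(hold_seconds)


def try_upload(timeout: float) -> bool:
    """Try to take the lock within `timeout` seconds and report whether it worked."""
    acquired = repo_lock.acquire(timeout=timeout)
    if acquired:
        repo_lock.release()
    return acquired


def demo() -> tuple[bool, bool]:
    started = threading.Event()
    worker = threading.Thread(target=long_running_commit, args=(0.5, started))
    worker.start()
    started.wait()                     # make sure the worker owns the lock
    during = try_upload(timeout=0.05)  # lock still held: the attempt times out
    worker.join()
    after = try_upload(timeout=0.05)   # lock released: the attempt succeeds
    return during, after
```

The first attempt fails only because another worker happens to hold the lock at that moment, which would match the intermittent behaviour described above.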

Both the API and the views should report this in a nicer way, but these errors can and will still happen.
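Since these errors can happen, a client can treat a 5xx response as transient and retry. Below is a rough sketch with exponential backoff. `upload_with_retry` and its parameters are hypothetical names, and `do_upload` stands for whatever callable performs the actual HTTP request and returns the status code.

```python
import time
from typing import Callable


def upload_with_retry(
    do_upload: Callable[[], int],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call do_upload() until it returns a non-5xx status or attempts run out.

    Returns the last status code seen (0 if attempts is 0).
    """
    status = 0
    for attempt in range(attempts):
        status = do_upload()
        if status < 500:
            return status  # success or a client error: retrying will not help
        if attempt < attempts - 1:
            # Back off exponentially: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return status
```

Note that, as reported above, the upload sometimes goes through even though an error is returned, so a retry may re-send a file that was already applied. Retrying on every 5xx status is an assumption of this sketch, not something the API promises to be safe.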

nijel avatar Oct 12 '20 13:10 nijel

I am seeing it as well when doing a sync:

gunicorn stderr |   File "/usr/local/lib/python3.11/site-packages/weblate/trans/models/component.py", line 1948, in commit_pending
gunicorn stderr |     with self.repository.lock:
gunicorn stderr |   File "/usr/local/lib/python3.11/site-packages/weblate/utils/lock.py", line 86, in __enter__
gunicorn stderr |     self._enter_implementation()
gunicorn stderr |   File "/usr/local/lib/python3.11/site-packages/weblate/utils/lock.py", line 70, in _enter_redis
gunicorn stderr |     raise WeblateLockTimeoutError(
gunicorn stderr | weblate.utils.lock.WeblateLockTimeoutError: Lock on lock:repo:147 could not be acquired in 120s

The 120s timeout is hardcoded (weblate/vcs/apps.py):

    lockfile = WeblateLock(
        home, "gitlock", 0, "", "lock:{scope}", "{scope}", timeout=120
    )

Maybe it should be configurable?
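A rough sketch of what a configurable timeout could look like, falling back to the current 120s default. The environment variable name `EXAMPLE_VCS_LOCK_TIMEOUT` and the helper `get_lock_timeout` are hypothetical and are not existing Weblate settings.

```python
import os
from typing import Mapping

# The value currently hardcoded, per the snippet above.
DEFAULT_LOCK_TIMEOUT = 120


def get_lock_timeout(environ: Mapping[str, str] = os.environ) -> int:
    """Return the lock timeout in seconds, read from a hypothetical env variable."""
    # Hypothetical variable name, used only for illustration.
    raw = environ.get("EXAMPLE_VCS_LOCK_TIMEOUT")
    if raw is None:
        return DEFAULT_LOCK_TIMEOUT
    try:
        value = int(raw)
    except ValueError:
        # Unparseable value: keep the default rather than fail at startup.
        return DEFAULT_LOCK_TIMEOUT
    # Reject zero or negative timeouts and keep the default instead.
    return value if value > 0 else DEFAULT_LOCK_TIMEOUT
```

The result could then be passed as `timeout=get_lock_timeout()` in place of the literal `120`.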

joubu avatar Nov 22 '23 08:11 joubu