billiard icon indicating copy to clipboard operation
billiard copied to clipboard

Regression in 3.6.1 - Permission Denied

Open AvnerCohen opened this issue 6 years ago • 29 comments
trafficstars

On CentOs, we clearly see a regression as a result of the recent billiard update:

...
...
  File "/base_folder/venv3.7.2/lib/python3.7/site-packages/billiard/sharedctypes.py", line 37, in _new_value
    wrapper = heap.BufferWrapper(size)
  File "/base_folder/venv3.7.2/lib/python3.7/site-packages/billiard/heap.py", line 274, in __init__
    block = BufferWrapper._heap.malloc(size)
  File "/base_folder/venv3.7.2/lib/python3.7/site-packages/billiard/heap.py", line 255, in malloc
    (arena, start, stop) = self._malloc(size)
  File "/base_folder/venv3.7.2/lib/python3.7/site-packages/billiard/heap.py", line 152, in _malloc
    arena = Arena(length)
  File "/base_folder/venv3.7.2/lib/python3.7/site-packages/billiard/heap.py", line 79, in __init__
    dir=util.get_temp_dir(),
  File "/base_folder/.asdf/installs/python/3.7.2/lib/python3.7/tempfile.py", line 340, in mkstemp
    return _mkstemp_inner(dir, prefix, suffix, flags, output_type)
  File "/base_folder/.asdf/installs/python/3.7.2/lib/python3.7/tempfile.py", line 258, in _mkstemp_inner
    fd = _os.open(file, flags, 0o600)
PermissionError: [Errno 13] Permission denied: '/tmp/pymp-j0t8dkfk/pym-4118-x1vo2w4v'

Rolling back and forcing pip install billiard==3.6.0.0 solves the issue.

I went over the changes, between the versions, but couldn't really find the exact culprit.

AvnerCohen avatar Aug 19 '19 08:08 AvnerCohen

https://github.com/celery/billiard/compare/v3.6.0...master this are the changes.

auvipy avatar Aug 19 '19 10:08 auvipy

You'll have to provide us with more details. Can you reproduce this locally? Does the problem reproduce on all servers? Did you check user permissions? Could it be that /tmp is not accessible to your user?

thedrow avatar Aug 19 '19 11:08 thedrow

Sure. will provide everything needed, was hoping this would point to a specific change and ring a bell. I'll try to create a proper repro example.

AvnerCohen avatar Aug 19 '19 11:08 AvnerCohen

Hi,

I got a similar issue when upgrading from Celery 4.2.2 / Billiard 3.5.0 to Celery 4.4.2 / Billiard 3.6.3.0 on a project and try to execute a debugging task from django-admin shell:

from paperboy.celery import send_debug_mail                                                                                                                                                
send_debug_mail.delay()                                                                                                                                                                    <AsyncResult: 19a2eb4d-7044-41bf-b1d4-7236c905d770>
tail -n 100 var/log/celery/w1.log
[2020-03-25 10:55:37,753: CRITICAL/MainProcess] Unrecoverable error: PermissionError(13, 'Permission denied')
Traceback (most recent call last):
  File "/home/foo/venv_foo_py3/lib/python3.6/site-packages/celery/worker/worker.py", line 205, in start
    self.blueprint.start(self)
  File "/home/foo/venv_foo_py3/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/home/foo/venv_foo_py3/lib/python3.6/site-packages/celery/bootsteps.py", line 369, in start
    return self.obj.start()
  File "/home/foo/venv_foo_py3/lib/python3.6/site-packages/celery/concurrency/base.py", line 131, in start
    self.on_start()
  File "/home/foo/venv_foo_py3/lib/python3.6/site-packages/celery/concurrency/prefork.py", line 112, in on_start
    **self.options)
  File "/home/foo/venv_foo_py3/lib/python3.6/site-packages/celery/concurrency/asynpool.py", line 428, in __init__
    super(AsynPool, self).__init__(processes, *args, **kwargs)
  File "/home/foo/venv_foo_py3/lib/python3.6/site-packages/billiard/pool.py", line 1046, in __init__
    self._create_worker_process(i)
  File "/home/foo/venv_foo_py3/lib/python3.6/site-packages/celery/concurrency/asynpool.py", line 445, in _create_worker_process
    return super(AsynPool, self)._create_worker_process(i)
  File "/home/foo/venv_foo_py3/lib/python3.6/site-packages/billiard/pool.py", line 1142, in _create_worker_process
    on_ready_counter = self._ctx.Value('i')
  File "/home/foo/venv_foo_py3/lib/python3.6/site-packages/billiard/context.py", line 182, in Value
    ctx=self.get_context())
  File "/home/foo/venv_foo_py3/lib/python3.6/site-packages/billiard/sharedctypes.py", line 83, in Value
    lock = ctx.RLock()
  File "/home/foo/venv_foo_py3/lib/python3.6/site-packages/billiard/context.py", line 110, in RLock
    return RLock(ctx=self.get_context())
  File "/home/foo/venv_foo_py3/lib/python3.6/site-packages/billiard/synchronize.py", line 207, in __init__
    SemLock.__init__(self, RECURSIVE_MUTEX, 1, 1, ctx=ctx)
  File "/home/foo/venv_foo_py3/lib/python3.6/site-packages/billiard/synchronize.py", line 72, in __init__
    kind, value, maxvalue, self._make_name(), unlink_now,
PermissionError: [Errno 13] Permission denied

To get It works I must downgrade billiard to 3.6.0 and it fails starting with 3.6.1.

Looks like the cause is that new line in pool._create_worker_process().

I have upgraded a similar project yesterday (same arch, confs, deps, versions etc) without any trouble so maybe this is a problem of resources on the OS.

Any hint on where to look for the file/directory causing the Permission denied exception?

Thanks

Starou avatar Mar 25 '20 10:03 Starou

I suggest using strace to see which files are being opened.

thedrow avatar Mar 26 '20 08:03 thedrow

Thanks @thedrow.

Here the interesting bits in the trace when I run strace over the celery command (instead of the daemon):

stat("/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages/billiard/synchronize.py", {st_mode=S_IFREG|0644, st_size=13041, ...}) = 0
open("/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages/billiard/__pycache__/synchronize.cpython-36.pyc", O_RDONLY|O_CLOEXEC) = 19
fstat(19, {st_mode=S_IFREG|0644, st_size=12086, ...}) = 0
lseek(19, 0, SEEK_CUR)                  = 0
fstat(19, {st_mode=S_IFREG|0644, st_size=12086, ...}) = 0
read(19, "3\r\r\n\275d|^\3612\0\0\343\0\0\0\0\0\0\0\0\0\0\0\0\22\0\0\0@\0\0"..., 12087) = 12086
read(19, "", 1)                         = 0
close(19)                               = 0
stat("/home/paperboy", {st_mode=S_IFDIR|0750, st_size=4096, ...}) = 0
stat("/home/paperboy/venv_paperboy_py3/bin", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/home/paperboy/projects", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/usr/local/lib/python3.6", {st_mode=S_IFDIR|S_ISGID|0755, st_size=4096, ...}) = 0
stat("/usr/local/lib/python3.6/lib-dynload", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 9), ...}) = 0
read(3, "<\235)[\310\3655\264z\r\370\322\307\37O\342\353\215\251\224\327\203d\314\245\222\250\213Z\275h\206"..., 2496) = 2496
fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 9), ...}) = 0
read(3, "\311.l\301\265'\300\314\270\316\227\10\352d\210\206\354\274G\357?\342S\21u\242\330\244\273\304\255\247"..., 2496) = 2496
statfs("/dev/shm", {f_type=0x1021994, f_bsize=4096, f_blocks=51530, f_bfree=51400, f_bavail=51400, f_files=257646, f_ffree=257355, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
futex(0x7f40f4f182dc, FUTEX_WAKE_PRIVATE, 2147483647) = 0
gettimeofday({1585210644, 332329}, NULL) = 0
lstat("/dev/shm/sem.p8QdiC", 0x7fff0c60b600) = -1 ENOENT (No such file or directory)
open("/dev/shm/sem.p8QdiC", O_RDWR|O_CREAT|O_EXCL, 0600) = -1 EACCES (Permission denied)

And the Celery output:

~/venv_paperboy/bin/celery  worker -A paperboy -l INFO
/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages/celery/backends/amqp.py:67: CPendingDeprecationWarning: 
    The AMQP result backend is scheduled for deprecation in     version 4.0 and removal in version v5.0.     Please use RPC backend or a persistent backend.

  alternative='Please use RPC backend or a persistent backend.')
 
 -------------- celery@rpaperboy02 v4.4.2 (cliffs)
--- ***** ----- 
-- ******* ---- Linux-3.2.0-4-amd64-x86_64-with-debian-7.11 2020-03-26 09:21:56
- *** --- * --- 
- ** ---------- [config]
- ** ---------- .> app:         paperboy:0x7f80542724e0
- ** ---------- .> transport:   amqp://guest:**@localhost:5672//
- ** ---------- .> results:     amqp://
- *** --- * --- .> concurrency: 1 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> celery           exchange=celery(direct) key=celery
                

[tasks]
  . paperboy.celery.debug_task
  . paperboy.celery.send_debug_mail
  . paperboy.composition.tasks.compose_dossiers
  . paperboy.composition.tasks.compose_pages_chemin_de_fer
  . paperboy.composition.tasks.compose_pre_bat_dossier
  . paperboy.composition.tasks.compose_pre_bat_page

[2020-03-26 09:21:56,335: CRITICAL/MainProcess] Unrecoverable error: PermissionError(13, 'Permission denied')
Traceback (most recent call last):
  File "/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages/celery/worker/worker.py", line 205, in start
    self.blueprint.start(self)
  File "/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages/celery/bootsteps.py", line 369, in start
    return self.obj.start()
  File "/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages/celery/concurrency/base.py", line 132, in start
    self.on_start()
  File "/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages/celery/concurrency/prefork.py", line 117, in on_start
    **self.options)
  File "/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages/celery/concurrency/asynpool.py", line 486, in __init__
    super(AsynPool, self).__init__(processes, *args, **kwargs)
  File "/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages/billiard/pool.py", line 1046, in __init__
    self._create_worker_process(i)
  File "/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages/celery/concurrency/asynpool.py", line 503, in _create_worker_process
    return super(AsynPool, self)._create_worker_process(i)
  File "/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages/billiard/pool.py", line 1142, in _create_worker_process
    on_ready_counter = self._ctx.Value('i')
  File "/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages/billiard/context.py", line 182, in Value
    ctx=self.get_context())
  File "/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages/billiard/sharedctypes.py", line 83, in Value
    lock = ctx.RLock()
  File "/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages/billiard/context.py", line 110, in RLock
    return RLock(ctx=self.get_context())
  File "/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages/billiard/synchronize.py", line 207, in __init__
    SemLock.__init__(self, RECURSIVE_MUTEX, 1, 1, ctx=ctx)
  File "/home/paperboy/venv_paperboy_py3/lib/python3.6/site-packages/billiard/synchronize.py", line 72, in __init__
    kind, value, maxvalue, self._make_name(), unlink_now,
PermissionError: [Errno 13] Permission denied

Starou avatar Mar 26 '20 08:03 Starou

That's strange. 600 means the owner can read and write to that file. Try to chmod /dev/shm/ to 777 and rerun the test case.

There's an old redhat bug that sets it to 755.

In any case, we should catch that error and provide a fallback.

thedrow avatar Mar 26 '20 10:03 thedrow

Indeed, on that Debian Wheezy instance, for some reason, /dev/shm/ is mounted with the 755 flags instead of 777.

Starou avatar Mar 26 '20 13:03 Starou

What kind of fallback can we provide? Maybe we should catch that error and notify the user that it is possible that their /dev/shm permissions is incorrect?

thedrow avatar Mar 26 '20 14:03 thedrow

@thedrow Any idea maybe what is the source of this regression? Can this be avoided all together?

AvnerCohen avatar Mar 26 '20 14:03 AvnerCohen

What kind of fallback can we provide? Maybe we should catch that error and notify the user that it is possible that their /dev/shm permissions is incorrect?

Maybe in giving the actual path that raise the exception because it may vary from an OS to another ?

Starou avatar Mar 26 '20 14:03 Starou

@Starou Any idea how can we do that?

thedrow avatar Apr 05 '20 07:04 thedrow

@thedrow: no, but I can look into It.

Starou avatar Apr 05 '20 15:04 Starou

I'm on xenial and see the same issue

aliceminotto avatar Apr 21 '20 09:04 aliceminotto

I'm on xenial and see the same issue

Did you check the permissions for /dev/shm?

thedrow avatar Apr 21 '20 13:04 thedrow

I'm on xenial and see the same issue

Did you check the permissions for /dev/shm?

They are drwxrwxrwt 2 root root 280 Apr 21 15:00 . which seems correct to me

aliceminotto avatar Apr 21 '20 14:04 aliceminotto

@AvnerCohen Did you check on your machines?

thedrow avatar May 03 '20 09:05 thedrow

@thedrow First thing firsts, the issue around /dev/shm is not related here, you can see that in my original issue gthe permission denied is on - /tmp/pymp

I created a full repro in a docker, so should be pretty easy to move on now: https://github.com/AvnerCohen/repro-billiard-288

Once built, you can uncomment the downgrade of billiard and you'll see that this is fixed: https://github.com/AvnerCohen/repro-billiard-288/blob/master/requirements.txt#L3

I still have few tests to do and I think I can further narrow down the issue now, but wanted to share that maybe others are also impacted by it.

AvnerCohen avatar May 04 '20 16:05 AvnerCohen

Have same issue on centos 7, As mentioned in first post, downgrading to billiard==3.6.0.0 solved the problem

cysio528 avatar May 23 '20 16:05 cysio528

I don't have an idea why you get a permission error. This happens outside of Docker too right?

thedrow avatar Jun 05 '20 04:06 thedrow

Correct, original issue is outside of docker. It is related to the specific umask assigned to the forked process: https://github.com/AvnerCohen/repro-billiard-288/blob/master/runner.py#L4

But I can't seem to figure out why this happens only on 3.6.1 and not on earlier versions.

AvnerCohen avatar Jun 05 '20 06:06 AvnerCohen

So maybe we can fix this by setting a different umask after forking?

thedrow avatar Jun 08 '20 07:06 thedrow

We have a similar problem with dev-python/billiard-3.6.3.0. The only difference is that we use gentoo linux and we need /run to be world writable:

[pid 33685] openat(AT_FDCWD, "/run/2IU0th", O_RDWR|O_CREAT|O_EXCL, 0600) = -1 EACCES (Permission denied)

Downgrade to 3.6.0 fixed this problem as well.

pva avatar Mar 19 '21 17:03 pva

I've made git bisect and found: 241c0d4782ba6c24bd5f2777ed5d1e084d39501c is the first bad commit

I've tried to revert this commit and yes, celery works with the current master without this commit.

pva avatar Mar 19 '21 17:03 pva

Why would that commit cause a permission issue? Also, are you sure you can't reproduce this bug? Have you tried multiple times?

thedrow avatar Mar 21 '21 18:03 thedrow

Yes, bug is perfectly reproducible and I have no idea how that commit breaks anything. My only guess is that it is somehow related to locking and that is why it causes regression. Anyway, as I said, revert back worked for me.

One important thing is that once I've mounted /dev/shm as tmpfs everything started to work again. So it's not only a permissions problem. /dev/shm should be tmpfs mounted.

pva avatar Apr 01 '21 09:04 pva

I am not sure how relevant is it to other people and cases, in the repro I created (https://github.com/AvnerCohen/repro-billiard-288) the issue and the fix is:

import os
pid = os.fork()os.umask(0o177) ## Fails
#os.umask(0o477) ## works
os.system('celery -A worker worker --loglevel=info')

E.g. mask change to allow write/execute makes this work for us.

AvnerCohen avatar Jun 15 '21 09:06 AvnerCohen

BTW - looking at the code changes in the faulty commit (https://github.com/celery/billiard/commit/241c0d4782ba6c24bd5f2777ed5d1e084d39501c) I am guessing the change is in Line 828

with on_ready_counter.get_lock():

Haven't drilled down, but I am guessing this is where a write/execute permission is required.

AvnerCohen avatar Jun 17 '21 05:06 AvnerCohen

what alternative should we use to get rid of that

auvipy avatar Nov 10 '21 16:11 auvipy