
Abnormal task termination

jmdacruz opened this issue 6 years ago (status: open) · 9 comments

Using Celery 3.1.25 with billiard 3.3.0.23 (and Redis 4.0.2), I run a stress test that sends ~30000 tasks to be executed asynchronously without waiting for results. Each task consumes CPU for 100 ms with a simple multiplication and then returns. I regularly see at least 2 or 3 tasks fail with an exception raised in billiard:

Traceback (most recent call last):
  ...
  File "/application/virtualenv/lib/python2.7/site-packages/Stressy/Stressy.py", line 21, in execute
    value = self.stress(5, 0.1)
  File "/application/virtualenv/lib/python2.7/site-packages/Stressy/Stressy.py", line 36, in stress
    value = x*x
  File "/application/virtualenv/lib/python2.7/site-packages/billiard/common.py", line 95, in _shutdown_cleanup
    sys.exit(-(256 - signum))
  File "/application/virtualenv/lib/python2.7/site-packages/billiard/pool.py", line 286, in exit
    return _exit()
SystemExit

The error rate is still incredibly low (less than 0.01%), but I wonder if this could be avoided altogether.
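In the traceback above, the SystemExit comes from billiard's signal handler (`_shutdown_cleanup` in billiard/common.py), which calls `sys.exit(-(256 - signum))`. Python runs signal handlers in the main thread between bytecodes, so the exception surfaces on whatever line happens to be executing, here `value = x*x`. A minimal sketch of that effect using only the standard library, assuming a POSIX system; `shutdown_cleanup`, `stress`, and `run_task` are hypothetical stand-ins, and the sketch calls the plain `sys.exit` rather than the `exit` wrapper from billiard/pool.py that appears in the traceback:

```python
import os
import signal
import sys
import threading
import time


def shutdown_cleanup(signum, frame):
    # Same expression as billiard/common.py in the traceback. Raising
    # SystemExit from a signal handler unwinds whatever code the main
    # thread was running when the signal arrived.
    sys.exit(-(256 - signum))


def stress(x, seconds):
    # Hypothetical stand-in for Stressy.stress(5, 0.1): repeat a
    # multiplication until roughly `seconds` have elapsed.
    deadline = time.monotonic() + seconds
    value = x * x
    while time.monotonic() < deadline:
        value = x * x
    return value


def run_task(seconds=2.0, signal_after=0.05):
    """Run stress() while a timer thread sends SIGTERM to this process."""
    previous = signal.signal(signal.SIGTERM, shutdown_cleanup)
    # Simulate the pool (or a process manager) stopping us mid-task.
    timer = threading.Timer(signal_after, os.kill,
                            args=(os.getpid(), signal.SIGTERM))
    timer.start()
    try:
        return ("finished", stress(5, seconds))
    except SystemExit as exc:
        # Raised from inside stress()'s loop, not by anything the
        # task itself did.
        return ("interrupted", exc.code)
    finally:
        timer.cancel()
        timer.join()
        signal.signal(signal.SIGTERM, previous)


if __name__ == "__main__":
    status, code = run_task()
    print(status, code, code & 0xFF)  # interrupted -241 15
```

Since -(256 - signum) is congruent to signum modulo 256, the value would reduce to the signal number (15 for SIGTERM) if it reached the OS as an 8-bit exit status.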

jmdacruz, Oct 24 '17

I can confirm that this issue is not present when using Celery 4.1.0 with billiard 3.5.0.3. What would be the latest billiard version that is compatible with Celery 3.1.25?

jmdacruz, Oct 24 '17

You could first try the latest billiard release with Celery 3.1.x; if that doesn't work, try prior versions. Thanks.

auvipy, Oct 26 '17

https://github.com/celery/billiard/issues/214

auvipy, Oct 26 '17

@auvipy I was actually able to reproduce it with Celery 4.1.0 and billiard 3.5.0.3. This is what I did:

  • A task that uses the jsonmerge library
  • Using logging with debug level (writing to a file), this library generates a lot of log entries
  • The result is that once in a while billiard kills the process while it is writing the log entries (approximately 2 of every 30000 tasks)
Traceback (most recent call last):
...
  File "/application/virtualenv/lib/python2.7/site-packages/jsonmerge/__init__.py", line 270, in merge
    return walk.descend(schema, base, head, meta).val
  File "/application/virtualenv/lib/python2.7/site-packages/jsonmerge/__init__.py", line 42, in descend
    log.debug("descend: %sschema %s" % (self._indent(), schema.ref,))
  File "/usr/local/lib/python2.7/logging/__init__.py", line 1155, in debug
    self._log(DEBUG, msg, args, **kwargs)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 1286, in _log
    self.handle(record)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 1296, in handle
    self.callHandlers(record)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 1336, in callHandlers
    hdlr.handle(record)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 759, in handle
    self.emit(record)
  File "/usr/local/lib/python2.7/logging/handlers.py", line 430, in emit
    logging.FileHandler.emit(self, record)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 957, in emit
    StreamHandler.emit(self, record)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 885, in emit
    self.flush()
  File "/usr/local/lib/python2.7/logging/__init__.py", line 845, in flush
    self.stream.flush()
  File "/application/virtualenv/lib/python2.7/site-packages/billiard/common.py", line 125, in _shutdown_cleanup
    sys.exit(-(256 - signum))
  File "/application/virtualenv/lib/python2.7/site-packages/billiard/pool.py", line 280, in exit
    return _exit()
SystemExit
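This traceback ends inside `self.stream.flush()` in the logging handler. A minimal sketch of why the exception is not swallowed there, assuming Python 3: `StreamHandler.emit()` only passes `Exception` subclasses to `handleError()`, and SystemExit derives from BaseException, so it propagates back to the `log.debug(...)` call. `InterruptedStream` and `log_debug_once` are hypothetical names, and the stream's `flush()` simply calls `sys.exit` to imitate a signal arriving mid-flush:

```python
import logging
import signal
import sys


class InterruptedStream:
    """Hypothetical file-like object whose flush() behaves as if
    billiard's signal handler fired while the record was being flushed."""

    def __init__(self):
        self.written = []

    def write(self, text):
        self.written.append(text)

    def flush(self):
        # Mimics the call at the bottom of the traceback.
        sys.exit(-(256 - int(signal.SIGTERM)))


def log_debug_once():
    logger = logging.getLogger("jsonmerge_demo")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    stream = InterruptedStream()
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    try:
        # Similar in shape to jsonmerge's log.debug(...) call.
        logger.debug("descend: schema %s", "#/definitions/item")
        return "logged", stream.written
    except SystemExit as exc:
        # The record was already written; only the flush was interrupted,
        # and the SystemExit reached the caller of logger.debug().
        return "SystemExit(%s)" % exc.code, stream.written
    finally:
        logger.removeHandler(handler)
        handler.close()


if __name__ == "__main__":
    print(log_debug_once())
```

The sketch shows where the exception goes, not how often a real signal lands during a flush.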

jmdacruz, Oct 26 '17

Does it reproduce on the master branch?

auvipy, Oct 28 '17

Should I try just with billiard’s master? Or both billiard’s and celery’s?

jmdacruz, Oct 28 '17

Please try with master versions of celery and billiard.

thedrow, Nov 04 '17

I seem to hit the same 'error' when doing a warm shutdown of Celery (under supervisor, with stopasgroup=true) while a GDAL process is running.

[2018-05-17 13:15:41,836: ERROR/ForkPoolWorker-2] tile.tasks.handle_weather_images[9609a901-29f5-4ddf-ba71-a09bde4319d0]: <built-in function Open> returned a result with an error set
Traceback (most recent call last):
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/osgeo/gdal.py", line 1674, in <lambda>
    __setattr__ = lambda self, name, value: _swig_setattr(self, Dataset, name, value)
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/billiard/common.py", line 125, in _shutdown_cleanup
    sys.exit(-(256 - signum))
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/billiard/pool.py", line 280, in exit
    return _exit()
SystemExit

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/tile/weather_utils.py", line 165, in make_tiles
    if gdal_retile.main(cmd.split()):
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/tile/gdal_retile.py", line 881, in main
    build_pyramid(minfo, ds_created_tile_index, TileWidth, TileHeight)
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/tile/gdal_retile.py", line 589, in build_pyramid
    input_ds = build_pyramid_level(level_mosaic_info, level_output_tile_info, level)
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/tile/gdal_retile.py", line 612, in build_pyramid_level
    create_pyramid_tile(level_mosaic_info, offset_x, offset_y, width, height, tilename, OGRDS)
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/tile/gdal_retile.py", line 404, in create_pyramid_tile
    dec.ulx + width * dec.scaleX, dec.uly)
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/tile/gdal_retile.py", line 225, in get_data_set
    source_ds = self.cache.get(feature_name)
  File "/opt/virtualenvs/tile-server/lib/python3.5/site-packages/tile/gdal_retile.py", line 90, in get
    result = gdal.Open(name)
SystemError: <built-in function Open> returned a result with an error set

I use Celery 4.1.0 and billiard 3.5.0.3. I would like to let the process continue (I set a long stopwaitsecs), but it seems to get killed right away. I'm not sure whether or how I can 'protect' the GDAL code (it's gdal_retile.py). Any ideas or suggestions?
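With stopasgroup=true, supervisor sends the stop signal to the whole process group, so the pool children may receive SIGTERM directly instead of waiting for the worker's warm shutdown. On the question of protecting the GDAL call, one general Python pattern (not a billiard or Celery feature) is to record SIGTERM during a critical section and hand it to the original handler once the section finishes. A minimal sketch, assuming the task runs in the main thread of the pool process (`signal.signal()` only works there) and that nothing escalates to SIGKILL; `deferred_signal`, `shutdown_cleanup`, and `demo` are hypothetical names:

```python
import os
import signal
import sys
from contextlib import contextmanager


@contextmanager
def deferred_signal(signum=signal.SIGTERM):
    """Hypothetical helper: hold `signum` until the block finishes,
    then pass it on to the handler that was installed before."""
    received = []

    def record(sig, frame):
        received.append(sig)  # remember the signal instead of exiting now

    previous = signal.signal(signum, record)  # main thread only
    if previous is None:  # previous handler was not installed from Python
        previous = signal.SIG_DFL
    try:
        yield
    finally:
        signal.signal(signum, previous)
        if received:
            if callable(previous):
                previous(signum, None)  # e.g. billiard's shutdown handler
            else:
                os.kill(os.getpid(), signum)  # SIG_DFL/SIG_IGN: re-send it


def shutdown_cleanup(signum, frame):
    # Stand-in for billiard's handler, as in the tracebacks.
    sys.exit(-(256 - signum))


def demo():
    previous = signal.signal(signal.SIGTERM, shutdown_cleanup)
    steps = []
    try:
        with deferred_signal(signal.SIGTERM):
            os.kill(os.getpid(), signal.SIGTERM)  # arrives mid-critical work
            for step in range(3):
                steps.append(step)  # e.g. where gdal.Open(...) would run
        steps.append("after")  # not reached: the deferred exit fires first
    except SystemExit as exc:
        return steps, exc.code
    finally:
        signal.signal(signal.SIGTERM, previous)
    return steps, None


if __name__ == "__main__":
    print(demo())
```

The trade-off is that shutdown now waits for the whole block, so the protected section should stay short relative to how long the process manager waits before sending SIGKILL.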

sposs, May 17 '18

I am also facing the same issue when using Redis with Celery. @jmdacruz, did you find any solution for this?

tirupathiraop, Dec 10 '19