
Invalid signal error seen when running large data set

Open NACHC-CAD opened this issue 2 years ago • 0 comments

I'm running anonlink-entity-service with 6 hospitals, each contributing 1,000,000 patients. During the run there is a long phase that uses about 60% of the available CPU, followed by a long period (hours) when only about 10% of the available CPU is in use, followed by the error shown in the attached logs.

Below is the part of the log where the error occurs. The attached files contain more of the logs. I have the full logs, but they are very large (about 1 GB).

full-log-error-section.txt run-log.txt run-log-error-focus.txt

pprl-error

backend_1 | [debug ] Connecting to redis [entityservice.cache.connection] pid=8efc27085d66234af616468b4251613028f05fa792c02df9 port=26379 request=64fe99e6 rid=eee1624a7a5a42e277f9e16dee29f30ca1f90fe440d494b1 server=redis
backend_1 | [info ] LOG_FILE: Connecting to redis [entityservice.cache.connection] pid=8efc27085d66234af616468b4251613028f05fa792c02df9 request=64fe99e6 rid=eee1624a7a5a42e277f9e16dee29f30ca1f90fe440d494b1
backend_1 | [debug ] total comparisons: 10000000000000 [entityservice.views.run.status] pid=8efc27085d66234af616468b4251613028f05fa792c02df9 request=64fe99e6 rid=eee1624a7a5a42e277f9e16dee29f30ca1f90fe440d494b1
nginx_1 | [200] - 172.18.0.1 - "GET /api/v1/projects/8efc27085d66234af616468b4251613028f05fa792c02df9/runs/eee1624a7a5a42e277f9e16dee29f30ca1f90fe440d494b1/status HTTP/1.1" 335 920 374 0.014 "-" "python-requests/2.26.0" "-"
worker_a13_1 | [2021-12-30 20:44:28,832: DEBUG/ForkPoolWorker-2] [debug ] setting up tracing on task [entityservice.tasks] task_name=aggregate_comparisons
worker_a13_1 | [2021-12-30 20:44:28,853: DEBUG/ForkPoolWorker-2] [debug ] Aggregating result chunks from 33060 files, total size: 958067760 [entityservice.tasks] pid=8efc27085d66234af616468b4251613028f05fa792c02df9 run_id=eee1624a7a5a42e277f9e16dee29f30ca1f90fe440d494b1 task_name=aggregate_comparisons
worker_a13_1 | [2021-12-30 20:44:28,949: WARNING/ForkPoolWorker-2] [warning ] Task 33ed3bfa-0953-429b-a530-c2818051fc31 is retrying after a 'S3Error' exception [entityservice.tasks] pid=8efc27085d66234af616468b4251613028f05fa792c02df9 run_id=eee1624a7a5a42e277f9e16dee29f30ca1f90fe440d494b1 task_name=aggregate_comparisons
worker_a13_1 | [2021-12-30 20:44:28,952: WARNING/ForkPoolWorker-2] /usr/lib/python3.9/signal.py:60: RuntimeWarning: invalid signal number 32, please use valid_signals()
worker_a13_1 | sigs_set = _signal.pthread_sigmask(how, mask)
worker_a13_1 |
worker_a13_1 | [2021-12-30 20:44:28,952: WARNING/ForkPoolWorker-2] /usr/lib/python3.9/signal.py:60: RuntimeWarning: invalid signal number 33, please use valid_signals()
worker_a13_1 | sigs_set = _signal.pthread_sigmask(how, mask)
worker_a13_1 |
worker_a13_1 | [2021-12-30 20:44:28,952: WARNING/ForkPoolWorker-2] /usr/lib/python3.9/signal.py:60: RuntimeWarning: invalid signal number 34, please use valid_signals()
worker_a13_1 | sigs_set = _signal.pthread_sigmask(how, mask)
worker_a13_1 |
worker_a13_1 | [2021-12-30 20:44:28,995: INFO/MainProcess] [info ] An error occurred while processing task [entityservice.tasks] run_id=eee1624a7a5a42e277f9e16dee29f30ca1f90fe440d494b1 task_id=<Context: {'lang': 'py', 'task': 'entityservice.tasks.comparing.aggregate_comparisons', 'id': '33ed3bfa-0953-429b-a530-c2818051fc31', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': 'a95304d4-eedc-4712-b479-315c1a3b3714', 'parent_id': '2434d2de-235b-45a8-8bc0-488c92b5438a', 'argsrepr': "([[18129, 435100, 'similarity-scores/771dc665dafe5bf0cbd98d2e.bin'], [871, 20908, 'similarity-scores/31cae8ef35597b136be6e185.bin'], [811, 19468, 'similarity-scores/fa76d3d05a26b3817834267f.bin'], [860, 20644, 'similarity-scores/f1f37c32c300f7bd365055c9.bin'], [879, 21100, 'similarity-scores/81719e644f12b1a400797d82.bin'], [860, 20644, 'similarity-scores/69fb24c851caebe8ed81ae03.bin'], [896, 21508, 'similarity-scores/26e9cbdcf0abf33b8a20c0fb.bin'], [929, 22300, 'similarity-scores/f74670768d2f987b4fb1b01b.bin'], [908, 21796, 'similarity-scores/3fca1b757b3f154bf1424045.bin'], [883, 21196, 'similarity-scores/06e7f5024859f3403eba7796.bin'], [919, 22060, 'similarity-scores/2ffa682d3afe6b54fff78835.bin'], [920, 22084, 'similarity-scores/498ef5687d1dcab93127d8eb.bin'], [837, 20092, 'similarity-scores/7befc3448aaf0822a0224496.bin'], [887, 21292, 'similarity-scores/615b4d08d37990b461ac70e9.bin'], [836, 20068, 'similarity-scores/e91720668cf63cc0769491b9.bin'], [924, 22180, 'similarity-scores/3ff5c3d9704a5a97163e2cd5.bin...', ...],)",
'kwargsrepr': "{'project_id': '8efc27085d66234af616468b4251613028f05fa792c02df9', 'run_id': 'eee1624a7a5a42e277f9e16dee29f30ca1f90fe440d494b1', 'parent_span': {'uber-trace-id': 'e38e6235a8b07c59:5a1d8351c03ad888:1e07d0b84a4e637d:1'}}", 'origin': 'gen8@6dc4aae3ab70', 'ignore_result': True, 'redelivered': True, 'reply_to': 'fa078a18-bf93-3164-8de4-0665067672f7', 'correlation_id': '33ed3bfa-0953-429b-a530-c2818051fc31', 'hostname': 'celery@bec4c46dba22', 'delivery_info': {'exchange': '', 'routing_key': 'highmemory', 'priority': 0, 'redelivered': None}, 'args': [[[18129, 435100, 'similarity-scores/771dc665dafe5bf0cbd98d2e.bin'], [871, 20908, 'similarity-scores/31cae8ef35597b136be6e185.bin'], [811, 19468, 'similarity-scores/fa76d3d05a26b3817834267f.bin'], [860, 20644, 'similarity-scores/f1f37c32c300f7bd365055c9.bin'], [879, 21100, 'similarity-scores/81719e644f12b1a400797d82.bin'], [860, 20644, 'similarity-scores/69fb24c851caebe8ed81ae03.bin'], [896, 21508, 'similarity-scores/26e9cbdcf0abf33b8a20c0fb.bin'], [929, 22300, 'similarity-scores/f74670768d2f987b4fb1b01b.bin'], [908, 21796, 'similarity-scores/3fca1b757b3f154bf1424045.bin'], [883, 21196, 'similarity-scores/06e7f5024859f3403eba7796.bin'], [919, 22060, 'similarity-scores/2ffa682d3afe6b54fff78835.bin'], [920, 22084, 'similarity-scores/498ef5687d1dcab93127d8eb.bin'], [837, 20092, 'similarity-scores/7befc3448aaf0822a0224496.bin'], [887, 21292, 'similarity-scores/615b4d08d37990b461ac70e9.bin'], [836, 20068, 'similarity-scores/e91720668cf63cc0769491b9.bin'], [924, 22180, 'similarity-scores/3ff5c3d9704a5a97163e2cd5.bin'], [878, 21076, 'similarity-scores/4c7ead830a841a3366663e76.bin'], [858, 20596, 'similarity-scores/2175593b9f3fbac45767a875.bin'], [853, 20476, 'similarity-scores/2546fe61601f8ec3fa087fdb.bin'], [889, 21340, 'similarity-scores/5a86e1059f0ee9ca9033ad56.bin'], [934, 22420, 'similarity-scores/eb2d93507c788eaa15a39f8e.bin'], [920, 22084, 'similarity-scores/6ebe9e0973cbad97c975d78f.bin'], [893, 21436, 
'similarity-scores/e0ce6c8af04fead1a414bfc2.bin'], [882, 21172, 'similarity-scores/b21a1505cfcb6952a209163c.bin'], [914, 21940, 'similarity-scores/75c013d872e16288db3ff2d3.bin'], [910, 21844, 'similarity-scores/6e15c5aa835fb218d6bf588c.bin'], [928, 22276, 'similarity-scores/4855fd93634871ad004dad20.bin'], [867, 20812, 'similarity-scores/5fbc3a270cfe4b173d886320.bin'], [841, 20188, 'similarity-scores/d6bd858b6b5519a9392e0e17.bin'], [887, 21292, 'similarity-scores/d76e34f22dd656b373041df6.bin'], [948, 22756, 'similarity-scores/c3ac231c58efeaaf70057310.bin'], [908, 21796, 'similarity-scores/4227445c6b7c
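
A note on the `invalid signal number 32/33/34` lines in the log above: CPython emits this `RuntimeWarning` when `signal.pthread_sigmask` is handed a signal number that the platform's C library reserves for internal use. That typically happens when code builds its mask from `range(1, signal.NSIG)` instead of `signal.valid_signals()`. If that is what is happening here (an assumption, not something the log confirms), the warnings are probably noise, and the `S3Error` retry logged just before them is the more likely cause of the failure. The sketch below only illustrates the difference between the two ways of building a mask; the helper names are hypothetical and not part of anonlink-entity-service or Celery. It assumes a Unix platform, since `pthread_sigmask` is not available on Windows.

```python
import signal
import warnings


def reserved_signal_numbers():
    """Numbers in 1..NSIG-1 that this platform does not accept in a signal mask."""
    valid = {int(s) for s in signal.valid_signals()}
    return sorted(set(range(1, signal.NSIG)) - valid)


def block_all_signals_briefly():
    """Block every valid signal, then restore the previous mask.

    RuntimeWarning is promoted to an error inside this function, so an
    'invalid signal number' warning would raise instead of being logged.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error", RuntimeWarning)
        # valid_signals() already excludes the reserved numbers.
        old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
        # Restore the mask that was in effect before the call.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return old_mask


if __name__ == "__main__":
    print("reserved on this platform:", reserved_signal_numbers())
    block_all_signals_briefly()
```

The list printed by `reserved_signal_numbers()` depends on the C library in the container image, so running it inside the worker container shows whether 32, 33 and 34 are reserved there.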

NACHC-CAD, Dec 31 '21 16:12