
Multiprocessing error with s3_iter_bucket on Mac OSx

Open lambdamusic opened this issue 6 years ago • 13 comments

OS: 10.14.1 (Mojave) Python: 3.7.2 (brew)

I've been using s3_iter_bucket to traverse an S3 bucket, but no matter how many workers I use (tried the default 16, then 8 and then 1), Python crashes with a multiprocessing error.

Not sure if this is an OS or smart_open issue, but I do wonder if anyone else has experienced it.

This is the relevant bit when I'm calling smart_open:

# ...
# iterate only through one dir at a time
for key, content in s3_iter_bucket(bucket, prefix=bucket_prefix, workers=1):
    click.secho(">>>>> File: " + key + str(len(content)), fg="green")
    parse_and_index_data(content, index_name, host_name, key)
# ...

And this is the usual error after a few thousand items have been processed (well, this is what I see after I hit Ctrl-C, since Python crashes with a system dialog and everything hangs):

Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/queues.py", line 352, in get
    res = self._reader.recv_bytes()
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt

Any hints? One workaround would be to be able to set _MULTIPROCESSING = False when calling s3_iter_bucket, but that is not possible at the moment...

lambdamusic avatar Mar 07 '19 09:03 lambdamusic

@lambdamusic thanks for reporting. What is the actual error you're seeing (versus the expected result)?

You mention some Python crash (segfault?), but then show some traceback?

A reproducible example would go a long way.

piskvorky avatar Mar 07 '19 09:03 piskvorky

No error in the terminal. Just hangs. An OS application window saying Python has crashed.

This is what I was able to find in the console log:

Process:               Python [12146]
Path:                  /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/Resources/Python.app/Contents/MacOS/Python
Identifier:            Python
Parent Process:        Python [11691]
Responsible:           Python [12146]
10  _scproxy.cpython-37m-darwin.so	0x000000011023a8d7 get_proxy_settings + 32
11  org.python.python             	0x000000010f41d696 _PyMethodDef_RawFastCallKeywords + 591
12  org.python.python             	0x000000010f41cbd3 _PyCFunction_FastCallKeywords + 44
13  org.python.python             	0x000000010f4b25f0 call_function + 636
14  org.python.python             	0x000000010f4ab2cf _PyEval_EvalFrameDefault + 7174
15  org.python.python             	0x000000010f41cfae function_code_fastcall + 112
16  org.python.python             	0x000000010f4b2665 call_function + 753
17  org.python.python             	0x000000010f4ab2cf _PyEval_EvalFrameDefault + 7174
18  org.python.python             	0x000000010f41cfae function_code_fastcall + 112
19  org.python.python             	0x000000010f4b2665 call_function + 753
20  org.python.python             	0x000000010f4ab2cf _PyEval_EvalFrameDefault + 7174
21  org.python.python             	0x000000010f41cfae function_code_fastcall + 112
22  org.python.python             	0x000000010f4b2665 call_function + 753
23  org.python.python             	0x000000010f4ab2cf _PyEval_EvalFrameDefault + 7174
24  org.python.python             	0x000000010f41cfae function_code_fastcall + 112
25  org.python.python             	0x000000010f4b2665 call_function + 753
26  org.python.python             	0x000000010f4ab2cf _PyEval_EvalFrameDefault + 7174
27  org.python.python             	0x000000010f4b2ef7 _PyEval_EvalCodeWithName + 1835
28  org.python.python             	0x000000010f41c801 _PyFunction_FastCallDict + 441
29  org.python.python             	0x000000010f41d931 _PyObject_Call_Prepend + 150
30  org.python.python             	0x000000010f45b05c slot_tp_init + 80
31  org.python.python             	0x000000010f457d28 type_call + 178
32  org.python.python             	0x000000010f41ca39 _PyObject_FastCallKeywords + 359
33  org.python.python             	0x000000010f4b265e call_function + 746
34  org.python.python             	0x000000010f4ab375 _PyEval_EvalFrameDefault + 7340
35  org.python.python             	0x000000010f4b2ef7 _PyEval_EvalCodeWithName + 1835
36  org.python.python             	0x000000010f41cb98 _PyFunction_FastCallKeywords + 225
37  org.python.python             	0x000000010f4b2665 call_function + 753
38  org.python.python             	0x000000010f4ab231 _PyEval_EvalFrameDefault + 7016
39  org.python.python             	0x000000010f4b2ef7 _PyEval_EvalCodeWithName + 1835
40  org.python.python             	0x000000010f41cb98 _PyFunction_FastCallKeywords + 225
41  org.python.python             	0x000000010f4b2665 call_function + 753
42  org.python.python             	0x000000010f4ab2cf _PyEval_EvalFrameDefault + 7174
43  org.python.python             	0x000000010f41cfae function_code_fastcall + 112
44  org.python.python             	0x000000010f4b2665 call_function + 753
45  org.python.python             	0x000000010f4ab218 _PyEval_EvalFrameDefault + 6991
46  org.python.python             	0x000000010f41cfae function_code_fastcall + 112
47  org.python.python             	0x000000010f4b2665 call_function + 753
48  org.python.python             	0x000000010f4ab218 _PyEval_EvalFrameDefault + 6991
49  org.python.python             	0x000000010f4b2ef7 _PyEval_EvalCodeWithName + 1835
50  org.python.python             	0x000000010f41cb98 _PyFunction_FastCallKeywords + 225
51  org.python.python             	0x000000010f4b2665 call_function + 753
52  org.python.python             	0x000000010f4ab375 _PyEval_EvalFrameDefault + 7340
53  org.python.python             	0x000000010f4b2ef7 _PyEval_EvalCodeWithName + 1835
54  org.python.python             	0x000000010f41cb98 _PyFunction_FastCallKeywords + 225
55  org.python.python             	0x000000010f4b2665 call_function + 753
56  org.python.python             	0x000000010f4ab375 _PyEval_EvalFrameDefault + 7340
57  org.python.python             	0x000000010f4b2ef7 _PyEval_EvalCodeWithName + 1835
58  org.python.python             	0x000000010f41cb98 _PyFunction_FastCallKeywords + 225
59  org.python.python             	0x000000010f4b2665 call_function + 753
60  org.python.python             	0x000000010f4ab218 _PyEval_EvalFrameDefault + 6991
61  org.python.python             	0x000000010f4b2ef7 _PyEval_EvalCodeWithName + 1835
62  org.python.python             	0x000000010f41c801 _PyFunction_FastCallDict + 441
63  org.python.python             	0x000000010f509778 partial_call + 378
64  org.python.python             	0x000000010f41cce1 PyObject_Call + 136
65  org.python.python             	0x000000010f4ab548 _PyEval_EvalFrameDefault + 7807
66  org.python.python             	0x000000010f4b2ef7 _PyEval_EvalCodeWithName + 1835
67  org.python.python             	0x000000010f41c801 _PyFunction_FastCallDict + 441
68  org.python.python             	0x000000010f4ab548 _PyEval_EvalFrameDefault + 7807
69  org.python.python             	0x000000010f41cfae function_code_fastcall + 112
70  org.python.python             	0x000000010f4b2665 call_function + 753
71  org.python.python             	0x000000010f4ab218 _PyEval_EvalFrameDefault + 6991
72  org.python.python             	0x000000010f41cfae function_code_fastcall + 112
73  org.python.python             	0x000000010f4b2665 call_function + 753
74  org.python.python             	0x000000010f4ab218 _PyEval_EvalFrameDefault + 6991
75  org.python.python             	0x000000010f41cfae function_code_fastcall + 112
76  org.python.python             	0x000000010f4b2665 call_function + 753
77  org.python.python             	0x000000010f4ab218 _PyEval_EvalFrameDefault + 6991
78  org.python.python             	0x000000010f41cfae function_code_fastcall + 112
79  org.python.python             	0x000000010f41d931 _PyObject_Call_Prepend + 150
80  org.python.python             	0x000000010f45b05c slot_tp_init + 80
81  org.python.python             	0x000000010f457d28 type_call + 178
82  org.python.python             	0x000000010f41ca39 _PyObject_FastCallKeywords + 359
83  org.python.python             	0x000000010f4b265e call_function + 746
84  org.python.python             	0x000000010f4ab2cf _PyEval_EvalFrameDefault + 7174
85  org.python.python             	0x000000010f41cfae function_code_fastcall + 112
86  org.python.python             	0x000000010f4b2665 call_function + 753
87  org.python.python             	0x000000010f4ab231 _PyEval_EvalFrameDefault + 7016
88  org.python.python             	0x000000010f41cfae function_code_fastcall + 112
89  org.python.python             	0x000000010f4b2665 call_function + 753
90  org.python.python             	0x000000010f4ab218 _PyEval_EvalFrameDefault + 6991
91  org.python.python             	0x000000010f41cfae function_code_fastcall + 112
92  org.python.python             	0x000000010f4b2665 call_function + 753
93  org.python.python             	0x000000010f4ab218 _PyEval_EvalFrameDefault + 6991
94  org.python.python             	0x000000010f41cfae function_code_fastcall + 112
95  org.python.python             	0x000000010f4b2665 call_function + 753
96  org.python.python             	0x000000010f4ab218 _PyEval_EvalFrameDefault + 6991
97  org.python.python             	0x000000010f41cfae function_code_fastcall + 112
98  org.python.python             	0x000000010f4ab548 _PyEval_EvalFrameDefault + 7807
99  org.python.python             	0x000000010f41cfae function_code_fastcall + 112
100 org.python.python             	0x000000010f4b2665 call_function + 753
101 org.python.python             	0x000000010f4ab218 _PyEval_EvalFrameDefault + 6991
102 org.python.python             	0x000000010f41cfae function_code_fastcall + 112
103 org.python.python             	0x000000010f4b2665 call_function + 753
104 org.python.python             	0x000000010f4ab218 _PyEval_EvalFrameDefault + 6991
105 org.python.python             	0x000000010f41cfae function_code_fastcall + 112
106 org.python.python             	0x000000010f41d931 _PyObject_Call_Prepend + 150
107 org.python.python             	0x000000010f41cce1 PyObject_Call + 136
108 org.python.python             	0x000000010f519eee t_bootstrap + 71
109 org.python.python             	0x000000010f4e0847 pythread_wrapper + 25
       0x10f3fb000 -        0x10f3fcfff +org.python.python (3.7.2 - 3.7.2) <1D022F8C-F921-3059-BEF2-C91F0434A9B2> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/Resources/Python.app/Contents/MacOS/Python
       0x10f403000 -        0x10f589ff7 +org.python.python (3.7.2, [c] 2001-2018 Python Software Foundation. - 3.7.2) <255310C1-90AB-3CD7-B676-309C425CFC73> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/Python
       0x10f8e0000 -        0x10f8e1fff +_heapq.cpython-37m-darwin.so (0) <7F27633E-D5A9-3E71-9B65-5F276DB2F7E5> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_heapq.cpython-37m-darwin.so
       0x10f965000 -        0x10f96aff7 +_json.cpython-37m-darwin.so (0) <FCBCFF72-1F16-343B-8544-5F810977C270> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_json.cpython-37m-darwin.so
       0x10fa2f000 -        0x10fa32fff +_struct.cpython-37m-darwin.so (0) <D5C08DBC-5909-31B5-85D0-8FC2BFA85B14> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_struct.cpython-37m-darwin.so
       0x10fac2000 -        0x10fac2fff +_opcode.cpython-37m-darwin.so (0) <1140A364-57E0-3A5E-B63A-4ED7D1FD1C7C> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_opcode.cpython-37m-darwin.so
       0x10fb05000 -        0x10fb09ffb +math.cpython-37m-darwin.so (0) <8B464989-661E-35CC-9B75-0E752410BBEE> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/math.cpython-37m-darwin.so
       0x10fb0f000 -        0x10fb1aff3 +_datetime.cpython-37m-darwin.so (0) <C0189868-7281-38C0-81FA-350D332F475C> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_datetime.cpython-37m-darwin.so
       0x10fb62000 -        0x10fb6affb +_socket.cpython-37m-darwin.so (0) <44AD2731-61B4-3305-8316-4BCF061144D7> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_socket.cpython-37m-darwin.so
       0x10fbb5000 -        0x10fbb8fff +select.cpython-37m-darwin.so (0) <35125464-971F-3291-A292-BEA60522E76C> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/select.cpython-37m-darwin.so
       0x10fbbd000 -        0x10fbc9fff +_ssl.cpython-37m-darwin.so (0) <1A49D523-602F-3BE7-B9B2-EBC475FD2EBE> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_ssl.cpython-37m-darwin.so
       0x10fe42000 -        0x10fe45ff7 +binascii.cpython-37m-darwin.so (0) <8C089B49-431D-3BF7-A131-BAF8FFAF598A> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/binascii.cpython-37m-darwin.so
       0x10fec9000 -        0x10feccfff +_hashlib.cpython-37m-darwin.so (0) <59C7B4CB-F53D-34B6-8859-40BCB1DB9969> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_hashlib.cpython-37m-darwin.so
       0x10fed0000 -        0x10fed5ffb +_blake2.cpython-37m-darwin.so (0) <799AA4CB-768B-3A94-9596-83EF1194563B> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_blake2.cpython-37m-darwin.so
       0x10fed9000 -        0x10fee9fff +_sha3.cpython-37m-darwin.so (0) <4CEC64CD-1EF4-321E-A1AD-1FD9E2DB14B6> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_sha3.cpython-37m-darwin.so
       0x10feee000 -        0x10feeefff +_bisect.cpython-37m-darwin.so (0) <9BAE67A9-0F97-382B-A43A-310CFA6003DC> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_bisect.cpython-37m-darwin.so
       0x10fef1000 -        0x10fef2fff +_random.cpython-37m-darwin.so (0) <96AB86A6-4976-3E2D-98C3-75715A9BD9AC> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_random.cpython-37m-darwin.so
       0x10ff75000 -        0x10ff76fff +_queue.cpython-37m-darwin.so (0) <57C769ED-3F5E-3294-8C1C-B23A112F26AF> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_queue.cpython-37m-darwin.so
       0x10ffb9000 -        0x10ffb9fff +_uuid.cpython-37m-darwin.so (0) <87CFC6F6-BB41-3814-BABD-EFA2D2577256> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_uuid.cpython-37m-darwin.so
       0x10ffbc000 -        0x10ffbffff +zlib.cpython-37m-darwin.so (0) <95AA4B1E-D850-3D25-977D-9096B45CE49F> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/zlib.cpython-37m-darwin.so
       0x110004000 -        0x110102fff +unicodedata.cpython-37m-darwin.so (0) <CC2D9BAE-60E6-34D8-8138-0CA99A507D1B> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/unicodedata.cpython-37m-darwin.so
       0x1101c7000 -        0x1101c8fff +_bz2.cpython-37m-darwin.so (0) <ED995F35-FBB9-3476-99B4-328046BCAA1D> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_bz2.cpython-37m-darwin.so
       0x1101cc000 -        0x1101cfff7 +_lzma.cpython-37m-darwin.so (0) <2D2DC3F6-730B-3B9D-BE45-C22CC1BE46EE> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_lzma.cpython-37m-darwin.so
       0x1101f5000 -        0x1101f6fff +grp.cpython-37m-darwin.so (0) <62B795FB-63F1-31E1-91BA-E5A6F3DAED83> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/grp.cpython-37m-darwin.so
       0x110239000 -        0x11023afff +_scproxy.cpython-37m-darwin.so (0) <2672629C-CEF4-3C9C-9CF1-90CE6C311723> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_scproxy.cpython-37m-darwin.so
       0x1103ce000 -        0x1103fbffb +_decimal.cpython-37m-darwin.so (0) <BF87C087-C3D5-32AE-855B-31BC8CECE386> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_decimal.cpython-37m-darwin.so
       0x11054e000 -        0x11054ffff +_posixsubprocess.cpython-37m-darwin.so (0) <0A337549-BF12-349B-A4EE-6DBFC12B0E71> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_posixsubprocess.cpython-37m-darwin.so
       0x1105d2000 -        0x1105defff +_pickle.cpython-37m-darwin.so (0) <EC7F2E5D-F1AD-3277-BA96-5EE4E540662F> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_pickle.cpython-37m-darwin.so
       0x1107e8000 -        0x1107efff3 +_elementtree.cpython-37m-darwin.so (0) <EB1A6741-C1C7-3883-9ECA-0167AABE113A> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_elementtree.cpython-37m-darwin.so
       0x1107f6000 -        0x110815ffb +pyexpat.cpython-37m-darwin.so (0) <0959C4B3-EE25-37F5-99EC-23ED7F248801> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/pyexpat.cpython-37m-darwin.so
       0x110821000 -        0x110822fff +termios.cpython-37m-darwin.so (0) <1A5C64F9-72E6-3519-852D-4FBE2CCB2183> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/termios.cpython-37m-darwin.so
       0x110aa6000 -        0x110aabff7 +array.cpython-37m-darwin.so (0) <94E905C9-16AA-3BDF-8054-B689201EA3F4> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/array.cpython-37m-darwin.so
       0x110bb8000 -        0x110bb9ffb +_multiprocessing.cpython-37m-darwin.so (0) <1A8AA498-6FE0-3503-A1BF-C195B9B51484> /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/_multiprocessing.cpython-37m-darwin.so

.. will try to reproduce it..

lambdamusic avatar Mar 07 '19 12:03 lambdamusic

Sorry, I'm still not clear. If Python crashed (OS killed the process), how did you get the traceback in the OP? "Crashing" and "responding to CTRL-C" are mutually exclusive.

piskvorky avatar Mar 07 '19 12:03 piskvorky

Apologies, I might just be doing something wrong. I've just reproduced this again so here is a detailed walkthrough:

  • I run a script which gets log files from S3 and loads them into a local Elasticsearch instance; each time a new log file is found, a notification message gets printed out
  • after ~100k records (number varies, but roughly) I get a system error saying that Python quit unexpectedly: screenshot 2019-03-07 at 12 53 51
  • the terminal window where the script is running just stops updating - no more notification messages. Also, nothing gets written to Elasticsearch. So I assume it all halted.
  • if I hit Ctrl-C I get the Traceback I posted above
  • if I run Console app and look into User Reports, I find a python crash report just like the one above

Makes sense?

lambdamusic avatar Mar 07 '19 13:03 lambdamusic

OK, so it looks like some other Python process crashed (not the main script you were running; possibly one of the forked multiprocessing worker processes).

This might be tricky to debug. Any chance you could share a minimal reproducible example? No Elasticsearch, no external libraries, just a minimal script we can run ourselves and observe the same crash (I have a MacBookPro too).

piskvorky avatar Mar 07 '19 15:03 piskvorky

Here's the script I'm using:

https://gist.github.com/lambdamusic/029c88483922604c18f84e5f164e09a6

Not sure though how useful it is... but in order to reproduce it you'd probably have to run it over a good amount of log files like the ones I'm dealing with.

The logs I'm processing are a few thousand files, each containing 10k rows similar to these:

{"token":"24b2ef8d-b3a4-4308-b290-ac952a09f6d7","received":1551701747934}
{"source": "publications", "has_facets": false, "has_filters": true, "cursor": false, "query": "search publications\nwhere doi = \"10.1101/321265\"\nreturn publications[id + doi + pmid + title + journal + date + times_cited]\n", "dsl_version": "2.15.1", "processing_time": 0.003092372789978981, "evaluation_time": 0.0005003325641155243, "user": "some user", "radar_version": "e7ff995-release_3_19_1", "product_variant": "radar.dsl"}

Thanks a million for the help!

lambdamusic avatar Mar 07 '19 15:03 lambdamusic

I'm thinking it might be RAM-related. for key, content in s3_iter_bucket(…) will keep both key and content in RAM. Did you observe any unusual RAM spikes, when running your script? (especially before the crash)

Unrelated to that, but also related to RAM: Looking at the s3_iter_bucket and multiprocessing code, it will greedily fill the input queue with listed S3 keys, and also the output queue with the downloaded objects. So if your processing of the results is slow (such as, sending them to Elasticsearch is slow), these queues might keep growing and growing, until the OS runs out of RAM and kills the process.

It's an unfortunate property of Python's built-in multiprocessing.Pool – there's no limit on its internal input/output queues. CC @mpenkov am I reading this correctly?
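
To illustrate the unbounded-queue point, here is a minimal sketch of one way to add back-pressure to a pool. It is not smart_open code: the name bounded_imap and the max_pending parameter are made up for this example, and it uses a ThreadPool rather than worker processes so that it runs without pickling concerns. The idea is that the pool's task-feeder thread blocks once max_pending items have been submitted but not yet consumed.

```python
import threading
from multiprocessing.pool import ThreadPool


def bounded_imap(func, iterable, workers=4, max_pending=8):
    """Yield func(item) in order, keeping at most max_pending items in flight.

    Hypothetical helper for illustration only; not part of smart_open.
    """
    slots = threading.Semaphore(max_pending)
    stop = threading.Event()

    def gated(source):
        # The pool's feeder thread iterates this generator. It waits here
        # until the consumer has freed a slot, so the input is not drained
        # eagerly.
        for item in source:
            while not slots.acquire(timeout=0.1):
                if stop.is_set():
                    return
            yield item

    pool = ThreadPool(workers)
    try:
        for result in pool.imap(func, gated(iterable)):
            yield result
            # The consumer has come back for more: free one slot.
            slots.release()
    finally:
        # Unblock the feeder thread (if it is waiting) and shut down.
        stop.set()
        pool.terminate()
        pool.join()


if __name__ == "__main__":
    for value in bounded_imap(lambda x: x * x, range(10), workers=2, max_pending=3):
        print(value)
```

With a slow consumer, the number of downloaded-but-unprocessed results stays capped at max_pending instead of growing with the size of the bucket listing.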

piskvorky avatar Mar 07 '19 16:03 piskvorky

@piskvorky I think your interpretation is correct. @lambdamusic What is the memory usage like on the machine when you start having problems?

I'm not sure about the implementation of multiprocessing.Pool. That's something we need to check when investigating this in more detail. Before we can do that, though, we need a minimal reproducible example. The code posted by @lambdamusic has many moving parts that are likely irrelevant to the problem.

mpenkov avatar Mar 08 '19 03:03 mpenkov

@lambdamusic can you replicate the issue when you replace sending the downloaded data to ES with a simple sleep(5.0)? (= to simulate consuming the downloaded object queue more slowly than it's being populated, a minimal example)
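
A minimal sketch of what such a test could look like. The helper name consume_slowly and its limit parameter are hypothetical; it accepts any iterable of (key, content) pairs, so the Elasticsearch step is replaced by a plain sleep.

```python
import time


def consume_slowly(pairs, delay=5.0, limit=None):
    """Iterate (key, content) pairs, sleeping instead of indexing them.

    Returns the number of pairs consumed. `limit` optionally stops early.
    """
    seen = 0
    for key, content in pairs:
        print(key, len(content))
        time.sleep(delay)  # stand-in for the slow downstream step
        seen += 1
        if limit is not None and seen >= limit:
            break
    return seen
```

To try it against the real bucket, one would pass s3_iter_bucket(bucket, prefix=bucket_prefix, workers=1) as the pairs argument and watch memory usage while it runs.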

piskvorky avatar Mar 08 '19 07:03 piskvorky

Hey folks thanks for the feedback. I also thought it was a memory issue but didn't notice anything out of the ordinary there. Will look at it again.

Also, I'll try the sleep(5) test later today.

lambdamusic avatar Mar 08 '19 12:03 lambdamusic

@lambdamusic Did you manage to figure this out?

mpenkov avatar Sep 28 '19 13:09 mpenkov

Hi, sorry, no. In the end, since I couldn't figure it out, I simply changed my application to go through smaller (sub)buckets via subsequent runs. Then the error disappeared, presumably because the iterator had fewer objects to go through?

lambdamusic avatar Oct 06 '19 16:10 lambdamusic

Might it be related to: https://bugs.python.org/issue33725 ?

I was having messages like this:

objc[70381]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[70381]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.

doing:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

before execution was a workaround for me.
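
A possible alternative, sketched under assumptions: Python 3.8 changed the default multiprocessing start method on macOS from fork to spawn because of that bug, and on older versions the same switch can be made by hand. The helper name use_spawn is made up here, and whether s3_iter_bucket behaves correctly under spawn has not been verified in this thread.

```python
import multiprocessing


def use_spawn():
    """Make 'spawn' the default start method for pools created afterwards.

    Must be called before any pool or process is created.
    """
    multiprocessing.set_start_method("spawn", force=True)
    return multiprocessing.get_start_method()


if __name__ == "__main__":
    # Call this once, at the top of the script's entry point.
    print(use_spawn())
```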

yanickc avatar Jan 22 '20 02:01 yanickc