When I tried to run calculate_feature_matrix in chunks, I kept encountering a ValueError, usually followed by a fatal Python error. Note that this error only occurred after several chunks had been calculated, and no error showed up if I restarted the Python script and continued from where it had failed. Please see below for the full trace.
2022-11-10 15:31:53,351 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 3.04 GiB -- Worker memory limit: 3.79 GiB
Traceback (most recent call last):
File "/home/zzz/python/test.py", line 306, in ft_test
feature_matrix_ = ft.calculate_feature_matrix(
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/featuretools/computational_backends/calculate_feature_matrix.py", line 316, in calculate_feature_matrix
feature_matrix = parallel_calculate_chunks(
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/featuretools/computational_backends/calculate_feature_matrix.py", line 792, in parallel_calculate_chunks
client.replicate([_es, _saved_features])
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/client.py", line 3481, in replicate
return self.sync(
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/utils.py", line 338, in sync
return sync(
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/utils.py", line 405, in sync
raise exc.with_traceback(tb)
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/utils.py", line 378, in f
result = yield future
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/tornado/gen.py", line 762, in run
value = future.result()
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/client.py", line 3439, in _replicate
await self.scheduler.replicate(
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/core.py", line 1153, in send_recv_from_rpc
return await send_recv(comm=comm, op=key, **kwargs)
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/core.py", line 943, in send_recv
raise exc.with_traceback(tb)
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/core.py", line 769, in _handle_comm
result = await result
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/scheduler.py", line 5781, in replicate
for ws in random.sample(tuple(workers - ts.who_has), count):
File "/home/zzz/.conda/envs/test/lib/python3.9/random.py", line 449, in sample
raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative
2022-11-10 15:31:53,461 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 2.99 GiB -- Worker memory limit: 3.79 GiB
Exception in thread AsyncProcess Dask Worker process (from Nanny) watch process join:
Traceback (most recent call last):
File "/home/zzz/.conda/envs/test/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/zzz/.conda/envs/test/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/process.py", line 236, in _watch_process
assert exitcode is not None
AssertionError
Exception in thread AsyncProcess Dask Worker process (from Nanny) watch process join:
Traceback (most recent call last):
File "/home/zzz/.conda/envs/test/lib/python3.9/threading.py", line 980, in _bootstrap_inner
Using EntitySet persisted on the cluster as dataset EntitySet-a3d41f24f216a89dd794828f2871b580
self.run()
Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.bufferedwriter name="<stderr>"> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=0x17a50a0)
Current thread 0x00007f992d262280 (most recent call first):
<no Python frame>
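For context on the ValueError itself: the scheduler's replicate call (scheduler.py line 5781 in the traceback) asks random.sample for more workers than remain in the population, which presumably happens here after a worker process dies. The stdlib behavior can be reproduced in isolation (this is only an illustration of the error, not the Dask code):

```python
import random

# Mimic the failing call: request a sample larger than the population.
remaining_workers = ("worker-a",)  # only one candidate worker left
try:
    random.sample(remaining_workers, 2)  # count exceeds population size
except ValueError as e:
    print(e)  # same message as in the traceback above
```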
Also, possibly relevant to the fatal error above: the log contains a large number of DataFrame-fragmentation and unmanaged-memory warnings:
/home/zzz/.conda/envs/test/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:938: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
return data.assign(**new_cols)
2022-11-11 09:39:14,505 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 3.15 GiB -- Worker memory limit: 3.79 GiB
Any ideas would be highly appreciated! Best regards!
Hi, could you list the versions of Featuretools, woodwork, dask[dataframe], and distributed you are using? Thanks!
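One quick way to gather those versions is via the standard library (a minimal sketch; the package names are the ones asked about above, and the helper name is just for illustration):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package: str) -> str:
    """Return the installed version of *package*, or 'not installed'."""
    try:
        return version(package)
    except PackageNotFoundError:
        return "not installed"

for pkg in ["featuretools", "woodwork", "dask", "distributed"]:
    print(f"{pkg}: {installed_version(pkg)}")
```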