pandarallel
Why is the Jupyter Notebook cell still running after the progress bars have finished?
Here is a picture of the cell. I noticed that all progress bars finished in about 3 minutes, yet the cell kept running and only stopped after 8 minutes. Why would this happen? Thank you!
Second this. In my case the dataset has 100+ million rows, and the delay is much longer than 8 minutes. Since this seems to be a relatively new issue, it may be connected with using Jupyter Notebook 7 or Pandas 2.0+.

Update: allocating more RAM to WSL, hiding the progress bar, setting use_memory_fs=True, and running the script in a new notebook significantly sped up the process, so the problem may be related to RAM rather than to the library itself.

Some details:
Operating System: Ubuntu 22.04.2 LTS (WSL 2)
Python version: 3.10.12
Pandas version: 2.1.1
Pandarallel version: 1.6.5
Jupyter Notebook version: 7.0.2
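For reference, the workarounds above correspond to the following pandarallel initialization. This is a minimal sketch: the `to_iso` helper and the sample frame are hypothetical stand-ins for the thread's actual data, and the code falls back to plain `apply` if pandarallel is unavailable.

```python
import pandas as pd


def to_iso(ts: str) -> str:
    # Hypothetical stand-in for the to_correct_format helper in the traceback.
    return ts.replace(" ", "T")


df = pd.DataFrame({"timestamp": ["2023-01-01 00:00:00"] * 4})

try:
    from pandarallel import pandarallel

    # The settings that helped in the comment above: no progress bar, and
    # chunk transfer through the memory file system (/dev/shm) instead of pipes.
    pandarallel.initialize(progress_bar=False, use_memory_fs=True)
    df["timestamp"] = df.timestamp.parallel_apply(to_iso)
except Exception:
    # Fall back to plain pandas if pandarallel is not installed here.
    df["timestamp"] = df.timestamp.apply(to_iso)
```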
Here is what I get if I interrupt the cell while all workers are at 100%:
Process ForkPoolWorker-14:
Process ForkPoolWorker-15:
Process ForkPoolWorker-16:
Process ForkPoolWorker-17:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/queues.py", line 365, in get
res = self._reader.recv_bytes()
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
task = get()
File "/usr/lib/python3.10/multiprocessing/queues.py", line 364, in get
with self._rlock:
File "/usr/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.10/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
File "/usr/lib/python3.10/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt
During handling of the above exception, another exception occurred:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
task = get()
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
task = get()
File "/usr/lib/python3.10/multiprocessing/queues.py", line 364, in get
with self._rlock:
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
task = get()
File "/usr/lib/python3.10/multiprocessing/queues.py", line 364, in get
with self._rlock:
File "/usr/lib/python3.10/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
task = get()
File "/usr/lib/python3.10/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
File "/usr/lib/python3.10/multiprocessing/queues.py", line 365, in get
res = self._reader.recv_bytes()
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
KeyboardInterrupt
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
task = get()
KeyboardInterrupt
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
task = get()
KeyboardInterrupt
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
[... skipping hidden 1 frame]
Cell In[22], line 1
----> 1 convert_timestamp(train)
File ~/sleep/lib/parallel.py:31, in convert_timestamp(df)
30 gc.collect()
---> 31 df["timestamp"] = df.timestamp.parallel_apply(to_correct_format)
File ~/.local/lib/python3.10/site-packages/pandarallel/core.py:444, in parallelize_with_pipe.<locals>.closure(data, user_defined_function, *user_defined_function_args, **user_defined_function_kwargs)
442 progress_bars.set_error(worker_index)
--> 444 results = results_promise.get()
446 return data_type.reduce(results, reduce_extra)
File /usr/lib/python3.10/multiprocessing/pool.py:768, in ApplyResult.get(self, timeout)
767 def get(self, timeout=None):
--> 768 self.wait(timeout)
769 if not self.ready():
File /usr/lib/python3.10/multiprocessing/pool.py:765, in ApplyResult.wait(self, timeout)
764 def wait(self, timeout=None):
--> 765 self._event.wait(timeout)
File /usr/lib/python3.10/threading.py:607, in Event.wait(self, timeout)
606 if not signaled:
--> 607 signaled = self._cond.wait(timeout)
608 return signaled
File /usr/lib/python3.10/threading.py:320, in Condition.wait(self, timeout)
319 if timeout is None:
--> 320 waiter.acquire()
321 gotit = True
KeyboardInterrupt:
During handling of the above exception, another exception occurred:
KeyboardInterrupt Traceback (most recent call last)
KeyboardInterrupt:
Pandaral·lel is looking for a maintainer! If you are interested, please open a GitHub issue.
@thatmee You mentioned that the problem is likely related to the amount of available RAM rather than to pandarallel itself. Are you running into any other problems?
Otherwise I'm tempted to close this issue (or will do so if there's no reply for a while).