pandarallel

Why is the Jupyter Notebook cell still running after the progress bars have finished?

Open thatmee opened this issue 1 year ago • 3 comments

[Screenshot: Snipaste_2023-08-31_17-36-43] Here is a screenshot of the cell. I noticed that all progress bars finished in about 3 minutes. However, the cell keeps running and only stops after 8 minutes. Why does this happen? Thank you!

thatmee, Aug 31 '23 09:08

Second this. In my case, the dataset has 100+ million rows, and the delay is much longer than 8 mins. As it seems a relatively new issue, it may be connected with using Jupyter Notebook 7 or Pandas 2.0+.

UPD: Allocating more RAM to WSL, hiding the progress bar, setting use_memory_fs=True, and running the script in a new notebook significantly sped up the process, so the problem may be related to RAM rather than to the library.
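For anyone who wants to try the same settings, here is a minimal sketch of the initialization. It assumes pandarallel is installed and that a memory file system such as /dev/shm is available. `progress_bar` and `use_memory_fs` are keyword arguments of `pandarallel.initialize`; the `settings` dict and the import guard are only illustrative.

```python
# Settings named in the update above, collected in one place (illustrative name).
settings = {
    "progress_bar": False,   # hide the progress bars
    "use_memory_fs": True,   # exchange data through the memory file system
}

try:
    from pandarallel import pandarallel
except ImportError:
    pandarallel = None  # library not installed; nothing to configure

if pandarallel is not None:
    # Must run before the first parallel_apply call in the notebook.
    pandarallel.initialize(**settings)
```

Running this in a fresh kernel, before any `parallel_apply` call, is probably the closest match to the "new notebook" part of the workaround.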

Some details:
Operating System: Ubuntu 22.04.2 LTS (WSL 2)
Python version: 3.10.12
Pandas version: 2.1.1
Pandarallel version: 1.6.5
Jupyter Notebook version: 7.0.2

This is what I get if I stop the cell execution when all workers are at 100%:

Process ForkPoolWorker-14:
Process ForkPoolWorker-15:
Process ForkPoolWorker-16:
Process ForkPoolWorker-17:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 365, in get
    res = self._reader.recv_bytes()
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 364, in get
    with self._rlock:
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "/usr/lib/python3.10/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt

During handling of the above exception, another exception occurred:


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 364, in get
    with self._rlock:
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 364, in get
    with self._rlock:
  File "/usr/lib/python3.10/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/lib/python3.10/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 365, in get
    res = self._reader.recv_bytes()
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
KeyboardInterrupt
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
KeyboardInterrupt
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
    [... skipping hidden 1 frame]

Cell In[22], line 1
----> 1 convert_timestamp(train)

File ~/sleep/lib/parallel.py:31, in convert_timestamp(df)
     30 gc.collect()
---> 31 df["timestamp"] = df.timestamp.parallel_apply(to_correct_format)

File ~/.local/lib/python3.10/site-packages/pandarallel/core.py:444, in parallelize_with_pipe.<locals>.closure(data, user_defined_function, *user_defined_function_args, **user_defined_function_kwargs)
    442         progress_bars.set_error(worker_index)
--> 444 results = results_promise.get()
    446 return data_type.reduce(results, reduce_extra)

File /usr/lib/python3.10/multiprocessing/pool.py:768, in ApplyResult.get(self, timeout)
    767 def get(self, timeout=None):
--> 768     self.wait(timeout)
    769     if not self.ready():

File /usr/lib/python3.10/multiprocessing/pool.py:765, in ApplyResult.wait(self, timeout)
    764 def wait(self, timeout=None):
--> 765     self._event.wait(timeout)

File /usr/lib/python3.10/threading.py:607, in Event.wait(self, timeout)
    606 if not signaled:
--> 607     signaled = self._cond.wait(timeout)
    608 return signaled

File /usr/lib/python3.10/threading.py:320, in Condition.wait(self, timeout)
    319 if timeout is None:
--> 320     waiter.acquire()
    321     gotit = True

KeyboardInterrupt: 

During handling of the above exception, another exception occurred:

KeyboardInterrupt                         Traceback (most recent call last)
KeyboardInterrupt: 
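
The last frames above show the main process waiting in `results_promise.get()` while the interrupted workers sit idle in `task = get()`. That would be consistent with the time going into moving results back rather than computing them, though I have not confirmed it. Below is a rough, standard-library-only sketch for comparing the two costs on a sample. The `to_correct_format` here is a hypothetical stand-in for the real function, and a pickle round trip only approximates the cost of inter-process transfer.

```python
import pickle
import time


def to_correct_format(value):
    # Hypothetical stand-in for the real per-row conversion.
    return f"{value:010d}"


def time_compute_and_transfer(values):
    """Time the computation and a pickle round trip of its results."""
    t0 = time.perf_counter()
    results = [to_correct_format(v) for v in values]
    t1 = time.perf_counter()

    # Serialize and deserialize the results, as happens when worker
    # output is sent back to the parent process.
    payload = pickle.dumps(results, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(payload)
    t2 = time.perf_counter()

    return {
        "compute_s": t1 - t0,
        "transfer_s": t2 - t1,
        "payload_bytes": len(payload),
        "round_trip_ok": restored == results,
    }


stats = time_compute_and_transfer(range(100_000))
print(stats)
```

If `transfer_s` grows much faster than `compute_s` as the sample gets larger, that would point toward data movement and memory pressure rather than the parallel computation itself.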

tmvfb, Oct 07 '23 16:10

Pandaral·lel is looking for a maintainer! If you are interested, please open a GitHub issue.

nalepae, Jan 23 '24 09:01

@thatmee You mentioned that the problem is likely related to the amount of RAM available and not a problem with pandarallel itself. Do you have any other problems?

I'm tempted to close this issue otherwise (or if there isn't a reply in a while).

shermansiu, Apr 27 '24 10:04