swifter
Swifter "progress_bar" Not Working
I just started experimenting with Swifter a few minutes ago and have been struggling to get the progress bar to show.
The code snippet below was adapted from the example code provided.
Why is the progress_bar(enable=True) option not working? Is there something wrong with my code?
var_unza_dspace_dataframe["subjectMistakes"] = var_unza_dspace_dataframe["subject"].str.split("=").swifter.allow_dask_on_strings(enable=True).progress_bar(
enable=True, desc='Subjects Mistakes'
).apply(fxn_subject_spellchecker)
Hey @lightonphiri,
Quick question: How fast does this apply run in?
Context: Swifter first tries to vectorize your function. So if the apply completes almost instantaneously with no progress bar, that is because swifter can't provide a progress bar for a vectorized operation.
Otherwise, let me look into how the progress bar interacts with a pd.Series. Perhaps the .str call is changing the type of the series and causing the progress bar not to show.
What is your environment? OS, Python version, swifter version, pandas version, etc. Perhaps use the example from #162 so we can compare for similarities.
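For anyone else reporting this, the environment details asked for above can be gathered with the standard library alone. This is only a minimal sketch: the helper name `collect_environment` is made up for illustration, and it assumes the packages were installed under the distribution names `swifter` and `pandas`.

```python
import platform
from importlib import metadata


def collect_environment(packages=("swifter", "pandas")):
    """Return OS, Python and package versions as a dict (hypothetical helper)."""
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            # Reads the installed distribution's version without importing it.
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


for key, value in collect_environment().items():
    print(f"{key}: {value}")
```

Pasting the printed lines into a comment gives the same fields the other reports in this thread list by hand.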
I did a quick test and got a progress bar, so I'm thinking the .str hypothesis above is wrong:
my_string_series = pd.Series(["A STR SERIES"])
my_string_series.str.split(" ").swifter.apply(lambda x: "-".join(x))
Pandas Apply: 100%|██████████████████████████████| 1/1 [00:00<00:00, 750.86it/s]
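To illustrate the "vectorize first, otherwise fall back" behaviour described earlier, here is a rough sketch using plain pandas. It is not swifter's actual implementation: the function name `apply_vectorized_or_fallback` and the sample-comparison check are assumptions made for illustration. It only shows why a join over lists ends up on the element-wise path (where a progress bar is possible), while simple arithmetic does not.

```python
import pandas as pd


def apply_vectorized_or_fallback(series, func, sample_size=1000):
    """Return (result, path); path is 'vectorized' or 'elementwise'.

    Hypothetical helper, not part of swifter or pandas.
    """
    sample = series.iloc[:sample_size]
    try:
        # Call func on the whole sample at once and compare against
        # the ordinary element-by-element apply on the same sample.
        vectorized_sample = func(sample)
        elementwise_sample = sample.apply(func)
        if isinstance(vectorized_sample, pd.Series) and vectorized_sample.equals(
            elementwise_sample
        ):
            return func(series), "vectorized"
    except Exception:
        # func cannot handle a whole Series; fall through to apply.
        pass
    return series.apply(func), "elementwise"


numbers = pd.Series([1, 2, 3])
_, path = apply_vectorized_or_fallback(numbers, lambda x: x + 1)
print(path)  # vectorized

words = pd.Series(["A STR SERIES"]).str.split(" ")
result, path = apply_vectorized_or_fallback(words, lambda x: "-".join(x))
print(path, result.iloc[0])  # elementwise A-STR-SERIES
```

The second case matches the test above: the output there is labelled "Pandas Apply", which suggests the function was applied row by row rather than vectorized.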
Thank you for the quick response @jmcarpenter2
> Quick question: How fast does this apply run in?
It takes a very long time, which is why I ended up discovering Swifter. I have a total of about 6k records to process, and processing a single record takes ~18 secs. Running this on a single CPU would theoretically take ~27 hours to finish.
> What is your environment?
- OS: Ubuntu 20.04.4 LTS,
- Python version: Python 3.8.10,
- Swifter version: 1.1.2,
- Pandas version: 1.4.0
Incidentally, as I was frantically trying to find alternatives, I came across pandarallel and was able to get it to visualise what was going on (see image below). I wanted to be certain that multiple cores are being used when I run processes.
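The runtime estimate above is easy to sanity-check with a few lines of Python. This is a back-of-the-envelope sketch only: the helper `estimate_hours` is made up for illustration, the record counts are round numbers taken from "about 6k", and the multi-worker figure assumes perfect scaling, which real workloads rarely achieve.

```python
def estimate_hours(n_records, seconds_per_record, n_workers=1):
    """Idealised wall-clock estimate in hours, assuming perfect scaling."""
    return n_records * seconds_per_record / n_workers / 3600


print(estimate_hours(6000, 18))     # 30.0
print(estimate_hours(5400, 18))     # 27.0
print(estimate_hours(5400, 18, 8))  # 3.375
```

At ~18 secs per record, ~27 hours corresponds to roughly 5,400 records, and a full 6,000 records would be closer to 30 hours on a single core.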
my_string_series = pd.Series(["A STR SERIES"])
my_string_series.str.split(" ").swifter.apply(lambda x: "-".join(x))
I used the same test and no progress bar was displayed.
OS: Windows 10 21H1 64-bit
CPU: AMD Ryzen 5 PRO 4650U with Radeon Graphics
swifter: 1.1.2
pandas: 1.4.1
python: 3.9.7
jupyterlab: 3.3.2
notebook: 6.4.10
pandarallel is good, but it doesn't support Windows.
Hey @jiahe224, isn't this the progress bar? Or did it just never fill out?
The task has finished running, but the progress is shown as 0%, whereas the example you gave shows 100% progress. @jmcarpenter2
Very interesting. That's a very good insight. I'll look into it.
Looking forward to your fix, thank you so much for doing this and making my job so much easier!
same issue~
same issue
same issue as above - progress bar shows up but never gets filled out even though the apply operation runs successfully.
same issue
Trying to narrow this one down
Just a quick poll: can you give me a 👍 if you are experiencing this on a Windows machine, and a 👎 if on Linux/macOS?
CC: @PeikaiLi @davera-017 @jn21 @wq624915051