prophet
Unable to run cross-validation in parallel mode "processes"
Hello, I'm using Prophet v1.0 with Anaconda3 2020.11 on Windows 10 64-bit. I'm trying to run cross-validation in parallel mode "processes" using the example provided in the documentation, but I always get this error message (the error.log is very long, so I attached it instead of pasting it here). The code I used:
```python
import pandas as pd
import itertools
import numpy as np
from prophet import Prophet
from prophet.diagnostics import cross_validation
from prophet.diagnostics import performance_metrics

df = pd.read_csv("example_wp_log_peyton_manning.csv")

param_grid = {
    "changepoint_prior_scale": [0.001, 0.01, 0.1, 0.5],
    "seasonality_prior_scale": [0.01, 0.1, 1.0, 10.0],
}

# Generate all combinations of parameters
all_params = [dict(zip(param_grid.keys(), v)) for v in itertools.product(*param_grid.values())]
rmses = []  # Store the RMSEs for each params here

# Use cross validation to evaluate all parameters
for params in all_params:
    m = Prophet(**params).fit(df)  # Fit model with given params
    df_cv = cross_validation(m, horizon="30 days", parallel="processes")
    df_p = performance_metrics(df_cv, rolling_window=1)
    rmses.append(df_p["rmse"].values[0])

# Find the best parameters
tuning_results = pd.DataFrame(all_params)
tuning_results["rmse"] = rmses
print(tuning_results)
```
If I run the code on Google Colab then everything works fine.
Can anyone help, please? Thank you.
Does it work if you run it with `parallel` set to `None` or to `'threads'`? It's really hard to debug issues with parallel processing. The key part of the error message seems to be:
```
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:

    if __name__ == '__main__':
        freeze_support()
        ...
```
but I'm not quite sure what is to be made of that.
As a side note, I did notice the line `INFO:prophet:Making 172 forecasts with cutoffs between 2008-12-12 00:00:00 and 2015-12-21 00:00:00`, which is a really large number of forecasts for cross validation and will probably be super slow. I'd recommend increasing the `initial` and/or `period` inputs to `cross_validation` to get that down to something that won't take so long to run.
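For reference, the 172 in that log line follows from the default cutoff spacing: with `horizon="30 days"`, `period` defaults to half the horizon, i.e. 15 days, so stretching `period` directly shrinks the count. A quick back-of-the-envelope check using the cutoff dates from the log line:

```python
from datetime import date

# First and last cutoff dates reported in the INFO line above.
first = date(2008, 12, 12)
last = date(2015, 12, 21)
span = (last - first).days  # 2565 days between first and last cutoff

# Default period for horizon="30 days" is 15 days (half the horizon):
print(span // 15 + 1)   # 172 cutoffs, matching the log

# Spacing cutoffs 180 days apart over a comparable span:
print(span // 180 + 1)  # 15 cutoffs
```

The corresponding call would be along the lines of `cross_validation(m, initial="1095 days", period="180 days", horizon="30 days")`; those specific values are only illustrations, not recommendations.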
Yes, the code works if I run it with `parallel` set to `None` or to `threads`. I also tried the code on another computer of mine with the same environment, but the result is still the same. If I try it on Google Colab then the code runs just fine.
Also thank you for the suggestion on the forecasting parameters. I will try to run the code without Anaconda to see if the error still happens.
I guess the workaround is probably to use `threads`. I haven't run into this myself on Linux, and I probably won't be able to debug the issue.
According to #1434, running cross-validation with `parallel` set to `threads` is much slower than setting it to `processes`. I'm unable to build `prophet` on Windows, so another workaround of mine is to use WSL. In my case there is a huge difference in terms of execution time:
- Cross-validation with `parallel` set to `processes` in WSL (Debian, without Anaconda): 0:02:44.19 (100% CPU usage)
- Cross-validation with `parallel` set to `threads` in Windows (with Anaconda): 0:08:40.13 (only around 50% CPU usage)
I hope that in the future someone will be able to help debug this issue.
@nviet I just saw #1889, which seems like it might be related. A solution presented there was to do

```python
import multiprocessing
multiprocessing.set_start_method("fork")
```

prior to importing prophet. Could you see if that works here?
Thanks for your suggestion. Unfortunately the `fork` start method is available on Unix only, where it is also the default. On Windows the only available method is `spawn`. This is stated in Python's official documentation too. Trying to set the start method to `fork` on Windows results in this error message:
```
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    multiprocessing.set_start_method("fork")
  File "d:\Programs\anaconda3\envs\myenv\lib\multiprocessing\context.py", line 246, in set_start_method
    self._actual_context = self.get_context(method)
  File "d:\Programs\anaconda3\envs\myenv\lib\multiprocessing\context.py", line 238, in get_context
    return super().get_context(method)
  File "d:\Programs\anaconda3\envs\myenv\lib\multiprocessing\context.py", line 192, in get_context
    raise ValueError('cannot find context for %r' % method) from None
ValueError: cannot find context for 'fork'
```
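To see which start methods a given platform actually supports before calling `set_start_method`, the standard library exposes `multiprocessing.get_all_start_methods()`. A small sketch (the exact list varies by platform and Python version):

```python
import multiprocessing

# On Windows this prints ['spawn']; on Linux it is typically
# ['fork', 'spawn', 'forkserver'].
print(multiprocessing.get_all_start_methods())

# Guarding on availability avoids the ValueError above. get_context()
# returns a per-call context without changing the global start method.
if "fork" in multiprocessing.get_all_start_methods():
    ctx = multiprocessing.get_context("fork")
```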
The issue lies in the line `pool = concurrent.futures.ProcessPoolExecutor()` in the file `diagnostics.py`. As shared in an answer to a StackOverflow question on parallelism on Windows:
> Multiprocessing works differently on ms-windows because that OS lacks the `fork` system call used on UNIX and macOS. `fork` creates the child process as a perfect copy of the parent process. All the code and data in both processes are the same. The only difference being the return value of the `fork` call. (That is to let the new process know it is a copy.) So the child process has access to (a copy of) all the data in the parent process.
>
> On ms-windows, multiprocessing tries to "fake" fork by launching a new python interpreter and have it import your main module. This means (among other things) that your main module has to be importable without side effects such as starting another process. Hence the reason for `if __name__ == '__main__'`. It also means that your worker processes might or might not have access to data created in the parent process, depending on where it is created. It will have access to anything created before `__main__`. But it would not have access to anything created inside the main block.
Facing the same issue on Mac.
Regarding the 100% vs 50% CPU utilization, could the problem be that Windows reports all logical cores (double the number of physical cores for CPUs with hyperthreading)? In my experience with scientific computing, using e.g. 8 cores on a 4-core machine with hyperthreading yields either no benefit or an outright decrease in speed, compared to using just the 4 cores. Are the training times roughly the same in vanilla Windows as in WSL (WSL2?)?
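For what it's worth, `os.cpu_count()`, which `ProcessPoolExecutor` uses for its default worker count, reports logical (hyperthreaded) cores, so capping `max_workers` near the physical count is a common mitigation. A sketch; the halving assumes two hardware threads per physical core:

```python
import os
import concurrent.futures

logical = os.cpu_count()  # logical cores, e.g. 8 on a 4-core HT machine
print(logical)

# Assume two hardware threads per physical core and cap the pool there.
pool = concurrent.futures.ProcessPoolExecutor(max_workers=max(1, logical // 2))
pool.shutdown()
```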
I had a similar issue when using fbprophet, and my solution was analogous to changing:

```python
for params in all_params:
    m = Prophet(**params).fit(df)  # Fit model with given params
    df_cv = cross_validation(m, horizon="30 days", parallel="processes")
    df_p = performance_metrics(df_cv, rolling_window=1)
    rmses.append(df_p["rmse"].values[0])
```

to

```python
for params in all_params:
    if __name__ == '__main__':
        m = Prophet(**params).fit(df)  # Fit model with given params
        df_cv = cross_validation(m, horizon="30 days", parallel="processes")
        df_p = performance_metrics(df_cv, rolling_window=1)
        rmses.append(df_p["rmse"].values[0])
```
This turned out to be a bug with `processes` mode writing multiple tempfiles to the same directory. This PR fixed the bug: https://github.com/facebook/prophet/pull/2088/files, and Prophet 1.1 has been released.
Please try `pip install --upgrade prophet`.
This issue still persists after upgrading prophet.
Digging into this a bit, here's a reproducible snippet that replicates what Prophet is doing, but without prophet. This errors on Windows:

```python
import concurrent.futures

def times_two(val):
    return val * 2

pool = concurrent.futures.ProcessPoolExecutor()
vals = [1, 2, 3, 4]
res = pool.map(times_two, vals)
```
The error:

```
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
```
The problem is that the `__main__` module gets re-imported by each child, and an infinite loop is created if the user code doesn't have `if __name__ == '__main__':` in it.
If you're not routinely testing the package across platforms, it might be a good idea to use something like `joblib` (as scikit-learn does); I get the feeling that they've been through all these pains and worked things out. Or use `loky` directly.
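As a sketch of what that might look like: `joblib`'s default `loky` backend serializes the callable itself (via cloudpickle) rather than relying on re-importing `__main__`, which is exactly the failure mode here. This assumes `joblib` is installed; the API shown is `joblib.Parallel`/`joblib.delayed`:

```python
from joblib import Parallel, delayed

def times_two(val):
    return val * 2

# loky pickles times_two directly, so no __main__ guard is needed,
# even on Windows or in interactive sessions.
res = Parallel(n_jobs=2)(delayed(times_two)(v) for v in [1, 2, 3, 4])
print(res)  # [2, 4, 6, 8]
```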