
[BUG] TSFreshFeatureExtractor RuntimeError outside of main

Open MatthewMiddlehurst opened this issue 3 years ago • 4 comments

Describe the bug

When using the TSFreshFeatureExtractor transformer, the following error is repeatedly printed to the terminal. After confirming that the problem did not come from the tests or the functions used to generate test results, I found that it does not occur when the code is run inside a main block, i.e. under if __name__ == "__main__":.

It is entirely possible that this is just me being Python illiterate and that it's common sense to run it this way with a main, but I'll create an issue just in case :).

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Matthew Middlehurst\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\Matthew Middlehurst\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\Matthew Middlehurst\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\Matthew Middlehurst\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Users\Matthew Middlehurst\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\Matthew Middlehurst\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\Matthew Middlehurst\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\CMP Machine Learning\sktime-workshop-boss\sktime\contrib\local_code.py", line 12, in <module>
    tsfresh.fit_transform(X_train, y_train)
  File "D:\CMP Machine Learning\sktime-workshop-boss\sktime\transformations\base.py", line 91, in fit_transform
    return self.fit(Z, X).transform(Z)
  File "D:\CMP Machine Learning\sktime-workshop-boss\sktime\transformations\panel\tsfresh.py", line 176, in transform
    Xt = extract_features(
  File "D:\CMP Machine Learning\sktime-workshop-boss\venv\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 152, in extract_features
    result = _do_extraction(df=timeseries_container,
  File "D:\CMP Machine Learning\sktime-workshop-boss\venv\lib\site-packages\tsfresh\feature_extraction\extraction.py", line 240, in _do_extraction
    distributor = MultiprocessingDistributor(n_workers=n_jobs,
  File "D:\CMP Machine Learning\sktime-workshop-boss\venv\lib\site-packages\tsfresh\utilities\distribution.py", line 420, in __init__
    self.pool = Pool(processes=n_workers, initializer=initialize_warnings_in_workers, initargs=(show_warnings,))
  File "C:\Users\Matthew Middlehurst\AppData\Local\Programs\Python\Python39\lib\multiprocessing\context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "C:\Users\Matthew Middlehurst\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 212, in __init__
    self._repopulate_pool()
  File "C:\Users\Matthew Middlehurst\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "C:\Users\Matthew Middlehurst\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "C:\Users\Matthew Middlehurst\AppData\Local\Programs\Python\Python39\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\Matthew Middlehurst\AppData\Local\Programs\Python\Python39\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\Matthew Middlehurst\AppData\Local\Programs\Python\Python39\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\Matthew Middlehurst\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\Matthew Middlehurst\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.
        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:
            if __name__ == '__main__':
                freeze_support()
                ...
        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

The transformed data is still output eventually, however:

    dim_0__variance_larger_than_standard_deviation  ...  dim_0__matrix_profile__feature_"75"__threshold_0.98
0                                              0.0  ...                                           3.334879  
1                                              0.0  ...                                           3.334879  
2                                              0.0  ...                                           3.334879  
3                                              0.0  ...                                           3.334879  
4                                              0.0  ...                                           3.334879  
..                                             ...  ...                                                ...  
62                                             0.0  ...                                           3.334879  
63                                             0.0  ...                                           3.334879  
64                                             0.0  ...                                           3.334879  
65                                             0.0  ...                                           3.334879  
66                                             0.0  ...                                           3.334879  
[67 rows x 787 columns]

To Reproduce

from sktime.datasets import load_italy_power_demand
from sktime.transformations.panel.tsfresh import TSFreshFeatureExtractor

X_train, y_train = load_italy_power_demand(split="train", return_X_y=True)

tsfresh = TSFreshFeatureExtractor(
    default_fc_parameters="comprehensive",
    show_warnings=False,
    disable_progressbar=True,
)

t = tsfresh.fit_transform(X_train, y_train)
print(t)
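In the script above, tsfresh.fit_transform(X_train, y_train) runs at module level. With the "spawn" start method (the only one available on Windows), each worker process re-runs the main script, which the traceback shows as spawn.py calling runpy.run_path, so every worker reaches that line again and tries to start its own pool, which raises the RuntimeError. Wrapping the call in if __name__ == "__main__": is the workaround described at the top of this issue. Below is a minimal stdlib-only sketch of that pattern; math.sqrt stands in for the tsfresh feature calculators, and run_in_pool is a hypothetical helper, not sktime or tsfresh API.

```python
import math
import multiprocessing as mp


def run_in_pool(values):
    """Start a small process pool and apply math.sqrt to each value."""
    # Force "spawn" (the only start method on Windows) so the sketch
    # behaves the same way on every platform.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        # math.sqrt is pickled by reference, so workers can call it
        # without needing anything defined in this script.
        return pool.map(math.sqrt, values)


# When this file is run as a script under "spawn", each worker re-imports
# it under the name "__mp_main__". Module-level code runs again in every
# worker; only the block below is skipped there, so the pool must be
# started behind this guard.
if __name__ == "__main__":
    result = run_in_pool([1.0, 4.0, 9.0])
    print(result)  # [1.0, 2.0, 3.0]
```

Applied to the reproduction script, the same idea means moving the TSFreshFeatureExtractor construction and the fit_transform call into a function that is only called from inside the guard.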

Versions

This occurs on both my machines running Windows 10, one using Python 3.8 and the other 3.9.

System:
    python: 3.8.5 (default, Sep  3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)]
executable: E:\_ProgramFiles\Anaconda3\envs\sktime-workshop-boss\python.exe
   machine: Windows-10-10.0.19041-SP0
Python dependencies:
          pip: 20.2.4
   setuptools: 50.3.0.post20201006
      sklearn: 0.24.2
       sktime: 0.7.0
  statsmodels: 0.12.1
        numpy: 1.19.4
        scipy: 1.5.3
       Cython: 0.29.21
       pandas: 1.1.4
   matplotlib: 3.4.2
       joblib: 0.17.0
        numba: 0.51.2
     pmdarima: None
      tsfresh: 0.18.0

MatthewMiddlehurst, Aug 13 '21 09:08

I've run this on macOS and Linux without issue, so maybe it's a Windows thing? Could anyone try it out?

chrisholder, Aug 13 '21 10:08
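One platform difference that may be relevant to the macOS/Linux vs. Windows question above is the default multiprocessing start method. Windows only supports "spawn", which starts a fresh interpreter and re-imports the main script, while Linux has traditionally defaulted to "fork", which copies the parent process instead. The default also varies with the Python version, so a quick way to compare machines is to print it. describe_start_method is a hypothetical helper, not part of sktime or tsfresh.

```python
import multiprocessing as mp
import sys


def describe_start_method() -> str:
    """Report the platform, the default start method and the available ones."""
    # get_start_method() returns the method currently in effect,
    # e.g. "fork", "spawn" or "forkserver".
    return (
        f"{sys.platform}: default={mp.get_start_method()}, "
        f"available={mp.get_all_start_methods()}"
    )


if __name__ == "__main__":
    print(describe_start_method())
```

Running this on each machine shows whether a given setup re-imports the main script in its workers.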

Flashbacks to catch22... Hope the features aren't different between them.

MatthewMiddlehurst, Aug 13 '21 10:08

I ran it on Windows and reproduced the bug. I tried it both with and without setting os.environ["MKL_NUM_THREADS"] = "1", os.environ["NUMEXPR_NUM_THREADS"] = "1" and os.environ["OMP_NUM_THREADS"] = "1", and got the same bug each time.

TonyBagnall, Aug 13 '21 10:08
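For reference, the three variables from TonyBagnall's comment can be set in one place. They cap the threads used by MKL, numexpr and OpenMP-backed code inside a single process; they do not control how many worker processes multiprocessing starts, which is consistent with them making no difference to this error. They are typically read when those libraries are first loaded, so this sketch assumes it runs before numpy or pandas is imported.

```python
import os

# Thread-count variables for MKL, numexpr and OpenMP. Set them before
# importing numpy, pandas or anything that loads these libraries, since
# they are usually read only once at load time.
THREAD_VARS = ("MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS", "OMP_NUM_THREADS")
for var in THREAD_VARS:
    os.environ[var] = "1"

# These limit threads *inside* each process only. They do not stop
# multiprocessing from starting worker processes, so the
# if __name__ == "__main__": guard is still needed for the spawn error.
print({var: os.environ[var] for var in THREAD_VARS})
```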

On Windows 10 with Python 3.7, 3.8 and 3.9, using the current sktime 0.10:

  • [x] Developer install: unable to reproduce
  • [ ] Will do another check with a non-developer setup.

k1m190r, Feb 09 '22 15:02

After some testing, this appears to be fixed in tsfresh 0.19.0. I don't think this issue is serious enough to change the version bounds for the dependency, however.

MatthewMiddlehurst, Nov 14 '22 14:11
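Since the dependency bounds were left unchanged, a script that may still run against older tsfresh releases could check the installed version. The helpers below (parse_release, tsfresh_predates_fix) are hypothetical, and the 0.19.0 cutoff comes only from the comment above; keeping the main guard is harmless on newer versions either way. If the packaging library is available, packaging.version.Version is a more robust way to compare versions than this simplified parser.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(ver: str) -> tuple:
    """Turn a version string like "0.18.0" into (0, 18, 0).

    Deliberately simple: keeps the leading number of each dot-separated
    piece and stops at the first piece without one (e.g. "post1").
    """
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def tsfresh_predates_fix(ver: str, fixed_in: str = "0.19.0") -> bool:
    """True if `ver` is older than the release reported as fixed in this thread."""
    return parse_release(ver) < parse_release(fixed_in)


if __name__ == "__main__":
    try:
        installed = version("tsfresh")
    except PackageNotFoundError:
        print("tsfresh is not installed")
    else:
        print(installed, "predates 0.19.0:", tsfresh_predates_fix(installed))
```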