pytorch-forecasting
Multi-GPU training results in "ProcessExitedException process 0 terminated with signal SIGSEGV" exception for Baseline and TFT models.
- PyTorch-Forecasting version: 1.0.0
- PyTorch version: 2.0.1+cu117
- Lightning version: 2.0.4
- Python version: 3.10.11
- Operating System: Linux-5.10.0-23-cloud-amd64-x86_64-with-glibc2.31 (Google Cloud)
Expected behavior
I am trying to run the exact code from the stallion example for TFTs on a multi-GPU machine, in preparation for training a similar model on my own data in the same environment. I can run it without issue on a single-GPU machine, and I would expect it to run without issue on a multi-GPU machine as well, especially when specifying only one of the GPUs with devices=1. I have also tested a similar script with my own data and run into the same issues.
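For reference, a hedged sketch of what I mean by single-device use: predict() forwards trainer_kwargs to the Lightning Trainer (visible in the traceback below), so I would expect something along these lines to avoid spawning multiple processes. This is an untested sketch, not code from the tutorial.
# Sketch/assumption, not from the tutorial: pin prediction to a single GPU by
# forwarding Trainer arguments through trainer_kwargs.
baseline_predictions = Baseline().predict(
    val_dataloader,
    return_y=True,
    trainer_kwargs=dict(accelerator="gpu", devices=1),
)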
Actual behavior
When I run the same code on a multi-GPU machine, I get the following error both when running the Baseline model and when fitting the TFT model.
Baseline
# calculate baseline mean absolute error, i.e. predict next value as the last available value from the history
baseline_predictions = Baseline().predict(val_dataloader, return_y=True)
MAE()(baseline_predictions.output, baseline_predictions.y)
Output
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]
---------------------------------------------------------------------------
ProcessExitedException Traceback (most recent call last)
Cell In[6], line 2
1 # # calculate baseline mean absolute error, i.e. predict next value as the last available value from the history
----> 2 baseline_predictions = Baseline().predict(val_dataloader, return_y=True)
3 MAE()(baseline_predictions.output, baseline_predictions.y)
File ~/.local/lib/python3.10/site-packages/pytorch_forecasting/models/base_model.py:1423, in BaseModel.predict(self, data, mode, return_index, return_decoder_lengths, batch_size, num_workers, fast_dev_run, return_x, return_y, mode_kwargs, trainer_kwargs, write_interval, output_dir, **kwargs)
1421 logging.getLogger("pytorch_lightning").setLevel(logging.WARNING)
1422 trainer = Trainer(fast_dev_run=fast_dev_run, **trainer_kwargs)
-> 1423 trainer.predict(self, dataloader)
1424 logging.getLogger("lightning").setLevel(log_level_lighting)
1425 logging.getLogger("pytorch_lightning").setLevel(log_level_pytorch_lightning)
File /opt/conda/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py:845, in Trainer.predict(self, model, dataloaders, datamodule, return_predictions, ckpt_path)
843 model = _maybe_unwrap_optimized(model)
844 self.strategy._lightning_module = model
--> 845 return call._call_and_handle_interrupt(
846 self, self._predict_impl, model, dataloaders, datamodule, return_predictions, ckpt_path
847 )
File /opt/conda/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py:41, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
39 try:
40 if trainer.strategy.launcher is not None:
---> 41 return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
42 return trainer_fn(*args, **kwargs)
44 except _TunerExitException:
File /opt/conda/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/multiprocessing.py:124, in _MultiProcessingLauncher.launch(self, function, trainer, *args, **kwargs)
116 process_context = mp.start_processes(
117 self._wrapping_function,
118 args=process_args,
(...)
121 join=False, # we will join ourselves to get the process references
122 )
123 self.procs = process_context.processes
--> 124 while not process_context.join():
125 pass
127 worker_output = return_queue.get()
File /opt/conda/lib/python3.10/site-packages/torch/multiprocessing/spawn.py:140, in ProcessContext.join(self, timeout)
138 if exitcode < 0:
139 name = signal.Signals(-exitcode).name
--> 140 raise ProcessExitedException(
141 "process %d terminated with signal %s" %
142 (error_index, name),
143 error_index=error_index,
144 error_pid=failed_process.pid,
145 exit_code=exitcode,
146 signal_name=name
147 )
148 else:
149 raise ProcessExitedException(
150 "process %d terminated with exit code %d" %
151 (error_index, exitcode),
(...)
154 exit_code=exitcode
155 )
ProcessExitedException: process 0 terminated with signal SIGSEGV
TFT
# configure network and trainer
early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=10, verbose=False, mode="min")
lr_logger = LearningRateMonitor() # log the learning rate
logger = TensorBoardLogger("lightning_logs") # logging results to a tensorboard
trainer = pl.Trainer(
    max_epochs=10,
    accelerator="cuda",  # added line vs example code
    strategy="ddp_notebook",  # added line vs example code
    devices=2,  # added line vs example code
    enable_model_summary=True,
    gradient_clip_val=0.1,
    limit_train_batches=50,  # comment in for training, running validation every 30 batches
    # fast_dev_run=True,  # comment in to check that network or dataset has no serious bugs
    callbacks=[lr_logger, early_stop_callback],
    logger=logger,
)
tft = TemporalFusionTransformer.from_dataset(
    training,
    learning_rate=0.03,
    hidden_size=16,
    attention_head_size=2,
    dropout=0.1,
    hidden_continuous_size=8,
    loss=QuantileLoss(),
    log_interval=10,  # uncomment for learning rate finder and otherwise, e.g. to 10 for logging every 10 batches
    optimizer="Ranger",
    reduce_on_plateau_patience=4,
)
print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")
Output
[rank: 0] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
[rank: 1] Global seed set to 42
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]
| Name | Type | Params
----------------------------------------------------------------------------------------
0 | loss | QuantileLoss | 0
1 | logging_metrics | ModuleList | 0
2 | input_embeddings | MultiEmbedding | 1.3 K
3 | prescalers | ModuleDict | 256
4 | static_variable_selection | VariableSelectionNetwork | 3.4 K
5 | encoder_variable_selection | VariableSelectionNetwork | 8.0 K
6 | decoder_variable_selection | VariableSelectionNetwork | 2.7 K
7 | static_context_variable_selection | GatedResidualNetwork | 1.1 K
8 | static_context_initial_hidden_lstm | GatedResidualNetwork | 1.1 K
9 | static_context_initial_cell_lstm | GatedResidualNetwork | 1.1 K
10 | static_context_enrichment | GatedResidualNetwork | 1.1 K
11 | lstm_encoder | LSTM | 2.2 K
12 | lstm_decoder | LSTM | 2.2 K
13 | post_lstm_gate_encoder | GatedLinearUnit | 544
14 | post_lstm_add_norm_encoder | AddNorm | 32
15 | static_enrichment | GatedResidualNetwork | 1.4 K
16 | multihead_attn | InterpretableMultiHeadAttention | 808
17 | post_attn_gate_norm | GateAddNorm | 576
18 | pos_wise_ff | GatedResidualNetwork | 1.1 K
19 | pre_output_gate_norm | GateAddNorm | 576
20 | output_layer | Linear | 119
----------------------------------------------------------------------------------------
29.4 K Trainable params
0 Non-trainable params
29.4 K Total params
0.118 Total estimated model params size (MB)
ProcessExitedException Traceback (most recent call last)
Cell In[11], line 2
1 # fit network
----> 2 trainer.fit(
3 tft,
4 train_dataloaders=train_dataloader,
5 val_dataloaders=val_dataloader,
6 )
File /opt/conda/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py:531, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
529 model = _maybe_unwrap_optimized(model)
530 self.strategy._lightning_module = model
--> 531 call._call_and_handle_interrupt(
532 self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
533 )
File /opt/conda/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py:41, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
39 try:
40 if trainer.strategy.launcher is not None:
---> 41 return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
42 return trainer_fn(*args, **kwargs)
44 except _TunerExitException:
File /opt/conda/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/multiprocessing.py:124, in _MultiProcessingLauncher.launch(self, function, trainer, *args, **kwargs)
116 process_context = mp.start_processes(
117 self._wrapping_function,
118 args=process_args,
(...)
121 join=False, # we will join ourselves to get the process references
122 )
123 self.procs = process_context.processes
--> 124 while not process_context.join():
125 pass
127 worker_output = return_queue.get()
File /opt/conda/lib/python3.10/site-packages/torch/multiprocessing/spawn.py:140, in ProcessContext.join(self, timeout)
138 if exitcode < 0:
139 name = signal.Signals(-exitcode).name
--> 140 raise ProcessExitedException(
141 "process %d terminated with signal %s" %
142 (error_index, name),
143 error_index=error_index,
144 error_pid=failed_process.pid,
145 exit_code=exitcode,
146 signal_name=name
147 )
148 else:
149 raise ProcessExitedException(
150 "process %d terminated with exit code %d" %
151 (error_index, exitcode),
(...)
154 exit_code=exitcode
155 )
ProcessExitedException: process 0 terminated with signal SIGSEGV
Code to reproduce the problem
I copied the code exactly from here
The only changes were the additional multi-GPU parameters in the TFT Trainer call:
trainer = pl.Trainer(
    max_epochs=10,
    accelerator="cuda",  # added line vs example code
    strategy="ddp_notebook",  # added line vs example code
    devices=2,  # added line vs example code
    enable_model_summary=True,
    gradient_clip_val=0.1,
    limit_train_batches=50,  # comment in for training, running validation every 30 batches
    # fast_dev_run=True,  # comment in to check that network or dataset has no serious bugs
    callbacks=[lr_logger, early_stop_callback],
    logger=logger,
)
Potential Solution
I have spent a couple of weeks trying to resolve this, and it appears to be related, at least in part, to a memory-sharing issue between GPU processes. I have found one possible solution on the Lightning forum here, but I am still relatively new to this package and am struggling to find a generalized way to implement that fix while still building the model with the from_dataset() method and keeping the model flexible enough to train in CPU, single-GPU, and multi-GPU environments.
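For concreteness, here is a rough, untested sketch of the kind of workaround discussed on that forum thread: wrapping the tutorial's TimeSeriesDataSet objects in a LightningDataModule so each spawned DDP process builds its own dataloaders. The class name StallionDataModule is a placeholder; training and validation are assumed to be the TimeSeriesDataSet objects from the stallion example.
import lightning.pytorch as pl

# Untested sketch: each DDP worker calls the dataloader hooks itself, so the
# dataloaders do not have to be pickled across processes.
class StallionDataModule(pl.LightningDataModule):
    def __init__(self, training, validation, batch_size=128, num_workers=0):
        super().__init__()
        self.training = training        # TimeSeriesDataSet from the tutorial
        self.validation = validation    # TimeSeriesDataSet.from_dataset(training, ...)
        self.batch_size = batch_size
        self.num_workers = num_workers

    def train_dataloader(self):
        return self.training.to_dataloader(
            train=True, batch_size=self.batch_size, num_workers=self.num_workers
        )

    def val_dataloader(self):
        return self.validation.to_dataloader(
            train=False, batch_size=self.batch_size * 10, num_workers=self.num_workers
        )

# usage: trainer.fit(tft, datamodule=StallionDataModule(training, validation))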
I am also facing this issue
For what it's worth, adding
def train_dataloader(self):
    return train_dataloader
to https://github.com/jdb78/pytorch-forecasting/blob/d8a4462fb12de025f8bef852df1f5b48a7ae5b7c/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py#L29
doesn't work. Perhaps unsurprisingly.
Yeah, it's frustrating. I'm still trying to work through the issue, though I've had to back-burner it lately for other priorities. Let me know if you figure anything out!
Same behavior. No luck trying to adapt the solution from the lightning forum.
Argh, I typed out a whole description of this and then lost the tab :(
I have a gist which runs the tutorial with two 3090s. Quick summary:
- Install pytorch-forecasting as a develop install
- Create your own TFT class
- Add `train_dataloader` and `test_dataloader` (note: not `val`). `lightning` uses `test` during training and keeps `val` as the final hold-out set; I think this package uses `test` and `val` the other way around?
- Create the TFT from `__init__`, not the handy `from_dataset`
Here's the gist; it is messy, but it should give you a start to solve your own problems.
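For anyone skimming, a rough, hypothetical sketch of the approach summarized above (not the actual gist); MyTFT and build_datasets() are placeholder names.
from pytorch_forecasting import TemporalFusionTransformer

# Placeholder for your own TimeSeriesDataSet construction; each spawned process
# calls it, so no dataloader has to be pickled across processes.
def build_datasets():
    raise NotImplementedError("build the train/validation TimeSeriesDataSet objects here")

class MyTFT(TemporalFusionTransformer):
    def train_dataloader(self):
        training, _ = build_datasets()
        return training.to_dataloader(train=True, batch_size=128)

    def test_dataloader(self):
        _, validation = build_datasets()
        return validation.to_dataloader(train=False, batch_size=1280)

# Construct via __init__ (spelling out the dataset-derived arguments that
# from_dataset() would normally infer), e.g. tft = MyTFT(...), then trainer.fit(tft).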
@this-josh - thanks! I'll give it a shot and let you know how it goes.
Trying the code, I get the error ProcessExitedException: process 0 terminated with signal SIGSEGV.
Sorry, I'm not sure why. I just ran this, and it works fine
curl -O https://gist.githubusercontent.com/this-josh/744345bea2053cc75c9d6388f317ca87/raw/49e29f974b83d2fea826db8dbc1dbc924a47b5e4/train.py
mamba create --prefix ./env python=3.10 -y
mamba activate /tmp/env
pip install pytorch-forecasting lightning numpy matplotlib torch pyarrow tensorboard
python train.py
(base) ➜ /tmp nvidia-smi [28/Sep/23 | 15:47]
Thu Sep 28 15:48:35 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06 Driver Version: 525.125.06 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 0% 57C P8 36W / 350W | 448MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:4C:00.0 Off | N/A |
| 0% 50C P8 27W / 350W | 18MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2375 G /usr/lib/xorg/Xorg 35MiB |
| 0 N/A N/A 22600 G /usr/lib/firefox/firefox 23MiB |
| 0 N/A N/A 29939 G /usr/lib/xorg/Xorg 37MiB |
| 0 N/A N/A 73571 G ...382312550063625843,131072 36MiB |
| 0 N/A N/A 74742 G gnome-control-center 4MiB |
| 0 N/A N/A 109859 G /usr/lib/xorg/Xorg 99MiB |
| 0 N/A N/A 109995 G /usr/bin/gnome-shell 130MiB |
| 1 N/A N/A 2375 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 29939 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 109859 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------+
result
29.4 K Trainable params
0 Non-trainable params
29.4 K Total params
0.118 Total estimated model params size (MB)
Epoch 0: 100%|████████████████████████████████████| 1/1 [00:00<00:00, 3.17it/s, train_loss_step=251.0, train_loss_epoch=251.0]`Trainer.fit` stopped: `max_steps=1` reached.
Epoch 0: 100%|████████████████████████████████████| 1/1 [00:00<00:00, 3.17it/s, train_loss_step=251.0, train_loss_epoch=251.0]
hope this helps.
Hm, maybe there are some issues because I am running on Databricks. Using the same versions of the packages, I get:
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
Number of parameters in network: 29.4k
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 2 processes
----------------------------------------------------------------------------------------------------
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
| Name | Type | Params
----------------------------------------------------------------------------------------
0 | loss | QuantileLoss | 0
1 | logging_metrics | ModuleList | 0
2 | input_embeddings | MultiEmbedding | 1.3 K
3 | prescalers | ModuleDict | 256
4 | static_variable_selection | VariableSelectionNetwork | 3.4 K
5 | encoder_variable_selection | VariableSelectionNetwork | 8.0 K
6 | decoder_variable_selection | VariableSelectionNetwork | 2.7 K
7 | static_context_variable_selection | GatedResidualNetwork | 1.1 K
8 | static_context_initial_hidden_lstm | GatedResidualNetwork | 1.1 K
9 | static_context_initial_cell_lstm | GatedResidualNetwork | 1.1 K
10 | static_context_enrichment | GatedResidualNetwork | 1.1 K
11 | lstm_encoder | LSTM | 2.2 K
12 | lstm_decoder | LSTM | 2.2 K
13 | post_lstm_gate_encoder | GatedLinearUnit | 544
14 | post_lstm_add_norm_encoder | AddNorm | 32
15 | static_enrichment | GatedResidualNetwork | 1.4 K
16 | multihead_attn | InterpretableMultiHeadAttention | 808
17 | post_attn_gate_norm | GateAddNorm | 576
18 | pos_wise_ff | GatedResidualNetwork | 1.1 K
19 | pre_output_gate_norm | GateAddNorm | 576
20 | output_layer | Linear | 119
----------------------------------------------------------------------------------------
29.4 K Trainable params
0 Non-trainable params
29.4 K Total params
0.118 Total estimated model params size (MB)
Error trace:
ProcessExitedException Traceback (most recent call last)
File <command-2218893583918656>, line 1041
1038 print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")
1040 # fit network
-> 1041 trainer.fit(
1042 tft,
1043 # train_dataloaders=train_dataloader,
1044 # val_dataloaders=val_dataloader,
1045 )
File /databricks/python/lib/python3.10/site-packages/mlflow/utils/autologging_utils/safety.py:432, in safe_patch.<locals>.safe_patch_function(*args, **kwargs)
417 if (
418 active_session_failed
419 or autologging_is_disabled(autologging_integration)
(...)
426 # warning behavior during original function execution, since autologging is being
427 # skipped
428 with set_non_mlflow_warnings_behavior_for_current_thread(
429 disable_warnings=False,
430 reroute_warnings=False,
431 ):
--> 432 return original(*args, **kwargs)
434 # Whether or not the original / underlying function has been called during the
435 # execution of patched code
436 original_has_been_called = False
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py:532, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
530 self.strategy._lightning_module = model
531 _verify_strategy_supports_compile(model, self.strategy)
--> 532 call._call_and_handle_interrupt(
533 self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
534 )
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py:42, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
40 try:
41 if trainer.strategy.launcher is not None:
---> 42 return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
43 return trainer_fn(*args, **kwargs)
45 except _TunerExitException:
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/multiprocessing.py:127, in _MultiProcessingLauncher.launch(self, function, trainer, *args, **kwargs)
119 process_context = mp.start_processes(
120 self._wrapping_function,
121 args=process_args,
(...)
124 join=False, # we will join ourselves to get the process references
125 )
126 self.procs = process_context.processes
--> 127 while not process_context.join():
128 pass
130 worker_output = return_queue.get()
File /databricks/python/lib/python3.10/site-packages/torch/multiprocessing/spawn.py:140, in ProcessContext.join(self, timeout)
138 if exitcode < 0:
139 name = signal.Signals(-exitcode).name
--> 140 raise ProcessExitedException(
141 "process %d terminated with signal %s" %
142 (error_index, name),
143 error_index=error_index,
144 error_pid=failed_process.pid,
145 exit_code=exitcode,
146 signal_name=name
147 )
148 else:
149 raise ProcessExitedException(
150 "process %d terminated with exit code %d" %
151 (error_index, exitcode),
(...)
154 exit_code=exitcode
155 )
ProcessExitedException: process 1 terminated with signal SIGSEGV
I fixed mine by launching the code as a script and using `strategy="auto"` in the Trainer, without any real changes to the model/dataset. Unfortunately, notebooks are notorious for having issues with multiprocessing.
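A minimal sketch of that change, assuming the tutorial's tft, train_dataloader, and val_dataloader, saved as a plain script (e.g. train_tft.py) and launched with python train_tft.py rather than from a notebook:
import lightning.pytorch as pl

# "auto" lets Lightning pick the strategy; outside a notebook with 2 GPUs this
# means regular DDP instead of the notebook-spawn launcher.
trainer = pl.Trainer(
    max_epochs=10,
    accelerator="cuda",
    devices=2,
    strategy="auto",
    gradient_clip_val=0.1,
)
trainer.fit(tft, train_dataloaders=train_dataloader, val_dataloaders=val_dataloader)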
I fixed mine by launching the code as a script and using `strategy="auto"` in the Trainer without any real changes to the model/dataset. Unfortunately, notebooks are notorious for having issues with multiprocessing.
Are you saying that this is all you changed? You didn't have to effectively rebuild the TFT class? So the fix would be >> run a script, not a notebook >> set strategy="auto". Can you share what version you were using?
I have been sidetracked away from this project for the past few months and am just getting back to it now. I appreciate the discussion from everyone though.
Are you saying that this is all you changed? You didn't have to effectively rebuild the TFT class? So the fix would be >> run a script, not a notebook >> set strategy="auto". Can you share what version you were using?
Yes, that's all I changed. No class rebuilding. I am on 1.0 (installed with pip). I think the notebook version has issues with sharing the dataset across multiple processes, but there is no issue when running as a script - fairly typical even on Ubuntu machines.
Thank you I will try it out and let you know. Feels too simple to be true, but that is usually how it goes.
@joseph-mcdonald: Did you find the solution? I am also facing the same issue but couldn't find any workaround.
@joseph-mcdonald: Did you find the solution? I am also facing the same issue but couldn't find any workaround.
Are you saying that this is all you changed? You didn't have to effectively rebuild the TFT class? So the fix would be >> run a script, not a notebook >> set strategy="auto". Can you share what version you were using?
Yes, that's all I changed. No class rebuilding. I am on 1.0 (installed with pip). I think the notebook version has issues with sharing the dataset across multiple processes, but there is no issue when running as a script - fairly typical even on Ubuntu machines.
Yes, tRosenflanz's response solved it for me: set strategy="auto" and don't run it in a notebook.