spacetimeformer
ValueError: SyncBatchNorm layers only work with GPU modules
Looks like the GPU in Colab is not being engaged. I tried the A100, V100, and T4 GPU runtimes as well as the TPU runtime in Colab. Command:
python train.py spacetimeformer mnist --embed_method spatio-temporal --local_self_attn full --local_cross_attn full --global_self_attn full --global_cross_attn full --run_name mnist_spatiotemporal --context_points 10
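As a sanity check in the same Colab runtime, PyTorch can be asked directly whether it sees the GPU at all; a CPU-only wheel will report a version ending in `+cpu` and `False` here:

```python
import torch

# Quick check that the Colab GPU is visible to PyTorch.
print(torch.__version__)          # a CPU-only build ends in "+cpu"
print(torch.cuda.is_available())  # must be True for GPU training
print(torch.cuda.device_count())  # number of visible CUDA devices
```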
Error trace:
2023-12-30 20:47:30.093968: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-30 20:47:30.094027: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-30 20:47:30.095405: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-12-30 20:47:31.265649: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Using default wandb log dir path of ./data/STF_LOG_DIR. This can be adjusted with the environment variable `STF_LOG_DIR`
Forecaster
L2: 1e-06
Linear Window: 0
Linear Shared Weights: False
RevIN: False
Decomposition: False
GlobalSelfAttn: AttentionLayer(
(inner_attention): FullAttention(
(dropout): Dropout(p=0.0, inplace=False)
)
(query_projection): Linear(in_features=200, out_features=800, bias=True)
(key_projection): Linear(in_features=200, out_features=800, bias=True)
(value_projection): Linear(in_features=200, out_features=800, bias=True)
(out_projection): Linear(in_features=800, out_features=200, bias=True)
(dropout_qkv): Dropout(p=0.0, inplace=False)
)
GlobalCrossAttn: AttentionLayer(
(inner_attention): FullAttention(
(dropout): Dropout(p=0.0, inplace=False)
)
(query_projection): Linear(in_features=200, out_features=800, bias=True)
(key_projection): Linear(in_features=200, out_features=800, bias=True)
(value_projection): Linear(in_features=200, out_features=800, bias=True)
(out_projection): Linear(in_features=800, out_features=200, bias=True)
(dropout_qkv): Dropout(p=0.0, inplace=False)
)
LocalSelfAttn: AttentionLayer(
(inner_attention): FullAttention(
(dropout): Dropout(p=0.0, inplace=False)
)
(query_projection): Linear(in_features=200, out_features=800, bias=True)
(key_projection): Linear(in_features=200, out_features=800, bias=True)
(value_projection): Linear(in_features=200, out_features=800, bias=True)
(out_projection): Linear(in_features=800, out_features=200, bias=True)
(dropout_qkv): Dropout(p=0.0, inplace=False)
)
LocalCrossAttn: AttentionLayer(
(inner_attention): FullAttention(
(dropout): Dropout(p=0.0, inplace=False)
)
(query_projection): Linear(in_features=200, out_features=800, bias=True)
(key_projection): Linear(in_features=200, out_features=800, bias=True)
(value_projection): Linear(in_features=200, out_features=800, bias=True)
(out_projection): Linear(in_features=800, out_features=200, bias=True)
(dropout_qkv): Dropout(p=0.0, inplace=False)
)
Using Embedding: spatio-temporal
Time Emb Dim: 6
Space Embedding: True
Time Embedding: True
Val Embedding: True
Given Embedding: True
Null Value: None
Pad Value: None
Reconstruction Dropout: Timesteps 0.05, Standard 0.1, Seq (max len = 5) 0.2, Skip All Drop 1.0
*** Spacetimeformer (v1.5) Summary: ***
Model Dim: 200
FF Dim: 800
Enc Layers: 3
Dec Layers: 3
Embed Dropout: 0.2
FF Dropout: 0.3
Attn Out Dropout: 0.0
Attn Matrix Dropout: 0.0
QKV Dropout: 0.0
L2 Coeff: 1e-06
Warmup Steps: 0
Normalization Scheme: batch
Attention Time Windows: 1
Shifted Time Windows: False
Position Emb Type: abs
Recon Loss Imp: 0.0
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./images/MNIST/raw/train-images-idx3-ubyte.gz
100% 9912422/9912422 [00:00<00:00, 199942825.48it/s]
Extracting ./images/MNIST/raw/train-images-idx3-ubyte.gz to ./images/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./images/MNIST/raw/train-labels-idx1-ubyte.gz
100% 28881/28881 [00:00<00:00, 149735097.43it/s]
Extracting ./images/MNIST/raw/train-labels-idx1-ubyte.gz to ./images/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./images/MNIST/raw/t10k-images-idx3-ubyte.gz
100% 1648877/1648877 [00:00<00:00, 43603948.10it/s]
Extracting ./images/MNIST/raw/t10k-images-idx3-ubyte.gz to ./images/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./images/MNIST/raw/t10k-labels-idx1-ubyte.gz
100% 4542/4542 [00:00<00:00, 32234397.24it/s]
Extracting ./images/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./images/MNIST/raw
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:287: LightningDeprecationWarning: Passing `Trainer(accelerator='dp')` has been deprecated in v1.5 and will be removed in v1.7. Use `Trainer(strategy='dp')` instead.
  rank_zero_deprecation(
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:597: UserWarning: 'dp' is not supported on CPUs, hence setting `strategy='ddp'`.
  rank_zero_warn(f"{strategy_flag!r} is not supported on CPUs, hence setting `strategy='ddp'`.")
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/utilities.py:91: PossibleUserWarning: `max_epochs` was not set. Setting it to 1000 epochs. To train without an epoch limit, set `max_epochs=-1`.
  rank_zero_warn(
GPU available: True, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py:1823: PossibleUserWarning: GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`.
  rank_zero_warn(
`Trainer(limit_val_batches=1.0)` was configured so 100% of the batches will be used.
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch.
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
distributed_backend=gloo
All distributed processes registered. Starting with 1 processes
Traceback (most recent call last):
File "/content/spacetimeformer/spacetimeformer/train.py", line 869, in
Have you solved this problem?
I ran into this problem too. The cause is that you must install the GPU build of PyTorch, and then append --gpus 0 to the end of the training command, like this: spacetimeformer mnist --embed_method spatio-temporal --local_self_attn full --local_cross_attn full --global_self_attn full --global_cross_attn full --run_name mnist_spatiotemporal --context_points 20 --gpus 0
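For reference, the warning earlier in the log spells out the Lightning-side equivalent of that flag; a minimal sketch with plain pytorch_lightning (not spacetimeformer's own trainer setup):

```python
import pytorch_lightning as pl

# What the earlier warning suggests: explicitly request the GPU instead of
# letting the Trainer silently fall back to CPU.
trainer = pl.Trainer(accelerator="gpu", devices=1)
```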