[cudnn_frontend] Error: No execution plans support the graph.
necktwi@CheapFellow:~/workspace/llm.c$ make train_gpt2cu USE_CUDNN=1 CUDNN_FRONTEND_PATH="/home/necktwi/workspace/cudnn-frontend/include"
necktwi@CheapFellow:~/workspace/llm.c$ ./train_gpt2cu
Multi-GPU support is disabled. Using a single GPU.
+-----------------------+----------------------------------------------------+
| Parameter             | Value                                              |
+-----------------------+----------------------------------------------------+
| train data pattern    | dev/data/tinyshakespeare/tiny_shakespeare_train.bin |
| val data pattern      | dev/data/tinyshakespeare/tiny_shakespeare_val.bin  |
| output log dir        | NULL                                               |
| checkpoint_every      | 0                                                  |
| resume                | 0                                                  |
| micro batch size B    | 4                                                  |
| sequence length T     | 1024                                               |
| total batch size      | 4096                                               |
| LR scheduler          | cosine                                             |
| learning rate (LR)    | 3.000000e-04                                       |
| warmup iterations     | 0                                                  |
| final LR fraction     | 1.000000e+00                                       |
| weight decay          | 0.000000e+00                                       |
| skip update lossz     | 0.000000                                           |
| skip update gradz     | 0.000000                                           |
| max_steps             | -1                                                 |
| val_loss_every        | 20                                                 |
| val_max_steps         | 20                                                 |
| sample_every          | 20                                                 |
| genT                  | 64                                                 |
| overfit_single_batch  | 0                                                  |
| use_master_weights    | enabled                                            |
| gelu_fusion           | 0                                                  |
| recompute             | 1                                                  |
+-----------------------+----------------------------------------------------+
| device                | NVIDIA GeForce RTX 2060                            |
| peak TFlops           | -1.0                                               |
| precision             | BF16                                               |
+-----------------------+----------------------------------------------------+
| weight init method    | gpt2_124M_bf16.bin                                 |
| max_sequence_length T | 1024                                               |
| vocab_size V          | 50257                                              |
| padded_vocab_size Vp  | 50304                                              |
| num_layers L          | 12                                                 |
| num_heads NH          | 12                                                 |
| channels C            | 768                                                |
| num_parameters        | 124475904                                          |
+-----------------------+----------------------------------------------------+
| train_num_batches     | 74                                                 |
| val_num_batches       | 20                                                 |
+-----------------------+----------------------------------------------------+
| run hellaswag         | no                                                 |
+-----------------------+----------------------------------------------------+
| Zero Optimization is disabled                                              |
| num_processes         | 1                                                  |
| zero_stage            | 0                                                  |
+-----------------------+----------------------------------------------------+
num_parameters: 124475904 => bytes: 248951808
allocated 237 MiB for model parameters
batch_size B=4 * seq_len T=1024 * num_processes=1 and total_batch_size=4096
=> setting grad_accum_steps=1
allocating 237 MiB for parameter gradients
allocating 1326 MiB for activations
allocating 474 MiB for AdamW optimizer state m
allocating 474 MiB for AdamW optimizer state v
allocating 474 MiB for master copy of params
device memory usage: 3652 MiB / 5740 MiB
memory per sequence: 331 MiB
-> estimated maximum batch size: 10
[CUDNN ERROR] at file llmc/cudnn_att.cpp:120:
[cudnn_frontend] Error: No execution plans support the graph.
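A plausible cause (an assumption, not confirmed from the log): the RTX 2060 is a Turing part (compute capability 7.5), and cuDNN's fused flash-attention graph generally requires a newer architecture (compute capability 8.0+), so the frontend finds no execution plan for the BF16 attention graph. One workaround sketch, assuming the stock llm.c Makefile, is to rebuild without the cuDNN path so attention falls back to llm.c's own CUDA kernels:

```shell
# Rebuild without USE_CUDNN=1 so the non-cuDNN attention kernels are used
make clean
make train_gpt2cu
./train_gpt2cu
```

This trades the cuDNN flash-attention speedup for a path that does not depend on cuDNN's supported-architecture matrix.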