llm.c

[cudnn_frontend] Error: No execution plans support the graph.

Necktwi opened this issue 5 months ago · 2 comments

necktwi@CheapFellow:~/workspace/llm.c$ make train_gpt2cu USE_CUDNN=1 CUDNN_FRONTEND_PATH="/home/necktwi/workspace/cudnn-frontend/include"
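(Context: USE_CUDNN=1 compiles the cuDNN attention path in llmc/cudnn_att.cpp, and CUDNN_FRONTEND_PATH points the build at the include directory of NVIDIA's header-only cudnn-frontend library.)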

necktwi@CheapFellow:~/workspace/llm.c$ ./train_gpt2cu 
Multi-GPU support is disabled. Using a single GPU.
+-----------------------+----------------------------------------------------+
| Parameter             | Value                                              |
+-----------------------+----------------------------------------------------+
| train data pattern    | dev/data/tinyshakespeare/tiny_shakespeare_train.bin |
| val data pattern      | dev/data/tinyshakespeare/tiny_shakespeare_val.bin  |
| output log dir        | NULL                                               |
| checkpoint_every      | 0                                                  |
| resume                | 0                                                  |
| micro batch size B    | 4                                                  |
| sequence length T     | 1024                                               |
| total batch size      | 4096                                               |
| LR scheduler          | cosine                                             |
| learning rate (LR)    | 3.000000e-04                                       |
| warmup iterations     | 0                                                  |
| final LR fraction     | 1.000000e+00                                       |
| weight decay          | 0.000000e+00                                       |
| skip update lossz     | 0.000000                                           |
| skip update gradz     | 0.000000                                           |
| max_steps             | -1                                                 |
| val_loss_every        | 20                                                 |
| val_max_steps         | 20                                                 |
| sample_every          | 20                                                 |
| genT                  | 64                                                 |
| overfit_single_batch  | 0                                                  |
| use_master_weights    | enabled                                            |
| gelu_fusion           | 0                                                  |
| recompute             | 1                                                  |
+-----------------------+----------------------------------------------------+
| device                | NVIDIA GeForce RTX 2060                            |
| peak TFlops           | -1.0                                               |
| precision             | BF16                                               |
+-----------------------+----------------------------------------------------+
| weight init method    | gpt2_124M_bf16.bin                                 |
| max_sequence_length T | 1024                                               |
| vocab_size V          | 50257                                              |
| padded_vocab_size Vp  | 50304                                              |
| num_layers L          | 12                                                 |
| num_heads NH          | 12                                                 |
| channels C            | 768                                                |
| num_parameters        | 124475904                                          |
+-----------------------+----------------------------------------------------+
| train_num_batches     | 74                                                 |
| val_num_batches       | 20                                                 |
+-----------------------+----------------------------------------------------+
| run hellaswag         | no                                                 |
+-----------------------+----------------------------------------------------+
| Zero Optimization is disabled                                              |
| num_processes         | 1                                                  |
| zero_stage            | 0                                                  |
+-----------------------+----------------------------------------------------+
num_parameters: 124475904 => bytes: 248951808
allocated 237 MiB for model parameters
batch_size B=4 * seq_len T=1024 * num_processes=1 and total_batch_size=4096
=> setting grad_accum_steps=1
allocating 237 MiB for parameter gradients
allocating 1326 MiB for activations
allocating 474 MiB for AdamW optimizer state m
allocating 474 MiB for AdamW optimizer state v
allocating 474 MiB for master copy of params
device memory usage: 3652 MiB / 5740 MiB
memory per sequence: 331 MiB
 -> estimated maximum batch size: 10
[CUDNN ERROR] at file llmc/cudnn_att.cpp:120:
[cudnn_frontend] Error: No execution plans support the graph.
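A likely cause, offered as a sketch rather than a confirmed diagnosis: the RTX 2060 is a Turing part (compute capability 7.5), and cuDNN's fused flash-attention graphs are generally supported only on Ampere (compute capability 8.0) and newer, so the frontend finds no execution plan for the graph that llmc/cudnn_att.cpp builds. A minimal, self-contained check of the device's compute capability, using only the standard CUDA runtime API (check_cc.cu is a hypothetical file name):

// check_cc.cu: query the GPU's compute capability via the CUDA runtime.
// If major.minor is below 8.0, cuDNN's fused attention support is doubtful.
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "failed to query device 0\n");
        return 1;
    }
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    if (prop.major < 8) {
        // Turing (e.g. RTX 2060) reports 7.5; Ampere and newer report >= 8.0.
        printf("pre-Ampere GPU: cuDNN fused attention is likely unsupported\n");
    }
    return 0;
}

Build and run with: nvcc check_cc.cu -o check_cc && ./check_cc. If the capability is indeed below 8.0, the simplest workaround is probably to rebuild without the cuDNN path (plain make train_gpt2cu), which falls back to llm.c's own attention kernels.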

Necktwi · Sep 19 '24, 19:09