300M model seems to have approximately 400M parameters

Open PabloPeso opened this issue 2 years ago • 1 comments

Hi,

I am trying to count the parameters of the LM part only, since I think that is the part you mean when you say the model has 300M parameters. More specifically, I am looking at this model: [image: lm]

When I count the number of parameters of that model (following the code below) I get 402M parameters.

[image: screenshot of the code]

Is there anything wrong with the way I am getting the model size? How am I getting different results?

Thanks.

PabloPeso, Aug 03 '23 12:08

I see something similar. I modified torchinfo in this way: https://github.com/TylerYep/torchinfo/issues/254

Then I did

import torch
import torchinfo
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=2)
with torch.autocast("cuda"):
    model.lm.eval()
    descriptions = ["Jazzy jazz hip hop sonata."]
    # One description, so batch size 1 (matches the [1, 4, 32] input shape in the output)
    batch_size = len(descriptions)
    conditions, tokens = model._prepare_tokens_and_attributes(descriptions, None)
    K = model.lm.num_codebooks
    S = 32  # number of time steps in the dummy input
    # Dummy token tensor of shape [batch, codebooks, steps]
    tokens = torch.zeros((batch_size, K, S), dtype=torch.int32).cuda()
    input_data = tokens, conditions
    print(torchinfo.summary(model.lm, input_data=input_data, mode='eval', depth=5, col_names=("input_size", "output_size", "num_params"), verbose=1))

and get this output showing 420,371,456 parameters, though maybe the 787,456 parameters of the T5Conditioner's Linear projection should be ignored.
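As a rough sanity check on that total, the per-layer rows of the summary can be added up by hand. The sketch below copies the row values from the torchinfo output. The layer count (24), the final LayerNorm, and the four bias-free 1024-to-2048 output heads are assumptions about musicgen-small rather than values read from the summary.

```python
# Parameter counts copied from rows of the torchinfo summary.
LAYER_NORM = 2_048         # one LayerNorm: weight + bias at dim 1024
ATTN_IN_PROJ = 3_145_728   # StreamingMultiheadAttention own params: 3 * 1024 * 1024
ATTN_OUT_PROJ = 1_048_576  # nested attention Linear: 1024 * 1024
FFN_LINEAR = 4_194_304     # each feed-forward Linear: 1024 * 4096
EMBEDDING = 2_098_176      # one ScaledEmbedding: 2049 * 1024
T5_PROJECTION = 787_456    # T5Conditioner Linear: 768 * 1024 + 1024

# Each StreamingTransformerLayer lists 3 LayerNorms, 2 attention blocks,
# and 2 feed-forward Linears.
per_layer = 3 * LAYER_NORM + 2 * (ATTN_IN_PROJ + ATTN_OUT_PROJ) + 2 * FFN_LINEAR

# ASSUMPTIONS (not read from the summary rows):
NUM_LAYERS = 24                # assumed depth of musicgen-small
OUT_NORM = 2_048               # assumed final LayerNorm
OUTPUT_HEADS = 4 * 1024 * 2048 # assumed 4 output Linears without bias

transformer = NUM_LAYERS * per_layer
total = 4 * EMBEDDING + T5_PROJECTION + transformer + OUT_NORM + OUTPUT_HEADS

print(f"per layer:   {per_layer:,}")    # 16,783,360
print(f"transformer: {transformer:,}")  # 402,800,640
print(f"total:       {total:,}")        # 420,371,456
```

Under those assumptions the sum reproduces the reported 420,371,456, and the transformer stack alone comes to about 402.8M, which is close to the 402M figure in the first post.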

==================================================================================================================================
Layer (type:depth-idx)                                  Input Shape               Output Shape              Param #
==================================================================================================================================
LMModel                                                 [1, 4, 32]                [1, 4, 32, 2048]          --
├─ModuleList: 1-1                                       --                        --                        --
│    └─ScaledEmbedding: 2-1                             [1, 32]                   [1, 32, 1024]             2,098,176
│    └─ScaledEmbedding: 2-2                             [1, 32]                   [1, 32, 1024]             2,098,176
│    └─ScaledEmbedding: 2-3                             [1, 32]                   [1, 32, 1024]             2,098,176
│    └─ScaledEmbedding: 2-4                             [1, 32]                   [1, 32, 1024]             2,098,176
├─ClassifierFreeGuidanceDropout: 1-2                    --                        --                        --
├─AttributeDropout: 1-3                                 --                        --                        --
├─ConditioningProvider: 1-4                             --                        [1, 11, 1024]             --
│    └─ModuleDict: 2-5                                  --                        --                        --
│    │    └─T5Conditioner: 3-1                          --                        [1, 11, 1024]             --
│    │    │    └─Linear: 4-1                            [1, 11, 768]              [1, 11, 1024]             787,456
├─ConditionFuser: 1-5                                   [1, 32, 1024]             [1, 32, 1024]             --
├─StreamingTransformer: 1-6                             [1, 32, 1024]             [1, 32, 1024]             --
│    └─ModuleList: 2-6                                  --                        --                        --
│    │    └─StreamingTransformerLayer: 3-2              [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-2                         [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-3       [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-1                       [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-4                           [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-5                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-6                         [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-7       [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-2                       [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-8                           [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-9                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-10                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-11                           [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-12                          [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-13                           [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-14                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-15                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-3              [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-16                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-17      [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-3                       [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-18                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-19                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-20                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-21      [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-4                       [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-22                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-23                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-24                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-25                           [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-26                          [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-27                           [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-28                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-29                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-4              [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-30                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-31      [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-5                       [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-32                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-33                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-34                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-35      [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-6                       [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-36                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-37                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-38                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-39                           [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-40                          [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-41                           [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-42                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-43                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-5              [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-44                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-45      [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-7                       [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-46                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-47                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-48                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-49      [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-8                       [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-50                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-51                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-52                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-53                           [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-54                          [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-55                           [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-56                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-57                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-6              [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-58                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-59      [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-9                       [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-60                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-61                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-62                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-63      [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-10                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-64                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-65                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-66                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-67                           [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-68                          [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-69                           [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-70                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-71                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-7              [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-72                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-73      [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-11                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-74                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-75                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-76                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-77      [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-12                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-78                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-79                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-80                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-81                           [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-82                          [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-83                           [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-84                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-85                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-8              [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-86                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-87      [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-13                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-88                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-89                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-90                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-91      [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-14                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-92                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-93                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-94                        [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-95                           [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-96                          [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-97                           [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-98                          [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-99                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-9              [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-100                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-101     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-15                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-102                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-103                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-104                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-105     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-16                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-106                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-107                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-108                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-109                          [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-110                         [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-111                          [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-112                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-113                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-10             [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-114                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-115     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-17                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-116                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-117                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-118                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-119     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-18                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-120                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-121                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-122                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-123                          [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-124                         [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-125                          [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-126                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-127                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-11             [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-128                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-129     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-19                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-130                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-131                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-132                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-133     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-20                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-134                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-135                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-136                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-137                          [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-138                         [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-139                          [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-140                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-141                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-12             [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-142                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-143     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-21                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-144                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-145                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-146                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-147     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-22                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-148                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-149                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-150                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-151                          [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-152                         [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-153                          [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-154                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-155                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-13             [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-156                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-157     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-23                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-158                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-159                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-160                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-161     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-24                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-162                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-163                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-164                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-165                          [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-166                         [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-167                          [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-168                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-169                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-14             [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-170                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-171     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-25                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-172                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-173                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-174                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-175     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-26                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-176                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-177                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-178                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-179                          [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-180                         [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-181                          [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-182                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-183                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-15             [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-184                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-185     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-27                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-186                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-187                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-188                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-189     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-28                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-190                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-191                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-192                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-193                          [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-194                         [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-195                          [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-196                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-197                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-16             [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-198                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-199     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-29                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-200                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-201                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-202                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-203     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-30                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-204                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-205                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-206                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-207                          [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-208                         [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-209                          [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-210                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-211                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-17             [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-212                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-213     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-31                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-214                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-215                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-216                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-217     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-32                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-218                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-219                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-220                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-221                          [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-222                         [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-223                          [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-224                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-225                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-18             [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-226                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-227     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-33                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-228                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-229                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-230                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-231     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-34                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-232                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-233                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-234                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-235                          [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-236                         [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-237                          [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-238                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-239                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-19             [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-240                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-241     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-35                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-242                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-243                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-244                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-245     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-36                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-246                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-247                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-248                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-249                          [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-250                         [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-251                          [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-252                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-253                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-20             [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-254                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-255     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-37                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-256                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-257                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-258                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-259     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-38                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-260                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-261                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-262                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-263                          [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-264                         [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-265                          [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-266                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-267                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-21             [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-268                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-269     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-39                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-270                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-271                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-272                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-273     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-40                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-274                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-275                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-276                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-277                          [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-278                         [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-279                          [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-280                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-281                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-22             [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-282                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-283     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-41                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-284                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-285                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-286                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-287     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-42                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-288                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-289                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-290                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-291                          [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-292                         [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-293                          [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-294                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-295                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-23             [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-296                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-297     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-43                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-298                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-299                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-300                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-301     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-44                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-302                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-303                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-304                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-305                          [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-306                         [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-307                          [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-308                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-309                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-24             [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-310                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-311     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-45                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-312                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-313                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-314                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-315     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-46                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-316                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-317                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-318                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-319                          [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-320                         [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-321                          [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-322                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-323                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    └─StreamingTransformerLayer: 3-25             [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-324                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-325     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-47                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-326                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-327                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-328                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─StreamingMultiheadAttention: 4-329     [1, 32, 1024]             [1, 32, 1024]             3,145,728
│    │    │    │    └─Linear: 5-48                      [1, 32, 1024]             [1, 32, 1024]             1,048,576
│    │    │    └─Dropout: 4-330                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-331                        [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─LayerNorm: 4-332                       [1, 32, 1024]             [1, 32, 1024]             2,048
│    │    │    └─Linear: 4-333                          [1, 32, 1024]             [1, 32, 4096]             4,194,304
│    │    │    └─Dropout: 4-334                         [1, 32, 4096]             [1, 32, 4096]             --
│    │    │    └─Linear: 4-335                          [1, 32, 4096]             [1, 32, 1024]             4,194,304
│    │    │    └─Dropout: 4-336                         [1, 32, 1024]             [1, 32, 1024]             --
│    │    │    └─Identity: 4-337                        [1, 32, 1024]             [1, 32, 1024]             --
├─LayerNorm: 1-7                                        [1, 32, 1024]             [1, 32, 1024]             2,048
├─ModuleList: 1-8                                       --                        --                        --
│    └─Linear: 2-7                                      [1, 32, 1024]             [1, 32, 2048]             2,097,152
│    └─Linear: 2-8                                      [1, 32, 1024]             [1, 32, 2048]             2,097,152
│    └─Linear: 2-9                                      [1, 32, 1024]             [1, 32, 2048]             2,097,152
│    └─Linear: 2-10                                     [1, 32, 1024]             [1, 32, 2048]             2,097,152
==================================================================================================================================
Total params: 420,371,456
Trainable params: 420,371,456
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 269.38
==================================================================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 43.30
Params size (MB): 573.89
Estimated Total Size (MB): 617.19
==================================================================================================================================
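
For anyone checking the total, here is a rough tally of the "Param #" column in plain Python. It is only a sketch: the per-layer numbers are read off the rows above, but the layer count (24), the embedding tables (4 x 2049 x 1024) and the interpretation of the 787,456 figure as a 768-to-1024 projection with bias are assumptions that are not visible in this part of the output. Under those assumptions the sum reproduces the reported total, and the transformer stack alone comes to about 402.8M, which is close to the 402M figure in the original post.

```python
# Tally of the "Param #" column in the torchinfo summary above.
# "from the table" = read directly off the rows shown in this comment.
# "ASSUMPTION"     = not visible in this excerpt; inferred, may be wrong.

D_MODEL = 1024   # from the table: hidden size in every shape
FFN_DIM = 4096   # from the table: feed-forward width
CARD = 2048      # from the table: output width of the final Linear layers
NUM_HEADS_OUT = 4  # from the table: four Linear layers in the last ModuleList

# One StreamingTransformerLayer, row by row (from the table).
attn_in_proj = 3 * D_MODEL * D_MODEL   # 3,145,728 on each attention row
attn_out_proj = D_MODEL * D_MODEL      # 1,048,576 on each child Linear row
layer_norm = 2 * D_MODEL               # 2,048 (weight + bias)
ffn = 2 * D_MODEL * FFN_DIM            # two Linear rows of 4,194,304 each

# Each layer lists two attention rows, three LayerNorm rows, one feed-forward pair.
per_layer = 2 * (attn_in_proj + attn_out_proj) + 3 * layer_norm + ffn

NUM_LAYERS = 24                                    # ASSUMPTION: layer count
transformer = NUM_LAYERS * per_layer

final_norm = 2 * D_MODEL                           # from the table: LayerNorm 1-7
output_heads = NUM_HEADS_OUT * D_MODEL * CARD      # from the table: 4 x 2,097,152

embeddings = NUM_HEADS_OUT * (CARD + 1) * D_MODEL  # ASSUMPTION: 4 tables, 2049 rows each
conditioner_proj = 768 * D_MODEL + D_MODEL         # ASSUMPTION: equals the 787,456 mentioned above

total = transformer + final_norm + output_heads + embeddings + conditioner_proj

print(f"per layer:         {per_layer:,}")
print(f"transformer stack: {transformer:,}")
print(f"total:             {total:,}")
print(f"total minus 787,456: {total - conditioner_proj:,}")
```

If the assumptions hold, the gap between "300M" and the measured count is not explained by the embeddings, output heads or the conditioning projection, since the transformer layers alone already exceed 400M.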

DBraun commented Jun 23 '24 02:06