audiocraft
300M model seems to have approximately 400M parameters
Hi,
I am trying to count the parameters of the LM part only, since I assume that is the part you mean when you say the model has 300M parameters.
More specifically, I am looking at this model's lm.
When I count the number of parameters of that model (following the code below) I get 402M parameters.
Is there anything wrong with the way I am measuring the model size? Why am I getting a different result?
Thanks.
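For reference, one common way to count the parameters of a PyTorch module is to sum numel() over parameters(). The sketch below is not the code from the post above. The helper names count_parameters and count_by_child are hypothetical, and the functions only assume the object exposes parameters() and named_children() the way torch.nn.Module does.

```python
def count_parameters(module, trainable_only=False):
    """Return the total number of scalar parameters in a module.

    Works on any object with a torch.nn.Module-style parameters() method,
    whose items provide numel() and requires_grad.
    """
    return sum(
        p.numel()
        for p in module.parameters()
        if p.requires_grad or not trainable_only
    )


def count_by_child(module):
    """Map each direct child's name to its parameter count.

    Parameters owned directly by `module` (not by a child) are not listed,
    and a tensor shared between two children is counted under both.
    """
    return {
        name: count_parameters(child)
        for name, child in module.named_children()
    }
```

With the model loaded as in the reply below, count_parameters(model.lm) would give the LM total and count_by_child(model.lm) a per-submodule breakdown, which makes it easier to see which parts a given figure includes.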
I see something similar. I modified torchinfo in this way: https://github.com/TylerYep/torchinfo/issues/254
Then I did
import torch
import torchinfo
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=2)
batch_size = 1  # one description, matching the [1, 4, 32] input shape in the output
with torch.autocast("cuda"):
    model.lm.eval()
    descriptions = ["Jazzy jazz hip hop sonata."]
    # Build the text conditions; the returned tokens value is overwritten below.
    conditions, tokens = model._prepare_tokens_and_attributes(descriptions, None)
    K = model.lm.num_codebooks  # number of codebooks
    S = 32  # sequence length in steps
    # Dummy token ids of shape (batch, codebooks, steps)
    tokens = torch.zeros((batch_size, K, S), dtype=torch.int32).cuda()
    input_data = tokens, conditions
    print(torchinfo.summary(model.lm, input_data=input_data, mode='eval', depth=5, col_names=("input_size", "output_size", "num_params"), verbose=1))
and got this output showing 420,371,456 parameters, though perhaps the 787,456 in the T5Conditioner's Linear layer should be excluded.
==================================================================================================================================
Layer (type:depth-idx) Input Shape Output Shape Param #
==================================================================================================================================
LMModel [1, 4, 32] [1, 4, 32, 2048] --
├─ModuleList: 1-1 -- -- --
│ └─ScaledEmbedding: 2-1 [1, 32] [1, 32, 1024] 2,098,176
│ └─ScaledEmbedding: 2-2 [1, 32] [1, 32, 1024] 2,098,176
│ └─ScaledEmbedding: 2-3 [1, 32] [1, 32, 1024] 2,098,176
│ └─ScaledEmbedding: 2-4 [1, 32] [1, 32, 1024] 2,098,176
├─ClassifierFreeGuidanceDropout: 1-2 -- -- --
├─AttributeDropout: 1-3 -- -- --
├─ConditioningProvider: 1-4 -- [1, 11, 1024] --
│ └─ModuleDict: 2-5 -- -- --
│ │ └─T5Conditioner: 3-1 -- [1, 11, 1024] --
│ │ │ └─Linear: 4-1 [1, 11, 768] [1, 11, 1024] 787,456
├─ConditionFuser: 1-5 [1, 32, 1024] [1, 32, 1024] --
├─StreamingTransformer: 1-6 [1, 32, 1024] [1, 32, 1024] --
│ └─ModuleList: 2-6 -- -- --
│ │ └─StreamingTransformerLayer: 3-2 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-2 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-3 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-1 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-4 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-5 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-6 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-7 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-2 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-8 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-9 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-10 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-11 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-12 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-13 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-14 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-15 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-3 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-16 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-17 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-3 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-18 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-19 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-20 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-21 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-4 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-22 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-23 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-24 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-25 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-26 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-27 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-28 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-29 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-4 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-30 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-31 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-5 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-32 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-33 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-34 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-35 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-6 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-36 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-37 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-38 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-39 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-40 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-41 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-42 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-43 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-5 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-44 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-45 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-7 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-46 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-47 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-48 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-49 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-8 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-50 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-51 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-52 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-53 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-54 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-55 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-56 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-57 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-6 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-58 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-59 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-9 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-60 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-61 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-62 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-63 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-10 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-64 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-65 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-66 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-67 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-68 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-69 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-70 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-71 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-7 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-72 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-73 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-11 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-74 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-75 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-76 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-77 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-12 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-78 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-79 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-80 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-81 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-82 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-83 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-84 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-85 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-8 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-86 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-87 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-13 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-88 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-89 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-90 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-91 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-14 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-92 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-93 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-94 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-95 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-96 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-97 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-98 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-99 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-9 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-100 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-101 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-15 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-102 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-103 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-104 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-105 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-16 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-106 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-107 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-108 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-109 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-110 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-111 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-112 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-113 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-10 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-114 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-115 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-17 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-116 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-117 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-118 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-119 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-18 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-120 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-121 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-122 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-123 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-124 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-125 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-126 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-127 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-11 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-128 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-129 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-19 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-130 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-131 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-132 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-133 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-20 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-134 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-135 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-136 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-137 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-138 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-139 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-140 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-141 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-12 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-142 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-143 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-21 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-144 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-145 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-146 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-147 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-22 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-148 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-149 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-150 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-151 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-152 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-153 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-154 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-155 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-13 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-156 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-157 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-23 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-158 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-159 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-160 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-161 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-24 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-162 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-163 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-164 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-165 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-166 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-167 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-168 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-169 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-14 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-170 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-171 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-25 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-172 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-173 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-174 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-175 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-26 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-176 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-177 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-178 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-179 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-180 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-181 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-182 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-183 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-15 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-184 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-185 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-27 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-186 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-187 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-188 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-189 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-28 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-190 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-191 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-192 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-193 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-194 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-195 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-196 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-197 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-16 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-198 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-199 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-29 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-200 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-201 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-202 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-203 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-30 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-204 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-205 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-206 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-207 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-208 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-209 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-210 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-211 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-17 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-212 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-213 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-31 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-214 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-215 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-216 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-217 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-32 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-218 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-219 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-220 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-221 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-222 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-223 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-224 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-225 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-18 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-226 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-227 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-33 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-228 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-229 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-230 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-231 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-34 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-232 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-233 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-234 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-235 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-236 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-237 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-238 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-239 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-19 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-240 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-241 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-35 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-242 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-243 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-244 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-245 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-36 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-246 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-247 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-248 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-249 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-250 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-251 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-252 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-253 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-20 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-254 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-255 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-37 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-256 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-257 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-258 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-259 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-38 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-260 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-261 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-262 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-263 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-264 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-265 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-266 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-267 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-21 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-268 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-269 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-39 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-270 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-271 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-272 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-273 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-40 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-274 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-275 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-276 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-277 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-278 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-279 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-280 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-281 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-22 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-282 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-283 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-41 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-284 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-285 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-286 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-287 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-42 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-288 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-289 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-290 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-291 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-292 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-293 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-294 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-295 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-23 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-296 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-297 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-43 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-298 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-299 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-300 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-301 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-44 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-302 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-303 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-304 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-305 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-306 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-307 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-308 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-309 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-24 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-310 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-311 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-45 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-312 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-313 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-314 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-315 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-46 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-316 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-317 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-318 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-319 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-320 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-321 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-322 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-323 [1, 32, 1024] [1, 32, 1024] --
│ │ └─StreamingTransformerLayer: 3-25 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-324 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-325 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-47 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-326 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-327 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-328 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─StreamingMultiheadAttention: 4-329 [1, 32, 1024] [1, 32, 1024] 3,145,728
│ │ │ │ └─Linear: 5-48 [1, 32, 1024] [1, 32, 1024] 1,048,576
│ │ │ └─Dropout: 4-330 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-331 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─LayerNorm: 4-332 [1, 32, 1024] [1, 32, 1024] 2,048
│ │ │ └─Linear: 4-333 [1, 32, 1024] [1, 32, 4096] 4,194,304
│ │ │ └─Dropout: 4-334 [1, 32, 4096] [1, 32, 4096] --
│ │ │ └─Linear: 4-335 [1, 32, 4096] [1, 32, 1024] 4,194,304
│ │ │ └─Dropout: 4-336 [1, 32, 1024] [1, 32, 1024] --
│ │ │ └─Identity: 4-337 [1, 32, 1024] [1, 32, 1024] --
├─LayerNorm: 1-7 [1, 32, 1024] [1, 32, 1024] 2,048
├─ModuleList: 1-8 -- -- --
│ └─Linear: 2-7 [1, 32, 1024] [1, 32, 2048] 2,097,152
│ └─Linear: 2-8 [1, 32, 1024] [1, 32, 2048] 2,097,152
│ └─Linear: 2-9 [1, 32, 1024] [1, 32, 2048] 2,097,152
│ └─Linear: 2-10 [1, 32, 1024] [1, 32, 2048] 2,097,152
==================================================================================================================================
Total params: 420,371,456
Trainable params: 420,371,456
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 269.38
==================================================================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 43.30
Params size (MB): 573.89
Estimated Total Size (MB): 617.19
==================================================================================================================================
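The reported total can be reproduced from the shapes printed in the summary. The sketch below is only a cross-check, not audiocraft code. The variable names are made up for illustration, and two details are inferred from the printed counts rather than from the source: that most Linear layers carry no bias, and that each embedding table has 2049 rows.

```python
# Dimensions read off the Input/Output Shape columns of the summary above.
dim = 1024          # model width
ffn_dim = 4096      # feed-forward hidden width
card = 2048         # logits per codebook
t5_dim = 768        # width of the T5 features entering the conditioner
num_layers = 24     # StreamingTransformerLayer 3-2 through 3-25
num_codebooks = 4

# Per-module counts. Bias-free Linears and the extra embedding row are
# inferred from the printed numbers (an assumption, not checked in the code).
embedding = (card + 1) * dim            # 2,098,176 per ScaledEmbedding
layer_norm = 2 * dim                    # weight + bias = 2,048
attention = 3 * dim * dim + dim * dim   # 3,145,728 own weights + 1,048,576 Linear
feed_forward = 2 * dim * ffn_dim        # two Linears of 4,194,304 each
t5_projection = t5_dim * dim + dim      # Linear with bias = 787,456
output_head = dim * card                # 2,097,152 per codebook

# Each layer lists 3 LayerNorms, 2 attention modules, and one feed-forward pair.
per_layer = 3 * layer_norm + 2 * attention + feed_forward
transformer = num_layers * per_layer

total = (
    num_codebooks * embedding
    + t5_projection
    + transformer
    + layer_norm                     # final LayerNorm 1-7
    + num_codebooks * output_head
)

print(f"per layer:            {per_layer:,}")
print(f"transformer layers:   {transformer:,}")
print(f"total:                {total:,}")
print(f"total minus T5 proj.: {total - t5_projection:,}")
```

The total comes to 420,371,456, matching the summary. The 24 transformer layers alone account for 402,800,640, and dropping the T5 projection leaves 419,584,000. The transformer-only figure is close to the 402M reported in the first post, although the thread does not say which modules that count covered.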