torchinfo
add show recursive option and add to default row setting
Related to issue #65
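For readers unfamiliar with the option under discussion: torchinfo's summary() accepts a row_settings argument, and this PR proposes a flag in that group controlling whether repeated calls to the same layer are printed as extra "(recursive)" rows. The sketch below only illustrates the idea; the toy model is invented, and the flag name (hide_recursive, as requested later in this review) is not a released API.

import torch
import torch.nn as nn
from torchinfo import summary

class TinyShared(nn.Module):
    """Toy model that applies the same Linear twice, so torchinfo
    reports the second call as a "(recursive)" row."""

    def __init__(self) -> None:
        super().__init__()
        self.shared = nn.Linear(8, 8)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.shared(torch.relu(self.shared(x)))

# Current behavior: the recursive row is always shown.
summary(TinyShared(), input_size=(1, 8))

# Proposed behavior (flag name taken from this review thread; not part of the
# released torchinfo API, so this call is illustrative only):
# summary(TinyShared(), input_size=(1, 8),
#         row_settings=("depth", "hide_recursive"))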
Before:
===============================================================================================
Layer (type:depth-idx)                       Output Shape             Param #
===============================================================================================
ConvAttnRNN                                  --                       --
├─Sequential: 1-1                            [1, 512, 2, 8]           --
│    └─Conv2d: 2-1                           [1, 64, 32, 128]         9,408
│    └─BatchNorm2d: 2-2                      [1, 64, 32, 128]         128
│    └─ReLU: 2-3                             [1, 64, 32, 128]         --
│    └─MaxPool2d: 2-4                        [1, 64, 16, 64]          --
│    └─Sequential: 2-5                       [1, 64, 16, 64]          --
│    │    └─BasicBlock: 3-1                  [1, 64, 16, 64]          73,984
│    │    └─BasicBlock: 3-2                  [1, 64, 16, 64]          73,984
│    └─Sequential: 2-6                       [1, 128, 8, 32]          230,144
│    │    └─BasicBlock: 3-3                  [1, 128, 8, 32]          230,144
│    │    └─BasicBlock: 3-4                  [1, 128, 8, 32]          295,424
│    └─Sequential: 2-7                       [1, 256, 4, 16]          --
│    │    └─BasicBlock: 3-5                  [1, 256, 4, 16]          919,040
│    │    └─BasicBlock: 3-6                  [1, 256, 4, 16]          1,180,672
│    └─Sequential: 2-8                       [1, 512, 2, 8]           --
│    │    └─BasicBlock: 3-7                  [1, 512, 2, 8]           3,673,088
│    │    └─BasicBlock: 3-8                  [1, 512, 2, 8]           4,720,640
├─Conv2d: 1-2                                [1, 256, 2, 8]           131,328
├─AttentionLSTMDecoder: 1                    --                       --
│    └─DotProductAttention: 2-9              [1, 1, 256]              --
│    └─LSTMCell: 2-10                        [1, 256]                 536,576
│    └─Linear: 2-11                          [1, 10]                  2,570
│    └─DotProductAttention: 2-12             [1, 1, 256]              --
│    └─LSTMCell: 2-13                        [1, 256]                 (recursive)
│    └─Linear: 2-14                          [1, 10]                  (recursive)
│    └─DotProductAttention: 2-15             [1, 1, 256]              --
│    └─LSTMCell: 2-16                        [1, 256]                 (recursive)
│    └─Linear: 2-17                          [1, 10]                  (recursive)
│    └─DotProductAttention: 2-18             [1, 1, 256]              --
│    └─LSTMCell: 2-19                        [1, 256]                 (recursive)
│    └─Linear: 2-20                          [1, 10]                  (recursive)
│    └─DotProductAttention: 2-21             [1, 1, 256]              --
│    └─LSTMCell: 2-22                        [1, 256]                 (recursive)
│    └─Linear: 2-23                          [1, 10]                  (recursive)
===============================================================================================
Total params: 11,846,986
Trainable params: 11,846,986
Non-trainable params: 0
Total mult-adds (G): 1.28
===============================================================================================
Input size (MB): 0.20
Forward/backward pass size (MB): 13.01
Params size (MB): 47.39
Estimated Total Size (MB): 60.60
===============================================================================================
After:
===============================================================================================
Layer (type:depth-idx)                       Output Shape             Param #
===============================================================================================
ConvAttnRNN                                  --                       --
├─Sequential: 1-1                            [1, 512, 2, 8]           --
│    └─Conv2d: 2-1                           [1, 64, 32, 128]         9,408
│    └─BatchNorm2d: 2-2                      [1, 64, 32, 128]         128
│    └─ReLU: 2-3                             [1, 64, 32, 128]         --
│    └─MaxPool2d: 2-4                        [1, 64, 16, 64]          --
│    └─Sequential: 2-5                       [1, 64, 16, 64]          --
│    │    └─BasicBlock: 3-1                  [1, 64, 16, 64]          73,984
│    │    └─BasicBlock: 3-2                  [1, 64, 16, 64]          73,984
│    └─Sequential: 2-6                       [1, 128, 8, 32]          --
│    │    └─BasicBlock: 3-3                  [1, 128, 8, 32]          230,144
│    │    └─BasicBlock: 3-4                  [1, 128, 8, 32]          295,424
│    └─Sequential: 2-7                       [1, 256, 4, 16]          --
│    │    └─BasicBlock: 3-5                  [1, 256, 4, 16]          919,040
│    │    └─BasicBlock: 3-6                  [1, 256, 4, 16]          1,180,672
│    └─Sequential: 2-8                       [1, 512, 2, 8]           --
│    │    └─BasicBlock: 3-7                  [1, 512, 2, 8]           3,673,088
│    │    └─BasicBlock: 3-8                  [1, 512, 2, 8]           4,720,640
├─Conv2d: 1-2                                [1, 256, 2, 8]           131,328
├─AttentionLSTMDecoder: 1                    --                       --
│    └─DotProductAttention: 2-9              [1, 1, 256]              --
│    └─LSTMCell: 2-10                        [1, 256]                 536,576
│    └─Linear: 2-11                          [1, 10]                  2,570
│    └─DotProductAttention: 2-12             [1, 1, 256]              --
│    └─DotProductAttention: 2-15             [1, 1, 256]              --
│    └─DotProductAttention: 2-18             [1, 1, 256]              --
│    └─DotProductAttention: 2-21             [1, 1, 256]              --
===============================================================================================
Total params: 11,846,986
Trainable params: 11,846,986
Non-trainable params: 0
Total mult-adds (G): 1.28
===============================================================================================
Input size (MB): 0.20
Forward/backward pass size (MB): 13.01
Params size (MB): 47.39
Estimated Total Size (MB): 60.60
===============================================================================================
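The "(recursive)" rows in the Before table come from a decoder that runs the same attention module, LSTM cell, and output projection once per timestep. The actual model behind these summaries is not included in the PR, so the following is only a rough sketch of that pattern: the class names and tensor sizes follow the tables above, while the constructor arguments and forward logic are assumptions.

import torch
import torch.nn as nn

class DotProductAttention(nn.Module):
    """Parameterless attention; as noted later in this thread, layers with no
    parameters are never tagged "(recursive)" even when called repeatedly."""

    def forward(self, query: torch.Tensor, keys: torch.Tensor) -> torch.Tensor:
        # query: (B, 1, H), keys: (B, T, H) -> context: (B, 1, H)
        scores = torch.bmm(query, keys.transpose(1, 2)) / keys.size(-1) ** 0.5
        return torch.bmm(torch.softmax(scores, dim=-1), keys)

class AttentionLSTMDecoder(nn.Module):
    """Reuses the same three submodules every timestep, which is what produces
    the repeated rows 2-12 through 2-23 in the Before table."""

    def __init__(self, hidden: int = 256, num_classes: int = 10, steps: int = 5) -> None:
        super().__init__()
        self.steps = steps
        self.attention = DotProductAttention()
        self.cell = nn.LSTMCell(2 * hidden, hidden)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (B, T, H) encoder outputs
        batch, _, hidden = features.shape
        hx = features.new_zeros(batch, hidden)
        cx = features.new_zeros(batch, hidden)
        logits = []
        for _ in range(self.steps):
            context = self.attention(hx.unsqueeze(1), features)          # (B, 1, H)
            hx, cx = self.cell(torch.cat([context.squeeze(1), hx], 1), (hx, cx))
            logits.append(self.classifier(hx))                           # (B, num_classes)
        return torch.stack(logits, dim=1)

Because self.cell and self.classifier carry parameters, their second and later calls are the rows torchinfo marks "(recursive)", while the parameterless self.attention is simply repeated without that marker, which is the behavior discussed below.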
Codecov Report
Merging #66 (3ff43b7) into main (4c94492) will decrease coverage by 0.23%. The diff coverage is 80.00%.
@@            Coverage Diff             @@
##             main      #66      +/-   ##
==========================================
- Coverage   99.76%   99.53%   -0.24%
==========================================
  Files           5        5
  Lines         427      430       +3
==========================================
+ Hits          426      428       +2
- Misses          1        2       +1
Impacted Files           | Coverage Δ
-------------------------|----------------------------------------
torchinfo/formatting.py  | 98.97% <75.00%> (-1.03%) :arrow_down:
torchinfo/torchinfo.py   | 100.00% <100.00%> (ø)
Please remove this option from the "defaults" section. We shouldn't change the default behavior.
Also, I think the proposed output is quite confusing, since it looks like the DotProductAttention layers are all stacked together. In my opinion the output should look like:
===============================================================================================
Layer (type:depth-idx)                       Output Shape             Param #
===============================================================================================
ConvAttnRNN                                  --                       --
├─Sequential: 1-1                            [1, 512, 2, 8]           --
│    └─Conv2d: 2-1                           [1, 64, 32, 128]         9,408
│    └─BatchNorm2d: 2-2                      [1, 64, 32, 128]         128
│    └─ReLU: 2-3                             [1, 64, 32, 128]         --
│    └─MaxPool2d: 2-4                        [1, 64, 16, 64]          --
│    └─Sequential: 2-5                       [1, 64, 16, 64]          --
│    │    └─BasicBlock: 3-1                  [1, 64, 16, 64]          73,984
│    │    └─BasicBlock: 3-2                  [1, 64, 16, 64]          73,984
│    └─Sequential: 2-6                       [1, 128, 8, 32]          --
│    │    └─BasicBlock: 3-3                  [1, 128, 8, 32]          230,144
│    │    └─BasicBlock: 3-4                  [1, 128, 8, 32]          295,424
│    └─Sequential: 2-7                       [1, 256, 4, 16]          --
│    │    └─BasicBlock: 3-5                  [1, 256, 4, 16]          919,040
│    │    └─BasicBlock: 3-6                  [1, 256, 4, 16]          1,180,672
│    └─Sequential: 2-8                       [1, 512, 2, 8]           --
│    │    └─BasicBlock: 3-7                  [1, 512, 2, 8]           3,673,088
│    │    └─BasicBlock: 3-8                  [1, 512, 2, 8]           4,720,640
├─Conv2d: 1-2                                [1, 256, 2, 8]           131,328
├─AttentionLSTMDecoder: 1                    --                       --
│    └─DotProductAttention (recursive): 2-9  [1, 1, 256]              --
│    └─LSTMCell (recursive): 2-10            [1, 256]                 536,576
│    └─Linear (recursive): 2-11              [1, 10]                  2,570
===============================================================================================
Total params: 11,846,986
Trainable params: 11,846,986
Non-trainable params: 0
Total mult-adds (G): 1.28
===============================================================================================
Input size (MB): 0.20
Forward/backward pass size (MB): 13.01
Params size (MB): 47.39
Estimated Total Size (MB): 60.60
===============================================================================================
What do you think?
Also, it would be great if you could add a test case for this feature!
> Please remove this option from the "defaults" section. We shouldn't change the default behavior.
The current default behavior is to show the recursive rows. The "recursive" flag should be added to the defaults to keep the current behavior unchanged.
> Also, I think the proposed output is quite confusing, since it looks like the DotProductAttention layers are all stacked together. Also, it would be great if you could add a test case for this feature!
I totally agree. I will update the code.
I'm not sure about the representation of the "input shape" and "output shape" columns for recursive layers. Should we hide them too?
After a while looking at the source code, I found this line where recursive layers are checked only when they have parameters. I guess that some parameterless layers, such as nn.Sigmoid, nn.ReLU, or nn.Softmax, are not counted as recursive layers even though they are called many times. An example of this is ResNet50's Bottleneck module, where self.relu is called three times in the forward method. In other words, a layer is not marked as recursive if it has no parameters. In my case, the DotProductAttention module has no learnable params since it is defined for convenience purposes. Therefore, its showing up in the table is expected.
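To make that concrete, here is a minimal sketch (the module and sizes are invented; only the reuse pattern mirrors ResNet's Bottleneck) of a block whose single nn.ReLU instance is called three times. Under the behavior described above, each call appears as its own ReLU row and none of them is marked "(recursive)", because the check only applies to layers that own parameters.

import torch.nn as nn
from torchinfo import summary

class MiniBottleneck(nn.Module):
    """Bottleneck-style block: one nn.ReLU instance used three times."""

    def __init__(self, channels: int = 8) -> None:
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=1)
        self.relu = nn.ReLU()  # parameterless and reused below

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.relu(self.conv2(out))
        return self.relu(self.conv3(out) + x)

# Expected under the behavior described above: three separate ReLU rows,
# none of them tagged "(recursive)".
print(summary(MiniBottleneck(), input_size=(1, 8, 16, 16), verbose=0))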
Let's keep the input shape and output shape columns for recursive layers.
Keeping the recursive layers that don't have parameters is not ideal, but it's okay for now since there is no existing behavior to preserve.
Last couple of things before this is ready to merge:
- Rename this setting to hide_recursive and remove it from the defaults
- Add test case(s) (see the sketch below)
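On the test-case request, something along these lines might work; the helper model and the assertion are assumptions rather than code from this PR, and a test for the new setting would additionally pass the renamed flag through row_settings once its final name is fixed.

import torch.nn as nn
from torchinfo import summary

class SharedCellNet(nn.Module):
    """Runs the same LSTMCell for two timesteps so a "(recursive)" row appears."""

    def __init__(self) -> None:
        super().__init__()
        self.cell = nn.LSTMCell(4, 4)

    def forward(self, x):
        hx, cx = self.cell(x)
        hx, cx = self.cell(x, (hx, cx))
        return hx

def test_recursive_rows_shown_by_default() -> None:
    results = summary(SharedCellNet(), input_size=(1, 4), verbose=0)
    assert "(recursive)" in str(results)

# A companion test for the new flag would call
#   summary(..., row_settings=("depth", "hide_recursive"))
# and assert the marker is absent; "hide_recursive" is the name requested in
# review here, not necessarily the one that shipped via #174.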
Closing in favor of #174. Thank you for the work here and the feature suggestion!