
add show recursive option and add to default row setting

Open · VinhLoiIT opened this pull request · 8 comments

Related to issue #65

VinhLoiIT · Jul 14 '21 14:07

Before

===============================================================================================
Layer (type:depth-idx)                        Output Shape              Param #
===============================================================================================
ConvAttnRNN                                   --                        --
├─Sequential: 1-1                             [1, 512, 2, 8]            --
│    └─Conv2d: 2-1                            [1, 64, 32, 128]          9,408
│    └─BatchNorm2d: 2-2                       [1, 64, 32, 128]          128
│    └─ReLU: 2-3                              [1, 64, 32, 128]          --
│    └─MaxPool2d: 2-4                         [1, 64, 16, 64]           --
│    └─Sequential: 2-5                        [1, 64, 16, 64]           --
│    │    └─BasicBlock: 3-1                   [1, 64, 16, 64]           73,984
│    │    └─BasicBlock: 3-2                   [1, 64, 16, 64]           73,984
│    └─Sequential: 2-6                        [1, 128, 8, 32]           --
│    │    └─BasicBlock: 3-3                   [1, 128, 8, 32]           230,144
│    │    └─BasicBlock: 3-4                   [1, 128, 8, 32]           295,424
│    └─Sequential: 2-7                        [1, 256, 4, 16]           --
│    │    └─BasicBlock: 3-5                   [1, 256, 4, 16]           919,040
│    │    └─BasicBlock: 3-6                   [1, 256, 4, 16]           1,180,672
│    └─Sequential: 2-8                        [1, 512, 2, 8]            --
│    │    └─BasicBlock: 3-7                   [1, 512, 2, 8]            3,673,088
│    │    └─BasicBlock: 3-8                   [1, 512, 2, 8]            4,720,640
├─Conv2d: 1-2                                 [1, 256, 2, 8]            131,328
├─AttentionLSTMDecoder: 1                     --                        --
│    └─DotProductAttention: 2-9               [1, 1, 256]               --
│    └─LSTMCell: 2-10                         [1, 256]                  536,576
│    └─Linear: 2-11                           [1, 10]                   2,570
│    └─DotProductAttention: 2-12              [1, 1, 256]               --
│    └─LSTMCell: 2-13                         [1, 256]                  (recursive)
│    └─Linear: 2-14                           [1, 10]                   (recursive)
│    └─DotProductAttention: 2-15              [1, 1, 256]               --
│    └─LSTMCell: 2-16                         [1, 256]                  (recursive)
│    └─Linear: 2-17                           [1, 10]                   (recursive)
│    └─DotProductAttention: 2-18              [1, 1, 256]               --
│    └─LSTMCell: 2-19                         [1, 256]                  (recursive)
│    └─Linear: 2-20                           [1, 10]                   (recursive)
│    └─DotProductAttention: 2-21              [1, 1, 256]               --
│    └─LSTMCell: 2-22                         [1, 256]                  (recursive)
│    └─Linear: 2-23                           [1, 10]                   (recursive)
===============================================================================================
Total params: 11,846,986
Trainable params: 11,846,986
Non-trainable params: 0
Total mult-adds (G): 1.28
===============================================================================================
Input size (MB): 0.20
Forward/backward pass size (MB): 13.01
Params size (MB): 47.39
Estimated Total Size (MB): 60.60
===============================================================================================

After:

===============================================================================================
Layer (type:depth-idx)                        Output Shape              Param #
===============================================================================================
ConvAttnRNN                                   --                        --
├─Sequential: 1-1                             [1, 512, 2, 8]            --
│    └─Conv2d: 2-1                            [1, 64, 32, 128]          9,408
│    └─BatchNorm2d: 2-2                       [1, 64, 32, 128]          128
│    └─ReLU: 2-3                              [1, 64, 32, 128]          --
│    └─MaxPool2d: 2-4                         [1, 64, 16, 64]           --
│    └─Sequential: 2-5                        [1, 64, 16, 64]           --
│    │    └─BasicBlock: 3-1                   [1, 64, 16, 64]           73,984
│    │    └─BasicBlock: 3-2                   [1, 64, 16, 64]           73,984
│    └─Sequential: 2-6                        [1, 128, 8, 32]           --
│    │    └─BasicBlock: 3-3                   [1, 128, 8, 32]           230,144
│    │    └─BasicBlock: 3-4                   [1, 128, 8, 32]           295,424
│    └─Sequential: 2-7                        [1, 256, 4, 16]           --
│    │    └─BasicBlock: 3-5                   [1, 256, 4, 16]           919,040
│    │    └─BasicBlock: 3-6                   [1, 256, 4, 16]           1,180,672
│    └─Sequential: 2-8                        [1, 512, 2, 8]            --
│    │    └─BasicBlock: 3-7                   [1, 512, 2, 8]            3,673,088
│    │    └─BasicBlock: 3-8                   [1, 512, 2, 8]            4,720,640
├─Conv2d: 1-2                                 [1, 256, 2, 8]            131,328
├─AttentionLSTMDecoder: 1                     --                        --
│    └─DotProductAttention: 2-9               [1, 1, 256]               --
│    └─LSTMCell: 2-10                         [1, 256]                  536,576
│    └─Linear: 2-11                           [1, 10]                   2,570
│    └─DotProductAttention: 2-12              [1, 1, 256]               --
│    └─DotProductAttention: 2-15              [1, 1, 256]               --
│    └─DotProductAttention: 2-18              [1, 1, 256]               --
│    └─DotProductAttention: 2-21              [1, 1, 256]               --
===============================================================================================
Total params: 11,846,986
Trainable params: 11,846,986
Non-trainable params: 0
Total mult-adds (G): 1.28
===============================================================================================
Input size (MB): 0.20
Forward/backward pass size (MB): 13.01
Params size (MB): 47.39
Estimated Total Size (MB): 60.60
===============================================================================================
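
For reference, a minimal sketch of how the proposed row setting could be toggled. The `"recursive"` option name is the one proposed in this PR, not part of the released torchinfo API, and `TinyDecoder` is just a toy stand-in for the decoder above:

```python
import torch
import torch.nn as nn
from torchinfo import summary

class TinyDecoder(nn.Module):
    """Toy stand-in: the same LSTMCell is called three times per forward pass,
    which is what produces the "(recursive)" rows in the tables above."""
    def __init__(self) -> None:
        super().__init__()
        self.cell = nn.LSTMCell(16, 32)
        self.out = nn.Linear(32, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x.new_zeros(x.size(0), 32)
        c = x.new_zeros(x.size(0), 32)
        for _ in range(3):  # repeated calls to the same submodule
            h, c = self.cell(x, (h, c))
        return self.out(h)

# With the proposed "recursive" row setting the repeated rows are shown, as in
# "Before"; omitting it collapses them, as in "After". The option name is an
# assumption from this PR and may change.
summary(TinyDecoder(), input_size=(1, 16), row_settings=("depth", "recursive"))
summary(TinyDecoder(), input_size=(1, 16), row_settings=("depth",))
```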

VinhLoiIT · Jul 14 '21 14:07

Codecov Report

Merging #66 (3ff43b7) into main (4c94492) will decrease coverage by 0.23%. The diff coverage is 80.00%.


@@            Coverage Diff             @@
##             main      #66      +/-   ##
==========================================
- Coverage   99.76%   99.53%   -0.24%     
==========================================
  Files           5        5              
  Lines         427      430       +3     
==========================================
+ Hits          426      428       +2     
- Misses          1        2       +1     
Impacted Files              Coverage Δ
torchinfo/formatting.py     98.97% <75.00%> (-1.03%) ↓
torchinfo/torchinfo.py      100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 4c94492...3ff43b7.

codecov[bot] · Jul 20 '21 05:07

Please remove this option from the "defaults" section. We shouldn't change the default behavior.

Also, I think the proposed output is quite confusing, since it looks like the DotProductAttention layers are all stacked together. In my opinion the output should look like:

===============================================================================================
Layer (type:depth-idx)                        Output Shape              Param #
===============================================================================================
ConvAttnRNN                                   --                        --
├─Sequential: 1-1                             [1, 512, 2, 8]            --
│    └─Conv2d: 2-1                            [1, 64, 32, 128]          9,408
│    └─BatchNorm2d: 2-2                       [1, 64, 32, 128]          128
│    └─ReLU: 2-3                              [1, 64, 32, 128]          --
│    └─MaxPool2d: 2-4                         [1, 64, 16, 64]           --
│    └─Sequential: 2-5                        [1, 64, 16, 64]           --
│    │    └─BasicBlock: 3-1                   [1, 64, 16, 64]           73,984
│    │    └─BasicBlock: 3-2                   [1, 64, 16, 64]           73,984
│    └─Sequential: 2-6                        [1, 128, 8, 32]           --
│    │    └─BasicBlock: 3-3                   [1, 128, 8, 32]           230,144
│    │    └─BasicBlock: 3-4                   [1, 128, 8, 32]           295,424
│    └─Sequential: 2-7                        [1, 256, 4, 16]           --
│    │    └─BasicBlock: 3-5                   [1, 256, 4, 16]           919,040
│    │    └─BasicBlock: 3-6                   [1, 256, 4, 16]           1,180,672
│    └─Sequential: 2-8                        [1, 512, 2, 8]            --
│    │    └─BasicBlock: 3-7                   [1, 512, 2, 8]            3,673,088
│    │    └─BasicBlock: 3-8                   [1, 512, 2, 8]            4,720,640
├─Conv2d: 1-2                                 [1, 256, 2, 8]            131,328
├─AttentionLSTMDecoder: 1                     --                        --
│    └─DotProductAttention (recursive): 2-9   [1, 1, 256]               --
│    └─LSTMCell (recursive): 2-10             [1, 256]                  536,576
│    └─Linear (recursive): 2-11               [1, 10]                   2,570
===============================================================================================
Total params: 11,846,986
Trainable params: 11,846,986
Non-trainable params: 0
Total mult-adds (G): 1.28
===============================================================================================
Input size (MB): 0.20
Forward/backward pass size (MB): 13.01
Params size (MB): 47.39
Estimated Total Size (MB): 60.60
===============================================================================================

What do you think?

TylerYep · Jul 20 '21 06:07

Also, it would be great if you could add a test case for this feature!

TylerYep · Jul 20 '21 06:07

Please remove this option from the "defaults" section. We shouldn't change the default behavior.

The current default behavior is to show the recursive rows, so the "recursive" flag should be added to the defaults to keep the current behavior unchanged.

Also, I think the proposed output is quite confusing, since it looks like the DotProductAttention layers are all stacked together. Also, it would be great if you could add a test case for this feature!

I totally agree. I will update the code.

VinhLoiIT · Jul 20 '21 06:07

I'm not sure how the "input shapes" and "output shapes" columns should be represented for recursive layers. Should we hide them too?

VinhLoiIT · Jul 20 '21 06:07

Please remove this option from the "defaults" section. We shouldn't change the default behavior.

Also, I think the proposed output is quite confusing, since it looks like the DotProductAttention layers are all stacked together.

After spending a while looking at the source code, I found this line, where recursive layers are checked only when they have parameters. I guess that parameter-free layers, such as nn.Sigmoid, nn.ReLU, or nn.Softmax, are not counted as recursive layers even though they are called many times.

An example of this is ResNet50's Bottleneck module, where self.relu is called three times in the forward method. In other words, a layer is not marked as recursive if it has no parameters. In my case, the DotProductAttention module has no learnable parameters, since it is defined purely for convenience, so its repeated appearance in the table is expected.
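
A minimal toy example of that pattern (ReusedReLU is made up here purely for illustration; it is not from torchinfo or from my model):

```python
import torch
import torch.nn as nn
from torchinfo import summary

class ReusedReLU(nn.Module):
    """Mimics ResNet's Bottleneck pattern: a single parameter-free ReLU
    instance is called twice in forward()."""
    def __init__(self) -> None:
        super().__init__()
        self.fc1 = nn.Linear(8, 8)
        self.fc2 = nn.Linear(8, 8)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.fc1(x))     # first call to self.relu
        return self.relu(self.fc2(x))  # second call to the same instance

# Since nn.ReLU owns no parameters, neither call should be marked
# "(recursive)"; both are expected to appear as ordinary rows, just like
# DotProductAttention in the tables above.
summary(ReusedReLU(), input_size=(1, 8))
```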

VinhLoiIT · Jul 20 '21 12:07

Let's keep the input shapes and output shapes for recursive layers.

Keeping the recursive layers that don't have parameters is not ideal, but okay for now since there is currently no existing behavior.

Last couple things before this is ready to merge:

  • Rename this setting to hide_recursive and remove it from the defaults
  • Add test case(s) (see the sketch below for one possible shape)
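
For illustration, a rough pytest-style sketch of what such a test might look like (the hide_recursive option name follows the rename requested above; the exact expected output depends on the final implementation):

```python
import torch
import torch.nn as nn
from torchinfo import summary

class LoopedCell(nn.Module):
    """Toy model whose single LSTMCell is called three times per forward pass."""
    def __init__(self) -> None:
        super().__init__()
        self.cell = nn.LSTMCell(4, 4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x.new_zeros(x.size(0), 4)
        c = x.new_zeros(x.size(0), 4)
        for _ in range(3):
            h, c = self.cell(x, (h, c))
        return h

def test_hide_recursive_rows() -> None:
    # "hide_recursive" is assumed from the rename requested above; the merged
    # API may differ.
    results = summary(LoopedCell(), input_size=(1, 4),
                      row_settings=("hide_recursive",), verbose=0)
    # With the repeated calls collapsed, only one LSTMCell row should remain.
    assert str(results).count("LSTMCell") == 1
```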

TylerYep · Aug 03 '21 05:08

Closing in favor of #174. Thank you for the work here and the feature suggestion!

TylerYep · Oct 08 '22 05:10