Add profile decorator to simpa utils
This is just a convenience wrapper around using different profilers - making a PR in case you also find it useful.

- Allows a `@profile` decorator to be added to functions
- By default this decorator does nothing
- If the `SIMPA_PROFILE` environment variable is set to:
  - `TIME`: `line_profiler` is used for line-by-line run-time profiling
  - `MEMORY`: `memory_profiler` is used for line-by-line RAM use profiling
  - `GPU_MEMORY`: `pytorch_memlab` is used for line-by-line GPU RAM profiling
- Profiling output is written to the console when the script finishes
- For `GPU_MEMORY`, a summary of GPU memory use (`torch.cuda.memory_summary()`) is also written to the console
- Adds these profiling dependencies to `tool.poetry.group.profile.dependencies`
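A minimal sketch of how such an environment-variable dispatch can be implemented (module layout and names here are illustrative assumptions, not the exact simpa code):

```python
import os

_mode = os.environ.get("SIMPA_PROFILE")

if _mode == "TIME":
    import atexit
    from line_profiler import LineProfiler

    _profiler = LineProfiler()
    # print the collected line-by-line timings when the script finishes
    atexit.register(_profiler.print_stats)

    def profile(fn):
        return _profiler(fn)

elif _mode == "MEMORY":
    # memory_profiler ships its own line-by-line @profile decorator
    from memory_profiler import profile

elif _mode == "GPU_MEMORY":
    # pytorch_memlab also ships a line-by-line @profile decorator
    from pytorch_memlab import profile

else:
    def profile(fn):
        # SIMPA_PROFILE unset or unrecognised: the decorator is a no-op
        return fn
```

With `SIMPA_PROFILE` unset, `@profile` returns the function unchanged, so decorated code pays no overhead in normal runs.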
Please check the following before creating the pull request (PR):
- [x] Did you run automatic tests?
- [ ] Did you run manual tests?
- [x] Is the code provided in the PR still backwards compatible to previous SIMPA versions?
List any specific code review questions
List any special testing requirements
Additional context (e.g. papers, documentation, blog posts, ...)
Provide issue / feature request fixed by this PR
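For context, switching profilers needs no code changes, only the environment variable. An activation sketch (the inline `python3 -c` command stands in for a real simpa entry-point script):

```shell
# Line-by-line run times, printed when the script exits.
SIMPA_PROFILE=TIME python3 -c "print('simulation would run here')"
# Likewise SIMPA_PROFILE=MEMORY or SIMPA_PROFILE=GPU_MEMORY
# for line-by-line RAM / GPU RAM profiling.
```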
Example of output after adding the `@profile` decorator to `get_enclosed_indices` in `utils/libraries/structure_library/EllipticalTubularStructure.py`:
With `SIMPA_PROFILE=TIME`:

```
Total time: 1.13564 s
File: /export/home/lkeegan/simpa/simpa/utils/libraries/structure_library/EllipticalTubularStructure.py
Function: get_enclosed_indices at line 54

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    54                                           @profile
    55                                           def get_enclosed_indices(self):
    56         4       8631.0    2157.8     0.0      start_mm, end_mm, radius_mm, eccentricity, partial_volume = self.params
    57         4     914321.0  228580.2     0.1      start_mm = torch.tensor(start_mm, dtype=torch.float).to(self.torch_device)
    58         4     116221.0   29055.2     0.0      end_mm = torch.tensor(end_mm, dtype=torch.float).to(self.torch_device)
    59         4     122881.0   30720.2     0.0      radius_mm = torch.tensor(radius_mm, dtype=torch.float).to(self.torch_device)
    60         4      92632.0   23158.0     0.0      eccentricity = torch.tensor(eccentricity, dtype=torch.float).to(self.torch_device)
    61         4      96431.0   24107.8     0.0      partial_volume = torch.tensor(partial_volume, dtype=torch.float).to(self.torch_device)
    62
    63         4     263872.0   65968.0     0.0      start_voxels = start_mm / self.voxel_spacing
    64         4      43171.0   10792.8     0.0      end_voxels = end_mm / self.voxel_spacing
    65         4      39302.0    9825.5     0.0      radius_voxels = radius_mm / self.voxel_spacing
    66
    67         4     277034.0   69258.5     0.0      x, y, z = torch.meshgrid(torch.arange(self.volume_dimensions_voxels[0]).to(self.torch_device),
    68         4      92551.0   23137.8     0.0                               torch.arange(self.volume_dimensions_voxels[1]).to(self.torch_device),
    69         4      90311.0   22577.8     0.0                               torch.arange(self.volume_dimensions_voxels[2]).to(self.torch_device),
    70         4       1770.0     442.5     0.0                               indexing='ij')
    71
    72         4     313413.0   78353.2     0.0      x = x + 0.5
    73         4      58640.0   14660.0     0.0      y = y + 0.5
    74         4      46590.0   11647.5     0.0      z = z + 0.5
    75
    76         4    6628647.0 1657161.8     0.6      if partial_volume:
    77         4       1250.0     312.5     0.0          radius_margin = 0.5
    78                                               else:
    79                                                   radius_margin = 0.7071
    80
    81         4     357833.0   89458.2     0.0      target_vector = torch.subtract(torch.stack([x, y, z], axis=-1), start_voxels)
    82         4       1930.0     482.5     0.0      if self.do_deformation:
    83                                                   # the deformation functional needs mm as inputs and returns the result in reverse indexing order...
    84         4    8636566.0 2159141.5     0.8          deformation_values_mm = self.deformation_functional_mm(torch.arange(self.volume_dimensions_voxels[0]) *
    85         4       1030.0     257.5     0.0                                                                 self.voxel_spacing,
    86         4      35750.0    8937.5     0.0                                                                 torch.arange(self.volume_dimensions_voxels[1]) *
    87         4       4641.0    1160.2     0.0                                                                 self.voxel_spacing).T
    88         4       8921.0    2230.2     0.0          deformation_values_mm = deformation_values_mm.reshape(self.volume_dimensions_voxels[0],
    89         4       1460.0     365.0     0.0                                                                self.volume_dimensions_voxels[1], 1, 1)
    90         8   23343404.0 2917925.5     2.1          deformation_values_mm = torch.tile(torch.from_numpy(deformation_values_mm).to(
    91         8       4270.0     533.8     0.0              self.torch_device), (1, 1, self.volume_dimensions_voxels[2], 3))
    92         4    5772348.0 1443087.0     0.5          target_vector = (target_vector + (deformation_values_mm / self.voxel_spacing)).float()
    93         4      50350.0   12587.5     0.0      cylinder_vector = torch.subtract(end_voxels, start_voxels)
    94
    95         4     503735.0  125933.8     0.0      main_axis_length = radius_voxels/(1-eccentricity**2)**0.25
    96         4   44072171.0 11018042.8    3.9      main_axis_vector = torch.tensor([cylinder_vector[1], -cylinder_vector[0], 0]).to(self.torch_device)
    97         4     467274.0  116818.5     0.0      main_axis_vector = main_axis_vector/torch.linalg.norm(main_axis_vector) * main_axis_length
    98
    99         4     305813.0   76453.2     0.0      minor_axis_length = main_axis_length*torch.sqrt(1-eccentricity**2)
   100         4     180841.0   45210.2     0.0      minor_axis_vector = torch.cross(cylinder_vector, main_axis_vector)
   101         4     129381.0   32345.2     0.0      minor_axis_vector = minor_axis_vector / torch.linalg.norm(minor_axis_vector) * minor_axis_length
   102
   103         4  435438607.0 108859651.8  38.3      dot_product = torch.matmul(target_vector, cylinder_vector)/torch.linalg.norm(cylinder_vector)
   104
   105         4     284523.0   71130.8     0.0      target_vector_projection = torch.multiply(dot_product[:, :, :, None], cylinder_vector)
   106         4      59810.0   14952.5     0.0      target_vector_from_projection = target_vector - target_vector_projection
   107
   108         4     204562.0   51140.5     0.0      main_projection = torch.matmul(target_vector_from_projection, main_axis_vector) / main_axis_length
   109
   110         4     131242.0   32810.5     0.0      minor_projection = torch.matmul(target_vector_from_projection, minor_axis_vector) / minor_axis_length
   111
   112         4     253364.0   63341.0     0.0      radius_crit = torch.sqrt(((main_projection/main_axis_length)**2 + (minor_projection/minor_axis_length)**2) *
   113         4      38501.0    9625.2     0.0                               radius_voxels**2)
   114
   115         4  417708728.0 104427182.0  36.8      volume_fractions = torch.zeros(tuple(self.volume_dimensions_voxels), dtype=torch.float).to(self.torch_device)
   116         4     864170.0  216042.5     0.1      filled_mask = radius_crit <= radius_voxels - 1 + radius_margin
   117         4     146841.0   36710.2     0.0      border_mask = (radius_crit > radius_voxels - 1 + radius_margin) & \
   118         4     100772.0   25193.0     0.0                    (radius_crit < radius_voxels + 2 * radius_margin)
   119
   120         4     261661.0   65415.2     0.0      volume_fractions[filled_mask] = 1
   121         4   17686319.0 4421579.8     1.6      volume_fractions[border_mask] = 1 - (radius_crit - (radius_voxels - radius_margin))[border_mask]
   122         4     151731.0   37932.8     0.0      volume_fractions[volume_fractions < 0] = 0
   123         4      80371.0   20092.8     0.0      volume_fractions[volume_fractions < 0] = 0
   124
   125         4   12488797.0 3122199.2     1.1      if partial_volume:
   126
   127         4      91050.0   22762.5     0.0          mask = filled_mask | border_mask
   128                                               else:
   129                                                   mask = filled_mask
   130
   131         4  156560720.0 39140180.0   13.8      return mask.cpu().numpy(), volume_fractions[mask].cpu().numpy()
```
With `SIMPA_PROFILE=MEMORY`:

```
Filename: /export/home/lkeegan/simpa/simpa/utils/libraries/structure_library/EllipticalTubularStructure.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    54   3507.3 MiB   3507.3 MiB           1   @profile
    55                                         def get_enclosed_indices(self):
    56   3507.3 MiB      0.0 MiB           1       start_mm, end_mm, radius_mm, eccentricity, partial_volume = self.params
    57   3507.3 MiB      0.0 MiB           1       start_mm = torch.tensor(start_mm, dtype=torch.float).to(self.torch_device)
    58   3507.3 MiB      0.0 MiB           1       end_mm = torch.tensor(end_mm, dtype=torch.float).to(self.torch_device)
    59   3507.3 MiB      0.0 MiB           1       radius_mm = torch.tensor(radius_mm, dtype=torch.float).to(self.torch_device)
    60   3507.3 MiB      0.0 MiB           1       eccentricity = torch.tensor(eccentricity, dtype=torch.float).to(self.torch_device)
    61   3507.3 MiB      0.0 MiB           1       partial_volume = torch.tensor(partial_volume, dtype=torch.float).to(self.torch_device)
    62
    63   3507.3 MiB      0.0 MiB           1       start_voxels = start_mm / self.voxel_spacing
    64   3507.3 MiB      0.0 MiB           1       end_voxels = end_mm / self.voxel_spacing
    65   3507.3 MiB      0.0 MiB           1       radius_voxels = radius_mm / self.voxel_spacing
    66
    67   3507.3 MiB      0.0 MiB           2       x, y, z = torch.meshgrid(torch.arange(self.volume_dimensions_voxels[0]).to(self.torch_device),
    68   3507.3 MiB      0.0 MiB           1                                torch.arange(self.volume_dimensions_voxels[1]).to(self.torch_device),
    69   3507.3 MiB      0.0 MiB           1                                torch.arange(self.volume_dimensions_voxels[2]).to(self.torch_device),
    70   3507.3 MiB      0.0 MiB           1                                indexing='ij')
    71
    72   3507.3 MiB      0.0 MiB           1       x = x + 0.5
    73   3507.3 MiB      0.0 MiB           1       y = y + 0.5
    74   3507.3 MiB      0.0 MiB           1       z = z + 0.5
    75
    76   3507.3 MiB      0.0 MiB           1       if partial_volume:
    77   3507.3 MiB      0.0 MiB           1           radius_margin = 0.5
    78                                             else:
    79                                                 radius_margin = 0.7071
    80
    81   3507.3 MiB      0.0 MiB           1       target_vector = torch.subtract(torch.stack([x, y, z], axis=-1), start_voxels)
    82   3507.3 MiB      0.0 MiB           1       if self.do_deformation:
    83                                                 # the deformation functional needs mm as inputs and returns the result in reverse indexing order...
    84   3507.3 MiB      0.0 MiB           4           deformation_values_mm = self.deformation_functional_mm(torch.arange(self.volume_dimensions_voxels[0]) *
    85   3507.3 MiB      0.0 MiB           1                                                                 self.voxel_spacing,
    86   3507.3 MiB      0.0 MiB           2                                                                 torch.arange(self.volume_dimensions_voxels[1]) *
    87   3507.3 MiB      0.0 MiB           2                                                                 self.voxel_spacing).T
    88   3507.3 MiB      0.0 MiB           2           deformation_values_mm = deformation_values_mm.reshape(self.volume_dimensions_voxels[0],
    89   3507.3 MiB      0.0 MiB           1                                                                self.volume_dimensions_voxels[1], 1, 1)
    90   3507.7 MiB      0.4 MiB           3           deformation_values_mm = torch.tile(torch.from_numpy(deformation_values_mm).to(
    91   3507.3 MiB      0.0 MiB           2               self.torch_device), (1, 1, self.volume_dimensions_voxels[2], 3))
    92   3507.7 MiB      0.0 MiB           1           target_vector = (target_vector + (deformation_values_mm / self.voxel_spacing)).float()
    93   3507.7 MiB      0.0 MiB           1       cylinder_vector = torch.subtract(end_voxels, start_voxels)
    94
    95   3507.7 MiB      0.0 MiB           1       main_axis_length = radius_voxels/(1-eccentricity**2)**0.25
    96   3507.7 MiB      0.0 MiB           1       main_axis_vector = torch.tensor([cylinder_vector[1], -cylinder_vector[0], 0]).to(self.torch_device)
    97   3507.7 MiB      0.0 MiB           1       main_axis_vector = main_axis_vector/torch.linalg.norm(main_axis_vector) * main_axis_length
    98
    99   3507.7 MiB      0.0 MiB           1       minor_axis_length = main_axis_length*torch.sqrt(1-eccentricity**2)
   100   3507.7 MiB      0.0 MiB           1       minor_axis_vector = torch.cross(cylinder_vector, main_axis_vector)
   101   3507.7 MiB      0.0 MiB           1       minor_axis_vector = minor_axis_vector / torch.linalg.norm(minor_axis_vector) * minor_axis_length
   102
   103   4379.6 MiB    871.9 MiB           1       dot_product = torch.matmul(target_vector, cylinder_vector)/torch.linalg.norm(cylinder_vector)
   104
   105   4379.6 MiB      0.0 MiB           1       target_vector_projection = torch.multiply(dot_product[:, :, :, None], cylinder_vector)
   106   4379.6 MiB      0.0 MiB           1       target_vector_from_projection = target_vector - target_vector_projection
   107
   108   4379.6 MiB      0.0 MiB           1       main_projection = torch.matmul(target_vector_from_projection, main_axis_vector) / main_axis_length
   109
   110   4379.6 MiB      0.0 MiB           1       minor_projection = torch.matmul(target_vector_from_projection, minor_axis_vector) / minor_axis_length
   111
   112   4379.6 MiB      0.0 MiB           2       radius_crit = torch.sqrt(((main_projection/main_axis_length)**2 + (minor_projection/minor_axis_length)**2) *
   113   4379.6 MiB      0.0 MiB           1                                radius_voxels**2)
   114
   115   4380.0 MiB      0.4 MiB           1       volume_fractions = torch.zeros(tuple(self.volume_dimensions_voxels), dtype=torch.float).to(self.torch_device)
   116   4380.0 MiB      0.0 MiB           1       filled_mask = radius_crit <= radius_voxels - 1 + radius_margin
   117   4380.0 MiB      0.0 MiB           2       border_mask = (radius_crit > radius_voxels - 1 + radius_margin) & \
   118   4380.0 MiB      0.0 MiB           1                     (radius_crit < radius_voxels + 2 * radius_margin)
   119
   120   4380.0 MiB      0.0 MiB           1       volume_fractions[filled_mask] = 1
   121   4380.0 MiB      0.0 MiB           1       volume_fractions[border_mask] = 1 - (radius_crit - (radius_voxels - radius_margin))[border_mask]
   122   4380.0 MiB      0.0 MiB           1       volume_fractions[volume_fractions < 0] = 0
   123   4380.0 MiB      0.0 MiB           1       volume_fractions[volume_fractions < 0] = 0
   124
   125   4380.0 MiB      0.0 MiB           1       if partial_volume:
   126
   127   4380.0 MiB      0.0 MiB           1           mask = filled_mask | border_mask
   128                                             else:
   129                                                 mask = filled_mask
   130
   131   4470.8 MiB     90.7 MiB           1       return mask.cpu().numpy(), volume_fractions[mask].cpu().numpy()
```
With `SIMPA_PROFILE=GPU_MEMORY`:

```
## EllipticalTubularStructure.get_enclosed_indices

active_bytes reserved_bytes line  code
         all            all
        peak           peak
       8.45G         11.73G   54  @profile
                              55  def get_enclosed_indices(self):
       8.12M         11.73G   56      start_mm, end_mm, radius_mm, eccentricity, partial_volume = self.params
       8.13M         11.73G   57      start_mm = torch.tensor(start_mm, dtype=torch.float).to(self.torch_device)
       8.13M         11.73G   58      end_mm = torch.tensor(end_mm, dtype=torch.float).to(self.torch_device)
       8.13M         11.73G   59      radius_mm = torch.tensor(radius_mm, dtype=torch.float).to(self.torch_device)
       8.13M         11.73G   60      eccentricity = torch.tensor(eccentricity, dtype=torch.float).to(self.torch_device)
       8.13M         11.73G   61      partial_volume = torch.tensor(partial_volume, dtype=torch.float).to(self.torch_device)
                              62
       8.13M         11.73G   63      start_voxels = start_mm / self.voxel_spacing
       8.13M         11.73G   64      end_voxels = end_mm / self.voxel_spacing
       8.13M         11.73G   65      radius_voxels = radius_mm / self.voxel_spacing
                              66
       8.14M         11.73G   67      x, y, z = torch.meshgrid(torch.arange(self.volume_dimensions_voxels[0]).to(self.torch_device),
       8.14M         11.73G   68                               torch.arange(self.volume_dimensions_voxels[1]).to(self.torch_device),
       8.14M         11.73G   69                               torch.arange(self.volume_dimensions_voxels[2]).to(self.torch_device),
       8.14M         11.73G   70                               indexing='ij')
                              71
     372.06M         11.73G   72      x = x + 0.5
     735.98M         11.73G   73      y = y + 0.5
       1.07G         11.73G   74      z = z + 0.5
                              75
       1.07G         11.73G   76      if partial_volume:
       1.07G         11.73G   77          radius_margin = 0.5
                              78      else:
                              79          radius_margin = 0.7071
                              80
       3.21G         11.73G   81      target_vector = torch.subtract(torch.stack([x, y, z], axis=-1), start_voxels)
       2.14G         11.73G   82      if self.do_deformation:
                              83          # the deformation functional needs mm as inputs and returns the result in reverse indexing order...
       2.14G         11.73G   84          deformation_values_mm = self.deformation_functional_mm(torch.arange(self.volume_dimensions_voxels[0]) *
       2.14G         11.73G   85                                                                 self.voxel_spacing,
       2.14G         11.73G   86                                                                 torch.arange(self.volume_dimensions_voxels[1]) *
       2.14G         11.73G   87                                                                 self.voxel_spacing).T
       2.14G         11.73G   88          deformation_values_mm = deformation_values_mm.reshape(self.volume_dimensions_voxels[0],
       2.14G         11.73G   89                                                                self.volume_dimensions_voxels[1], 1, 1)
       4.27G         11.73G   90          deformation_values_mm = torch.tile(torch.from_numpy(deformation_values_mm).to(
       2.14G         11.73G   91              self.torch_device), (1, 1, self.volume_dimensions_voxels[2], 3))
       8.54G         11.73G   92          target_vector = (target_vector + (deformation_values_mm / self.voxel_spacing)).float()
       4.27G         11.73G   93      cylinder_vector = torch.subtract(end_voxels, start_voxels)
                              94
       4.27G         11.73G   95      main_axis_length = radius_voxels/(1-eccentricity**2)**0.25
       4.27G         11.73G   96      main_axis_vector = torch.tensor([cylinder_vector[1], -cylinder_vector[0], 0]).to(self.torch_device)
       4.27G         11.73G   97      main_axis_vector = main_axis_vector/torch.linalg.norm(main_axis_vector) * main_axis_length
                              98
       4.27G         11.73G   99      minor_axis_length = main_axis_length*torch.sqrt(1-eccentricity**2)
       4.27G         11.73G  100      minor_axis_vector = torch.cross(cylinder_vector, main_axis_vector)
       4.27G         11.73G  101      minor_axis_vector = minor_axis_vector / torch.linalg.norm(minor_axis_vector) * minor_axis_length
                             102
       4.98G         11.73G  103      dot_product = torch.matmul(target_vector, cylinder_vector)/torch.linalg.norm(cylinder_vector)
                             104
       5.70G         11.73G  105      target_vector_projection = torch.multiply(dot_product[:, :, :, None], cylinder_vector)
       6.76G         11.73G  106      target_vector_from_projection = target_vector - target_vector_projection
                             107
       7.47G         11.73G  108      main_projection = torch.matmul(target_vector_from_projection, main_axis_vector) / main_axis_length
                             109
       7.83G         11.73G  110      minor_projection = torch.matmul(target_vector_from_projection, minor_axis_vector) / minor_axis_length
                             111
       8.54G         11.73G  112      radius_crit = torch.sqrt(((main_projection/main_axis_length)**2 + (minor_projection/minor_axis_length)**2) *
       7.83G         11.73G  113                               radius_voxels**2)
                             114
       8.18G         11.73G  115      volume_fractions = torch.zeros(tuple(self.volume_dimensions_voxels), dtype=torch.float).to(self.torch_device)
       8.27G         11.73G  116      filled_mask = radius_crit <= radius_voxels - 1 + radius_margin
       8.54G         11.73G  117      border_mask = (radius_crit > radius_voxels - 1 + radius_margin) & \
       8.45G         11.73G  118                    (radius_crit < radius_voxels + 2 * radius_margin)
                             119
       8.36G         11.73G  120      volume_fractions[filled_mask] = 1
       8.72G         11.73G  121      volume_fractions[border_mask] = 1 - (radius_crit - (radius_voxels - radius_margin))[border_mask]
       8.45G         11.73G  122      volume_fractions[volume_fractions < 0] = 0
       8.45G         11.73G  123      volume_fractions[volume_fractions < 0] = 0
                             124
       8.36G         11.73G  125      if partial_volume:
                             126
       8.45G         11.73G  127          mask = filled_mask | border_mask
                             128      else:
                             129          mask = filled_mask
                             130
       8.45G         11.73G  131      return mask.cpu().numpy(), volume_fractions[mask].cpu().numpy()
```

```
|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|  CUDA OOMs: 0             |         cudaMalloc retries: 0                 |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |   8320 KiB |   8652 MiB |  10072 MiB |  18716 MiB |
|   from large pool     |   8320 KiB |   8652 MiB |  10072 MiB |  18716 MiB |
|   from small pool     |      0 KiB |      0 MiB |      0 MiB |      0 MiB |
|---------------------------------------------------------------------------|
| Active memory         |   8320 KiB |   8652 MiB |  10072 MiB |  18716 MiB |
|   from large pool     |   8320 KiB |   8652 MiB |  10072 MiB |  18716 MiB |
|   from small pool     |      0 KiB |      0 MiB |      0 MiB |      0 MiB |
|---------------------------------------------------------------------------|
| Requested memory      |   8320 KiB |   8651 MiB |  10070 MiB |  18713 MiB |
|   from large pool     |   8320 KiB |   8651 MiB |  10070 MiB |  18713 MiB |
|   from small pool     |      0 KiB |      0 MiB |      0 MiB |      0 MiB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |   2184 MiB |  12016 MiB |      0 B   |   9832 MiB |
|   from large pool     |   2184 MiB |  12012 MiB |      0 B   |   9828 MiB |
|   from small pool     |      0 MiB |      4 MiB |      0 B   |      4 MiB |
|---------------------------------------------------------------------------|
| Non-releasable memory |   2175 MiB |   5907 MiB |   9893 MiB |   8895 MiB |
|   from large pool     |   2175 MiB |   5907 MiB |   9889 MiB |   8889 MiB |
|   from small pool     |      0 MiB |      1 MiB |      4 MiB |      6 MiB |
|---------------------------------------------------------------------------|
| Allocations           |       1    |      29    |      54    |      82    |
|   from large pool     |       1    |      16    |      26    |      41    |
|   from small pool     |       0    |      13    |      28    |      41    |
|---------------------------------------------------------------------------|
| Active allocs         |       1    |      29    |      54    |      82    |
|   from large pool     |       1    |      16    |      26    |      41    |
|   from small pool     |       0    |      13    |      28    |      41    |
|---------------------------------------------------------------------------|
| GPU reserved segments |       1    |       8    |       0    |       7    |
|   from large pool     |       1    |       6    |       0    |       5    |
|   from small pool     |       0    |       2    |       0    |       2    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       2    |       8    |      18    |      21    |
|   from large pool     |       2    |       8    |      13    |      14    |
|   from small pool     |       0    |       3    |       5    |       7    |
|---------------------------------------------------------------------------|
| Oversize allocations  |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Oversize GPU segments |       0    |       0    |       0    |       0    |
|===========================================================================|
```