openPMD-viewer
Investigating Performance Regression in ADIOS BP for
@RemiLehe reported and shared an HDF5/ADIOS2 data set with me that is 30x slower to read with BP than with H5 in `ts.iterate(ts.get_a0, pol='y')`. It also gets slower the more often it is called.
- [x] BP5 or BP4?
- [x] file-based encoding used for ... iterations
- [x] profile
- [x] test `bpls -D ...` for decomposed data on disk
- [ ] `openPMD-pipe` could be used to reorganize the data to be contiguous (in fact, that is its default)
- [ ] we can also use it to make BP file-based series variable-based, and HDF5 file-based ones group-based
- [ ] use lazy parsing / `defer_iteration_parsing` (https://openpmd-api.readthedocs.io/en/0.15.2/details/backendconfig.html) when the user specifies `check_all_files=False`

X-ref #380
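A minimal sketch of the last item: `defer_iteration_parsing` is a documented openPMD-api backend option, passed as a JSON options string when opening the series. The viewer-side wiring to `check_all_files=False` is the part still to be done; paths here are illustrative.

```python
import json

# Backend options for openPMD-api; defer_iteration_parsing skips the eager
# metadata parsing of every iteration at Series-open time.
options = json.dumps({"defer_iteration_parsing": True})

# Passing it when opening the series (requires openpmd_api; shown for context):
# import openpmd_api as io
# s = io.Series("diag1/openpmd_%T.bp", io.Access.read_only, options)
print(options)
```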
BP Data:
- WarpX RZ Data
- ADIOS-BP v2.8.3 generated
- BP4
- fileBased encoding, 300 outputs
- 30M each
- 2 subfiles each
- `bpls -D` output for all files
- `bpls -D diag1/openpmd_000200.bp`:
  - 9-57 blocks per particle, many NULL (due to https://github.com/ECP-WarpX/WarpX/pull/3657)
  - 24 blocks per field, none of them NULL
HDF5 Data:
- FBPIC (v0.25.0) RZ Data
- HDF5
Benchmark: reads & processes field data
```python
from openpmd_viewer.addons import LpaDiagnostics

ts = LpaDiagnostics('diag1/')
a0 = ts.iterate(ts.get_a0, pol='y')  # ~1 (BP) or ~7 (H5) iterations processed per second
```
Warnings on open for BP series:
Warning: File 297 has different openPMD parameters than the rest of the time series.
Warning: File 298 has different openPMD parameters than the rest of the time series.
Warning: File 299 has different openPMD parameters than the rest of the time series.
The access pattern is quite simple: 2-4 reads of RZ fields (full reads) per step, which take 95-97% of the time of this benchmark. This is definitely on the openPMD-api side of things. CC @guj and @franzpoeschel for us to dig deeper: benchmark.zip
```python
from openpmd_viewer.addons import LpaDiagnostics
from openpmd_viewer.openpmd_timeseries.data_reader import io_reader

%load_ext line_profiler
```
BP4
```python
ts_bp = LpaDiagnostics('diag1/')
```
Warning: File 297 has different openPMD parameters than the rest of the time series.
Warning: File 298 has different openPMD parameters than the rest of the time series.
Warning: File 299 has different openPMD parameters than the rest of the time series.
```python
%lprun -f io_reader.read_field_circ -f io_reader.field_reader.get_data \
    a0_bp = ts_bp.iterate(ts_bp.get_a0, pol='y')  # ~1 per second
```
100%|██████████| 299/299 [04:28<00:00, 1.11it/s]
Timer unit: 1e-09 s
Total time: 268.797 s
File: /global/homes/a/ahuebl/.conda/envs/perlmutter-postprocessing/lib/python3.10/site-packages/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/field_reader.py
Function: read_field_circ at line 124
Line # Hits Time Per Hit % Time Line Contents
==============================================================
124 def read_field_circ( series, iteration, field_name, component_name,
125 slice_relative_position, slice_across, m=0, theta=0.,
126 max_resolution_3d=None ):
127 """
128 Extract a given field from a file in the openPMD format,
129 when the geometry is thetaMode
130
131 Parameters
132 ----------
133 series: openpmd_api.Series
134 An open, readable openPMD-api series object
135
136 iteration: integer
137 Iteration from which parameters should be extracted
138
139 field_name : string, optional
140 Which field to extract
141
142 component_name : string, optional
143 Which component of the field to extract
144
145 m : int or string, optional
146 The azimuthal mode to be extracted
147
148 theta : float or None
149 Angle of the plane of observation with respect to the x axis
150 If `theta` is not None, then this function returns a 2D array
151 corresponding to the plane of observation given by `theta` ;
152 otherwise it returns a full 3D Cartesian array
153
154 slice_across : list of str or None
155 Direction(s) across which the data should be sliced
156 Elements can be 'r' and/or 'z'
157 Returned array is reduced by 1 dimension per slicing.
158
159 slice_relative_position : list of float or None
160 Number(s) between -1 and 1 that indicate where to slice the data,
161 along the directions in `slice_across`
162 -1 : lower edge of the simulation box
163 0 : middle of the simulation box
164 1 : upper edge of the simulation box
165
166 max_resolution_3d : list of int or None
167 Maximum resolution that the 3D reconstruction of the field (when
168 `theta` is None) can have. The list should contain two values,
169 e.g. `[200, 100]`, indicating the maximum longitudinal and transverse
170 resolution, respectively. This is useful for performance reasons,
171 particularly for 3D visualization.
172
173 Returns
174 -------
175 A tuple with
176 F : a 3darray or 2darray containing the required field,
177 depending on whether `theta` is None or not
178 info : a FieldMetaInformation object
179 (contains information about the grid; see the corresponding docstring)
180 """
181 1200 13210083.0 11008.4 0.0 it = series.iterations[iteration]
182
183 # Extract the dataset and and corresponding group
184 1200 9517554.0 7931.3 0.0 field = it.meshes[field_name]
185 1200 3207477.0 2672.9 0.0 if field.scalar:
186 component = next(field.items())[1]
187 else:
188 1200 4451544.0 3709.6 0.0 component = field[component_name]
189
190 # Extract the metainformation
191 # FIXME here and in h5py reader, we need to invert the order on 'F' for
192 # grid spacing/offset/position
193
194 1200 8484570.0 7070.5 0.0 coord_labels = {ii: coord for (ii, coord) in enumerate(field.axis_labels)}
195 1200 3730629.0 3108.9 0.0 coord_label_str = ''.join(coord_labels.values())
196 1200 734464.0 612.1 0.0 coord_label_str = 'm' + coord_label_str
197 1200 3710349.0 3092.0 0.0 coord_order = RZorder[coord_label_str]
198 1200 982238.0 818.5 0.0 if coord_order is RZorder.mrz:
199 Nm, Nr, Nz = component.shape
200 N_pair = (Nr, Nz)
201 1200 653775.0 544.8 0.0 elif coord_order is RZorder.mzr:
202 1200 2931781.0 2443.2 0.0 Nm, Nz, Nr = component.shape
203 1200 430916.0 359.1 0.0 N_pair = (Nz, Nr)
204 else:
205 raise Exception(order_error_msg)
206
207 # Nm, Nr, Nz = component.shape
208 2400 143492204.0 59788.4 0.1 info = FieldMetaInformation( coord_labels, N_pair,
209 1200 7396198.0 6163.5 0.0 field.grid_spacing, field.grid_global_offset,
210 1200 4925725.0 4104.8 0.0 field.grid_unit_SI, component.position, thetaMode=True )
211
212 # Convert to a 3D Cartesian array if theta is None
213 1200 740228.0 616.9 0.0 if theta is None:
214
215 # Get cylindrical info
216 rmax = info.rmax
217 inv_dr = 1./info.dr
218 Fcirc = get_data( series, component ) # (Extracts all modes)
219 if m == 'all':
220 modes = [ mode for mode in range(0, int(Nm / 2) + 1) ]
221 else:
222 modes = [ m ]
223 modes = np.array( modes, dtype='int' )
224 nmodes = len(modes)
225
226 # If necessary, reduce resolution of 3D reconstruction
227 if max_resolution_3d is not None:
228 max_res_lon, max_res_transv = max_resolution_3d
229 if Nz > max_res_lon:
230 # Calculate excess of elements along z
231 excess_z = int(np.round(Nz/max_res_lon))
232 # Preserve only one every excess_z elements
233 if coord_order is RZorder.mrz:
234 Fcirc = Fcirc[:, :, ::excess_z]
235 elif coord_order is RZorder.mzr:
236 Fcirc = Fcirc[:, ::excess_z, :]
237 else:
238 raise Exception(order_error_msg)
239 # Update info accordingly
240 info.z = info.z[::excess_z]
241 info.dz = info.z[1] - info.z[0]
242 if Nr > max_res_transv/2:
243 # Calculate excess of elements along r
244 excess_r = int(np.round(Nr/(max_res_transv/2)))
245 # Preserve only one every excess_r elements
246 if coord_order is RZorder.mrz:
247 Fcirc = Fcirc[:, ::excess_r, :]
248 elif coord_order is RZorder.mzr:
249 Fcirc = Fcirc[:, :, ::excess_r]
250 else:
251 raise Exception(order_error_msg)
252 # Update info and necessary parameters accordingly
253 info.r = info.r[::excess_r]
254 info.dr = info.r[1] - info.r[0]
255 inv_dr = 1./info.dr
256 # Update Nr after reducing radial resolution.
257 if coord_order is RZorder.mrz:
258 Nr = Fcirc.shape[1]
259 elif coord_order is RZorder.mzr:
260 Nr = Fcirc.shape[2]
261 else:
262 raise Exception(order_error_msg)
263
264 # Convert cylindrical data to Cartesian data
265 info._convert_cylindrical_to_3Dcartesian()
266 nx, ny, nz = len(info.x), len(info.y), len(info.z)
267 F_total = np.zeros( (nx, ny, nz) )
268 construct_3d_from_circ( F_total, Fcirc, info.x, info.y, modes,
269 nx, ny, nz, Nr, nmodes, inv_dr, rmax, coord_order)
270
271 else:
272
273 # Extract the modes and recombine them properly
274 1200 1020101.0 850.1 0.0 if coord_order is RZorder.mrz:
275 F_total = np.zeros( (2 * Nr, Nz ) )
276 1200 563083.0 469.2 0.0 elif coord_order is RZorder.mzr:
277 1200 127687513.0 106406.3 0.0 F_total = np.zeros( (Nz, 2 * Nr ) )
278 else:
279 raise Exception(order_error_msg)
280 1200 600205.0 500.2 0.0 if m == 'all':
281 # Sum of all the modes
282 # - Prepare the multiplier arrays
283 1200 480352.0 400.3 0.0 mult_above_axis = [1]
284 1200 313548.0 261.3 0.0 mult_below_axis = [1]
285 2400 2995535.0 1248.1 0.0 for mode in range(1, int(Nm / 2) + 1):
286 1200 6754595.0 5628.8 0.0 cos = np.cos( mode * theta )
287 1200 3444587.0 2870.5 0.0 sin = np.sin( mode * theta )
288 1200 801329.0 667.8 0.0 mult_above_axis += [cos, sin]
289 1200 2684201.0 2236.8 0.0 mult_below_axis += [ (-1) ** mode * cos, (-1) ** mode * sin ]
290 1200 2742807.0 2285.7 0.0 mult_above_axis = np.array( mult_above_axis )
291 1200 1751994.0 1460.0 0.0 mult_below_axis = np.array( mult_below_axis )
292 # - Sum the modes
293 1200 3e+11 2e+08 98.7 F = get_data( series, component ) # (Extracts all modes)
294 1200 3860942.0 3217.5 0.0 if coord_order is RZorder.mrz:
295 F_total[Nr:, :] = np.tensordot( mult_above_axis,
296 F, axes=(0, 0) )[:, :]
297 F_total[:Nr, :] = np.tensordot( mult_below_axis,
298 F, axes=(0, 0) )[::-1, :]
299 1200 609494.0 507.9 0.0 elif coord_order is RZorder.mzr:
300 3600 1540135643.0 427815.5 0.6 F_total[:, Nr:] = np.tensordot( mult_above_axis,
301 2400 1224560.0 510.2 0.0 F, axes=(0, 0) )[:, :]
302 3600 1408390398.0 391219.6 0.5 F_total[:, :Nr] = np.tensordot( mult_below_axis,
303 2400 1110211.0 462.6 0.0 F, axes=(0, 0) )[:, ::-1]
304 else:
305 raise Exception(order_error_msg)
306 elif m == 0:
307 # Extract mode 0
308 F = get_data( series, component, 0, 0 )
309 if coord_order is RZorder.mrz:
310 F_total[Nr:, :] = F[:, :]
311 F_total[:Nr, :] = F[::-1, :]
312 elif coord_order is RZorder.mzr:
313 F_total[:, Nr:] = F[:, :]
314 F_total[:, :Nr] = F[:, ::-1]
315 else:
316 raise Exception(order_error_msg)
317 else:
318 # Extract higher mode
319 cos = np.cos( m * theta )
320 sin = np.sin( m * theta )
321 F_cos = get_data( series, component, 2 * m - 1, 0 )
322 F_sin = get_data( series, component, 2 * m, 0 )
323 F = cos * F_cos + sin * F_sin
324 if coord_order is RZorder.mrz:
325 F_total[Nr:, :] = F[:, :]
326 F_total[:Nr, :] = (-1) ** m * F[::-1, :]
327 elif coord_order is RZorder.mzr:
328 F_total[:, Nr:] = F[:, :]
329 F_total[:, :Nr] = (-1) ** m * F[:, ::-1]
330 else:
331 raise Exception(order_error_msg)
332
333 # Perform slicing if needed
334 1200 580574.0 483.8 0.0 if slice_across is not None:
335 # Slice field and clear metadata
336 1200 6733740.0 5611.4 0.0 inverted_axes_dict = {info.axes[key]: key for key in info.axes.keys()}
337 2400 2808429.0 1170.2 0.0 for count, slice_across_item in enumerate(slice_across):
338 1200 471282.0 392.7 0.0 slicing_index = inverted_axes_dict[slice_across_item]
339 1200 1063420.0 886.2 0.0 coord_array = getattr( info, slice_across_item )
340 # Number of cells along the slicing direction
341 1200 863504.0 719.6 0.0 n_cells = len(coord_array)
342 # Index of the slice (prevent stepping out of the array)
343 1200 2489168.0 2074.3 0.0 i_cell = int( 0.5 * (slice_relative_position[count] + 1.) * n_cells )
344 1200 1481252.0 1234.4 0.0 i_cell = max( i_cell, 0 )
345 1200 1024311.0 853.6 0.0 i_cell = min( i_cell, n_cells - 1)
346 1200 36935450.0 30779.5 0.0 F_total = np.take( F_total, [i_cell], axis=slicing_index )
347 1200 6405975.0 5338.3 0.0 F_total = np.squeeze(F_total)
348 # Remove the sliced labels from the FieldMetaInformation
349 2400 938369.0 391.0 0.0 for slice_across_item in slice_across:
350 1200 19354570.0 16128.8 0.0 info._remove_axis(slice_across_item)
351
352 1200 328709.0 273.9 0.0 return F_total, info
Total time: 265.242 s
File: /global/homes/a/ahuebl/.conda/envs/perlmutter-postprocessing/lib/python3.10/site-packages/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/utilities.py
Function: get_data at line 24
Line # Hits Time Per Hit % Time Line Contents
==============================================================
24 def get_data(series, record_component, i_slice=None, pos_slice=None,
25 output_type=None):
26 """
27 Extract the data from a (possibly constant) dataset
28 Slice the data according to the parameters i_slice and pos_slice
29
30 Parameters:
31 -----------
32 series: openpmd_api.Series
33 An open, readable openPMD-api series object
34
35 record_component: an openPMD.Record_Component
36
37 pos_slice: int or list of int, optional
38 Slice direction(s).
39 When None, no slicing is performed
40
41 i_slice: int or list of int, optional
42 Indices of slices to be taken.
43
44 output_type: a numpy type
45 The type to which the returned array should be converted
46
47 Returns:
48 --------
49 An np.ndarray (non-constant dataset) or a single double (constant dataset)
50 """
51 # For back-compatibility: Convert pos_slice and i_slice to
52 # single-element lists if they are not lists (e.g. float
53 # and int respectively).
54 1200 569220.0 474.4 0.0 if pos_slice is not None and not isinstance(pos_slice, list):
55 pos_slice = [pos_slice]
56 1200 395204.0 329.3 0.0 if i_slice is not None and not isinstance(i_slice, list):
57 i_slice = [i_slice]
58
59 # ADIOS2: Actual chunks, all other: one chunk
60 1200 3389494714.0 3e+06 1.3 chunks = record_component.available_chunks()
61
62 # read whole data set
63 1200 811767.0 676.5 0.0 if pos_slice is None:
64 # mask invalid regions with NaN
65 # note: full_like triggers a full read, thus we avoid it #340
66 1200 1979930988.0 2e+06 0.7 data = np.full(record_component.shape, np.nan, record_component.dtype)
67 30000 16305542.0 543.5 0.0 for chunk in chunks:
68 28800 313135011.0 10872.7 0.1 chunk_slice = chunk_to_slice(chunk)
69
70 # skip empty slices
71 # https://github.com/ornladios/ADIOS2
72 28800 8054948.0 279.7 0.0 volume = 1
73 115200 28072097.0 243.7 0.0 for csl in chunk_slice:
74 86400 40121819.0 464.4 0.0 volume *= csl.stop - csl.start
75 28800 9041033.0 313.9 0.0 if volume == 0:
76 continue
77
78 # read only valid region
79 28800 276187183.0 9589.8 0.1 x = record_component[chunk_slice]
80 28800 3e+11 9e+06 97.5 series.flush()
81 28800 631541838.0 21928.5 0.2 data[chunk_slice] = x
82 # slice: read only part of the data set
83 else:
84 full_shape = record_component.shape
85
86 slice_shape = list(full_shape) # copy
87 pos_slice_sorted = pos_slice.copy() # copy for in-place sort
88 pos_slice_sorted.sort(reverse=True)
89 for dir_index in pos_slice_sorted: # remove indices in list
90 del slice_shape[dir_index]
91
92 # mask invalid regions with NaN
93 data = np.full(slice_shape, np.nan, dtype=record_component.dtype)
94
95 # build requested ND slice with respect to full data
96 s = []
97 for d in range(len(full_shape)):
98 if d in pos_slice:
99 s.append(i_slice[pos_slice.index(d)]) # one index in such directions
100 else: # all indices in other direction
101 s.append(slice(None, None, None))
102 s = tuple(s)
103
104 # now we check which chunks contribute to the slice
105 for chunk in chunks:
106 skip_this_chunk = False
107 s_valid = list(s) # same as s but reduced to valid regions in chunk
108 s_target = [] # starts and stops in sliced array
109 chunk_slice = chunk_to_slice(chunk)
110
111 # skip empty slices
112 # https://github.com/ornladios/ADIOS2
113 volume = 1
114 for csl in chunk_slice:
115 volume *= csl.stop - csl.start
116 if volume == 0:
117 continue
118
119 # read only valid region
120 for d, slice_d in enumerate(s):
121 start = chunk_slice[d].start
122 stop = chunk_slice[d].stop
123 if isinstance(slice_d, int):
124 # Nothing to do for s_target (dimension sliced out)
125 # Nothing to do for s_valid (dimension index is set)
126 if slice_d < start or slice_d >= stop:
127 # chunk not in slice line/plane
128 skip_this_chunk = True
129 else:
130 if slice_d.start is None or slice_d.start < start:
131 s_valid[d] = slice(start, s_valid[d].stop)
132 if slice_d.stop is None or slice_d.stop > stop:
133 s_valid[d] = slice(s_valid[d].start, stop)
134 s_target.append(slice(start, stop))
135
136 s_valid = tuple(s_valid)
137 s_target = tuple(s_target)
138
139 # read
140 if not skip_this_chunk:
141 x = record_component[s_valid]
142 series.flush()
143 data[s_target] = x
144
145 # Convert to the right type
146 1200 641124.0 534.3 0.0 if (output_type is not None) and (data.dtype != output_type):
147 data = data.astype( output_type )
148 # Scale by the conversion factor
149 1200 5803104.0 4835.9 0.0 if record_component.unit_SI != 1.0:
150 if np.issubdtype(data.dtype, np.floating) or \
151 np.issubdtype(data.dtype, np.complexfloating):
152 data *= record_component.unit_SI
153 else:
154 data = data * record_component.unit_SI
155
156 1200 222451.0 185.4 0.0 return data
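A quick sanity check of the BP4 profile above: `series.flush()` is hit 28800 times (1200 `get_data` calls, 24 field chunks each) and accounts for ~97.5% of the 268.8 s total, i.e. roughly 9 ms per flush. Back-of-envelope:

```python
# Approximate numbers taken from the line profile above.
calls = 1200             # get_data hits
chunks_per_field = 24    # field blocks per output, per bpls -D
flushes = calls * chunks_per_field
total_flush_s = 0.975 * 268.8   # series.flush() share of total runtime
per_flush_ms = total_flush_s / flushes * 1e3
print(flushes, round(per_flush_ms, 1))  # 28800 flushes, ~9 ms each
```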
HDF5
```python
ts_h5 = LpaDiagnostics('lab_diags/hdf5/')

%lprun -f io_reader.read_field_circ -f io_reader.field_reader.get_data \
    a0_h5 = ts_h5.iterate(ts_h5.get_a0, pol='y')  # ~7 per second
```
100%|██████████| 300/300 [00:41<00:00, 7.29it/s]
Timer unit: 1e-09 s
Total time: 40.4864 s
File: /global/homes/a/ahuebl/.conda/envs/perlmutter-postprocessing/lib/python3.10/site-packages/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/field_reader.py
Function: read_field_circ at line 124
Line # Hits Time Per Hit % Time Line Contents
==============================================================
124 def read_field_circ( series, iteration, field_name, component_name,
125 slice_relative_position, slice_across, m=0, theta=0.,
126 max_resolution_3d=None ):
127 """
128 Extract a given field from a file in the openPMD format,
129 when the geometry is thetaMode
130
131 Parameters
132 ----------
133 series: openpmd_api.Series
134 An open, readable openPMD-api series object
135
136 iteration: integer
137 Iteration from which parameters should be extracted
138
139 field_name : string, optional
140 Which field to extract
141
142 component_name : string, optional
143 Which component of the field to extract
144
145 m : int or string, optional
146 The azimuthal mode to be extracted
147
148 theta : float or None
149 Angle of the plane of observation with respect to the x axis
150 If `theta` is not None, then this function returns a 2D array
151 corresponding to the plane of observation given by `theta` ;
152 otherwise it returns a full 3D Cartesian array
153
154 slice_across : list of str or None
155 Direction(s) across which the data should be sliced
156 Elements can be 'r' and/or 'z'
157 Returned array is reduced by 1 dimension per slicing.
158
159 slice_relative_position : list of float or None
160 Number(s) between -1 and 1 that indicate where to slice the data,
161 along the directions in `slice_across`
162 -1 : lower edge of the simulation box
163 0 : middle of the simulation box
164 1 : upper edge of the simulation box
165
166 max_resolution_3d : list of int or None
167 Maximum resolution that the 3D reconstruction of the field (when
168 `theta` is None) can have. The list should contain two values,
169 e.g. `[200, 100]`, indicating the maximum longitudinal and transverse
170 resolution, respectively. This is useful for performance reasons,
171 particularly for 3D visualization.
172
173 Returns
174 -------
175 A tuple with
176 F : a 3darray or 2darray containing the required field,
177 depending on whether `theta` is None or not
178 info : a FieldMetaInformation object
179 (contains information about the grid; see the corresponding docstring)
180 """
181 1204 11854587.0 9846.0 0.0 it = series.iterations[iteration]
182
183 # Extract the dataset and and corresponding group
184 1204 7729223.0 6419.6 0.0 field = it.meshes[field_name]
185 1204 1966414.0 1633.2 0.0 if field.scalar:
186 component = next(field.items())[1]
187 else:
188 1204 4037189.0 3353.1 0.0 component = field[component_name]
189
190 # Extract the metainformation
191 # FIXME here and in h5py reader, we need to invert the order on 'F' for
192 # grid spacing/offset/position
193
194 1204 7395178.0 6142.2 0.0 coord_labels = {ii: coord for (ii, coord) in enumerate(field.axis_labels)}
195 1204 3127896.0 2597.9 0.0 coord_label_str = ''.join(coord_labels.values())
196 1204 658084.0 546.6 0.0 coord_label_str = 'm' + coord_label_str
197 1204 2862929.0 2377.8 0.0 coord_order = RZorder[coord_label_str]
198 1204 796927.0 661.9 0.0 if coord_order is RZorder.mrz:
199 1204 2759330.0 2291.8 0.0 Nm, Nr, Nz = component.shape
200 1204 503943.0 418.6 0.0 N_pair = (Nr, Nz)
201 elif coord_order is RZorder.mzr:
202 Nm, Nz, Nr = component.shape
203 N_pair = (Nz, Nr)
204 else:
205 raise Exception(order_error_msg)
206
207 # Nm, Nr, Nz = component.shape
208 2408 133323900.0 55367.1 0.3 info = FieldMetaInformation( coord_labels, N_pair,
209 1204 6074170.0 5045.0 0.0 field.grid_spacing, field.grid_global_offset,
210 1204 4416856.0 3668.5 0.0 field.grid_unit_SI, component.position, thetaMode=True )
211
212 # Convert to a 3D Cartesian array if theta is None
213 1204 716774.0 595.3 0.0 if theta is None:
214
215 # Get cylindrical info
216 rmax = info.rmax
217 inv_dr = 1./info.dr
218 Fcirc = get_data( series, component ) # (Extracts all modes)
219 if m == 'all':
220 modes = [ mode for mode in range(0, int(Nm / 2) + 1) ]
221 else:
222 modes = [ m ]
223 modes = np.array( modes, dtype='int' )
224 nmodes = len(modes)
225
226 # If necessary, reduce resolution of 3D reconstruction
227 if max_resolution_3d is not None:
228 max_res_lon, max_res_transv = max_resolution_3d
229 if Nz > max_res_lon:
230 # Calculate excess of elements along z
231 excess_z = int(np.round(Nz/max_res_lon))
232 # Preserve only one every excess_z elements
233 if coord_order is RZorder.mrz:
234 Fcirc = Fcirc[:, :, ::excess_z]
235 elif coord_order is RZorder.mzr:
236 Fcirc = Fcirc[:, ::excess_z, :]
237 else:
238 raise Exception(order_error_msg)
239 # Update info accordingly
240 info.z = info.z[::excess_z]
241 info.dz = info.z[1] - info.z[0]
242 if Nr > max_res_transv/2:
243 # Calculate excess of elements along r
244 excess_r = int(np.round(Nr/(max_res_transv/2)))
245 # Preserve only one every excess_r elements
246 if coord_order is RZorder.mrz:
247 Fcirc = Fcirc[:, ::excess_r, :]
248 elif coord_order is RZorder.mzr:
249 Fcirc = Fcirc[:, :, ::excess_r]
250 else:
251 raise Exception(order_error_msg)
252 # Update info and necessary parameters accordingly
253 info.r = info.r[::excess_r]
254 info.dr = info.r[1] - info.r[0]
255 inv_dr = 1./info.dr
256 # Update Nr after reducing radial resolution.
257 if coord_order is RZorder.mrz:
258 Nr = Fcirc.shape[1]
259 elif coord_order is RZorder.mzr:
260 Nr = Fcirc.shape[2]
261 else:
262 raise Exception(order_error_msg)
263
264 # Convert cylindrical data to Cartesian data
265 info._convert_cylindrical_to_3Dcartesian()
266 nx, ny, nz = len(info.x), len(info.y), len(info.z)
267 F_total = np.zeros( (nx, ny, nz) )
268 construct_3d_from_circ( F_total, Fcirc, info.x, info.y, modes,
269 nx, ny, nz, Nr, nmodes, inv_dr, rmax, coord_order)
270
271 else:
272
273 # Extract the modes and recombine them properly
274 1204 1033542.0 858.4 0.0 if coord_order is RZorder.mrz:
275 1204 116253591.0 96556.1 0.3 F_total = np.zeros( (2 * Nr, Nz ) )
276 elif coord_order is RZorder.mzr:
277 F_total = np.zeros( (Nz, 2 * Nr ) )
278 else:
279 raise Exception(order_error_msg)
280 1204 625240.0 519.3 0.0 if m == 'all':
281 # Sum of all the modes
282 # - Prepare the multiplier arrays
283 1204 501373.0 416.4 0.0 mult_above_axis = [1]
284 1204 337724.0 280.5 0.0 mult_below_axis = [1]
285 2408 2770012.0 1150.3 0.0 for mode in range(1, int(Nm / 2) + 1):
286 1204 6213459.0 5160.7 0.0 cos = np.cos( mode * theta )
287 1204 2939528.0 2441.5 0.0 sin = np.sin( mode * theta )
288 1204 797075.0 662.0 0.0 mult_above_axis += [cos, sin]
289 1204 2419221.0 2009.3 0.0 mult_below_axis += [ (-1) ** mode * cos, (-1) ** mode * sin ]
290 1204 2621466.0 2177.3 0.0 mult_above_axis = np.array( mult_above_axis )
291 1204 1843528.0 1531.2 0.0 mult_below_axis = np.array( mult_below_axis )
292 # - Sum the modes
293 1204 4e+10 3e+07 94.0 F = get_data( series, component ) # (Extracts all modes)
294 1204 2309270.0 1918.0 0.0 if coord_order is RZorder.mrz:
295 3612 1238568246.0 342903.7 3.1 F_total[Nr:, :] = np.tensordot( mult_above_axis,
296 2408 1154492.0 479.4 0.0 F, axes=(0, 0) )[:, :]
297 3612 823631746.0 228026.5 2.0 F_total[:Nr, :] = np.tensordot( mult_below_axis,
298 2408 1160084.0 481.8 0.0 F, axes=(0, 0) )[::-1, :]
299 elif coord_order is RZorder.mzr:
300 F_total[:, Nr:] = np.tensordot( mult_above_axis,
301 F, axes=(0, 0) )[:, :]
302 F_total[:, :Nr] = np.tensordot( mult_below_axis,
303 F, axes=(0, 0) )[:, ::-1]
304 else:
305 raise Exception(order_error_msg)
306 elif m == 0:
307 # Extract mode 0
308 F = get_data( series, component, 0, 0 )
309 if coord_order is RZorder.mrz:
310 F_total[Nr:, :] = F[:, :]
311 F_total[:Nr, :] = F[::-1, :]
312 elif coord_order is RZorder.mzr:
313 F_total[:, Nr:] = F[:, :]
314 F_total[:, :Nr] = F[:, ::-1]
315 else:
316 raise Exception(order_error_msg)
317 else:
318 # Extract higher mode
319 cos = np.cos( m * theta )
320 sin = np.sin( m * theta )
321 F_cos = get_data( series, component, 2 * m - 1, 0 )
322 F_sin = get_data( series, component, 2 * m, 0 )
323 F = cos * F_cos + sin * F_sin
324 if coord_order is RZorder.mrz:
325 F_total[Nr:, :] = F[:, :]
326 F_total[:Nr, :] = (-1) ** m * F[::-1, :]
327 elif coord_order is RZorder.mzr:
328 F_total[:, Nr:] = F[:, :]
329 F_total[:, :Nr] = (-1) ** m * F[:, ::-1]
330 else:
331 raise Exception(order_error_msg)
332
333 # Perform slicing if needed
334 1204 525321.0 436.3 0.0 if slice_across is not None:
335 # Slice field and clear metadata
336 1204 5131929.0 4262.4 0.0 inverted_axes_dict = {info.axes[key]: key for key in info.axes.keys()}
337 2408 1923462.0 798.8 0.0 for count, slice_across_item in enumerate(slice_across):
338 1204 495464.0 411.5 0.0 slicing_index = inverted_axes_dict[slice_across_item]
339 1204 782667.0 650.1 0.0 coord_array = getattr( info, slice_across_item )
340 # Number of cells along the slicing direction
341 1204 630311.0 523.5 0.0 n_cells = len(coord_array)
342 # Index of the slice (prevent stepping out of the array)
343 1204 1827510.0 1517.9 0.0 i_cell = int( 0.5 * (slice_relative_position[count] + 1.) * n_cells )
344 1204 1229624.0 1021.3 0.0 i_cell = max( i_cell, 0 )
345 1204 960371.0 797.7 0.0 i_cell = min( i_cell, n_cells - 1)
346 1204 14635558.0 12155.8 0.0 F_total = np.take( F_total, [i_cell], axis=slicing_index )
347 1204 4445069.0 3691.9 0.0 F_total = np.squeeze(F_total)
348 # Remove the sliced labels from the FieldMetaInformation
349 2408 904917.0 375.8 0.0 for slice_across_item in slice_across:
350 1204 15903423.0 13208.8 0.0 info._remove_axis(slice_across_item)
351
352 1204 349136.0 290.0 0.0 return F_total, info
Total time: 38.0203 s
File: /global/homes/a/ahuebl/.conda/envs/perlmutter-postprocessing/lib/python3.10/site-packages/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/utilities.py
Function: get_data at line 24
Line # Hits Time Per Hit % Time Line Contents
==============================================================
24 def get_data(series, record_component, i_slice=None, pos_slice=None,
25 output_type=None):
26 """
27 Extract the data from a (possibly constant) dataset
28 Slice the data according to the parameters i_slice and pos_slice
29
30 Parameters:
31 -----------
32 series: openpmd_api.Series
33 An open, readable openPMD-api series object
34
35 record_component: an openPMD.Record_Component
36
37 pos_slice: int or list of int, optional
38 Slice direction(s).
39 When None, no slicing is performed
40
41 i_slice: int or list of int, optional
42 Indices of slices to be taken.
43
44 output_type: a numpy type
45 The type to which the returned array should be converted
46
47 Returns:
48 --------
49 An np.ndarray (non-constant dataset) or a single double (constant dataset)
50 """
51 # For back-compatibility: Convert pos_slice and i_slice to
52 # single-element lists if they are not lists (e.g. float
53 # and int respectively).
54 1204 569632.0 473.1 0.0 if pos_slice is not None and not isinstance(pos_slice, list):
55 pos_slice = [pos_slice]
56 1204 421564.0 350.1 0.0 if i_slice is not None and not isinstance(i_slice, list):
57 i_slice = [i_slice]
58
59 # ADIOS2: Actual chunks, all other: one chunk
60 1204 2e+10 2e+07 58.7 chunks = record_component.available_chunks()
61
62 # read whole data set
63 1204 929623.0 772.1 0.0 if pos_slice is None:
64 # mask invalid regions with NaN
65 # note: full_like triggers a full read, thus we avoid it #340
66 1204 1832455319.0 2e+06 4.8 data = np.full(record_component.shape, np.nan, record_component.dtype)
67 2408 2300463.0 955.3 0.0 for chunk in chunks:
68 1204 17648245.0 14658.0 0.0 chunk_slice = chunk_to_slice(chunk)
69
70 # skip empty slices
71 # https://github.com/ornladios/ADIOS2
72 1204 397967.0 330.5 0.0 volume = 1
73 4816 1636138.0 339.7 0.0 for csl in chunk_slice:
74 3612 2673375.0 740.1 0.0 volume *= csl.stop - csl.start
75 1204 561651.0 466.5 0.0 if volume == 0:
76 continue
77
78 # read only valid region
79 1204 44672184.0 37103.1 0.1 x = record_component[chunk_slice]
80 1204 1e+10 1e+07 35.3 series.flush()
81 1204 377283633.0 313358.5 1.0 data[chunk_slice] = x
82 # slice: read only part of the data set
83 else:
84 full_shape = record_component.shape
85
86 slice_shape = list(full_shape) # copy
87 pos_slice_sorted = pos_slice.copy() # copy for in-place sort
88 pos_slice_sorted.sort(reverse=True)
89 for dir_index in pos_slice_sorted: # remove indices in list
90 del slice_shape[dir_index]
# mask invalid regions with NaN
data = np.full(slice_shape, np.nan, dtype=record_component.dtype)

# build requested ND slice with respect to full data
s = []
for d in range(len(full_shape)):
    if d in pos_slice:
        s.append(i_slice[pos_slice.index(d)])  # one index in such directions
    else:  # all indices in other directions
        s.append(slice(None, None, None))
s = tuple(s)

# now we check which chunks contribute to the slice
for chunk in chunks:
    skip_this_chunk = False
    s_valid = list(s)  # same as s but reduced to valid regions in chunk
    s_target = []      # starts and stops in sliced array
    chunk_slice = chunk_to_slice(chunk)

    # skip empty chunks
    # https://github.com/ornladios/ADIOS2
    volume = 1
    for csl in chunk_slice:
        volume *= csl.stop - csl.start
    if volume == 0:
        continue

    # read only valid region
    for d, slice_d in enumerate(s):
        start = chunk_slice[d].start
        stop = chunk_slice[d].stop
        if isinstance(slice_d, int):
            # Nothing to do for s_target (dimension sliced out)
            # Nothing to do for s_valid (dimension index is set)
            if slice_d < start or slice_d >= stop:
                # chunk not in slice line/plane
                skip_this_chunk = True
        else:
            if slice_d.start is None or slice_d.start < start:
                s_valid[d] = slice(start, s_valid[d].stop)
            if slice_d.stop is None or slice_d.stop > stop:
                s_valid[d] = slice(s_valid[d].start, stop)
            s_target.append(slice(start, stop))

    s_valid = tuple(s_valid)
    s_target = tuple(s_target)

    # read
    if not skip_this_chunk:
        x = record_component[s_valid]
        series.flush()
        data[s_target] = x

# Convert to the right type
# line_profiler: 1204 hits, 639774.0 total, 531.4 per hit, 0.0 % time
if (output_type is not None) and (data.dtype != output_type):
    data = data.astype(output_type)
# Scale by the conversion factor
# line_profiler: 1204 hits, 5938598.0 total, 4932.4 per hit, 0.0 % time
if record_component.unit_SI != 1.0:
    if np.issubdtype(data.dtype, np.floating) or \
       np.issubdtype(data.dtype, np.complexfloating):
        data *= record_component.unit_SI
    else:
        data = data * record_component.unit_SI

# line_profiler: 1204 hits, 331688.0 total, 275.5 per hit, 0.0 % time
return data
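The NaN-masking and chunk-stitching pattern above can be illustrated standalone with plain NumPy. This is a minimal sketch: the 1D shape, the `(start, stop)` chunk list, and the fake "read" are illustrative stand-ins, not the viewer's actual data or the openPMD-api calls.

```python
import numpy as np

# Hypothetical decomposition of a 1D dataset of length 10 into two
# on-disk blocks, leaving [4, 6) unwritten (as can happen with
# load-balanced output that produces NULL blocks).
full_shape = (10,)
chunks = [(0, 4), (6, 10)]  # (start, stop) per chunk, stand-in for ADIOS2 blocks

# mask invalid regions with NaN, as in the reader above
data = np.full(full_shape, np.nan)

for start, stop in chunks:
    # in the real reader this is record_component[s_valid] + series.flush()
    x = np.arange(start, stop, dtype=float)  # pretend this was read from disk
    data[start:stop] = x

print(data)  # the unwritten gap [4, 6) stays NaN
```

Downstream consumers can then detect the unwritten region via `np.isnan(data)`, which is exactly why the buffer is pre-filled with NaN rather than zeros.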
BP4 Directly w/ openPMD-api
import openpmd_api as io

s = io.Series("diag1/openpmd_%T.bp", io.Access.read_only)
# io.list_series(s, True)
%%time
for k_i, it in s.iterations.items():
    print("Iteration: {0}".format(k_i))
    E = it.meshes["E"]
    E_r = E["r"]
    E_t = E["t"]
    E_z = E["z"]
    # emulate multiple reads (do not reuse the outer loop variable here)
    for _ in range(4):
        E_r_data = E_r[()]
        s.flush()
        E_t_data = E_t[()]
        s.flush()
        E_z_data = E_z[()]
        s.flush()
Iteration: 0
Iteration: 1
...
Iteration: 299
CPU times: user 13.3 s, sys: 3.37 s, total: 16.6 s
Wall time: 1min 12s
HDF5 Directly w/ openPMD-api
s_h5 = io.Series("lab_diags/hdf5/data%T.h5", io.Access.read_only)
%%time
for k_i, it in s_h5.iterations.items():
    print("Iteration: {0}".format(k_i))
    E = it.meshes["E"]
    E_r = E["r"]
    E_t = E["t"]
    E_z = E["z"]
    # emulate multiple reads (do not reuse the outer loop variable here)
    for _ in range(4):
        E_r_data = E_r[()]
        s_h5.flush()
        E_t_data = E_t[()]
        s_h5.flush()
        E_z_data = E_z[()]
        s_h5.flush()
Iteration: 0
Iteration: 1
...
Iteration: 300
CPU times: user 2.52 s, sys: 1.54 s, total: 4.06 s
Wall time: 21.9 s
Next: reorganize both HDF5 and ADIOS BP4 with openPMD-pipe
to be contiguous (group-based) and read again.
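A command sketch of that reorganization (hedged: the input pattern matches the data set above, but the output path is illustrative; `openpmd-pipe` ships with openPMD-api and, by default, writes a single group-based file whenever the output name contains no `%T` placeholder):

```shell
# file-based BP4 series (one file per iteration) ->
# one group-based, contiguously organized BP4 file
openpmd-pipe \
    --infile  "diag1/openpmd_%T.bp" \
    --outfile "diag1_reorganized/openpmd.bp"
```

The reverse direction (making an HDF5 file-based series group-based, or a BP series variable-based) works the same way, only with different output file names and `--outconfig` settings.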
Thanks Axel. Is the file available somewhere?
@guj I can share the files on NERSC with you :+1:
Please consider also what I wrote here.
My current suspicion is that the data was written from an application that uses load balancing. The increasing load times in ADIOS2 would then just "represent" the increased complexity in the data (this would be confirmed by the output of bpls -D, as described in that comment).
Thanks @franzpoeschel ! I posted the output of the decomposition a bit further up in a comment: https://github.com/openPMD/openPMD-viewer/issues/400#issuecomment-1777821654
Ah, I didn't see that. Still need to go over your results in detail. Does the number of blocks get larger from the first to the last iteration?
Line 60 of the profile (1204 hits, 2e+10 total time, 2e+07 per hit, 58.7 % of the runtime):

chunks = record_component.available_chunks()

Is this really the HDF5 benchmark where you are measuring this? HDF5 uses a simple fallback implementation for this, while the same function in ADIOS2 can have a serious performance impact.
Yes, above I measure both the HDF5 files and the ADIOS2 files, sorted under the respective captions. Indeed, a very large amount of time is spent in here for HDF5... longer even than flush o.0
Does the number of blocks get larger from the first to the last iteration?
Good point, added a full overview here now.
Looking at the data that we benchmark here (RZ fields), the relevant variable is /data/*/fields/E/r. The number of blocks in this variable is 24 and stays 24 for all steps that I see in the series.
I did some first test runs on Perlmutter (using for now the openpmd-pipe shipped with openPMD-api, not yet your script). Two things that I noticed:

- The performance, at least on /global/cfs, is very unpredictable. Often, calling the same script a second time leads to much better performance. The first invocation often got noticeably slower mid-run, once to the point of freezing. Could this whole issue be a caching effect?
- The openPMD-api does not yet optimize well for data Series with a large number of iterations. I noticed a small performance bug: around half the runtime is spent checking whether the openPMD hierarchy has been modified, even for Iterations that have already been closed.
I'll prepare a PR that fixes this performance bug, but this will only be useful for the openPMD-viewer if it actually uses Iteration::close(). However, this bug is unlikely to be the reason for your performance issues, as it would affect HDF5 and ADIOS2 indiscriminately.
How to do a benchmark of compiled (non-Python) software on Perlmutter
I used google-perftools for finding this performance bug. It might be helpful to see your results.
Installation
Fortunately, installation is relatively simple, since Perlmutter already provides all dependencies without any additional modules. Gperftools consists of two components.
- Download and unpack a release from https://github.com/gperftools/gperftools/releases/. Use CMake to build and optionally install, e.g. `mkdir build; cd build; cmake .. -DCMAKE_INSTALL_PREFIX="$HOME/.local"; make -j; make install`.
- For analyzing the raw profiles, you will also need `pprof`, which is a Go program. Installation is a single command-line call: `go install github.com/google/pprof@latest`. `pprof` is then installed under `~/go/bin/`; see `~/go/bin/pprof --help`.
Profiling an application
The Python script whose C++ portion I wanted to profile was:
openpmd-pipe --infile diag1/openpmd_%T.bp --outfile out.bp --outconfig 'adios2.engine.type = "nullcore"'
In order to profile this:
LD_PRELOAD=~/.local/lib64/libprofiler.so CPUPROFILE=raw.prof python `which openpmd-pipe` --infile diag1/openpmd_%T.bp --outfile out.bp --outconfig 'adios2.engine.type = "nullcore"'
This runs the application with a slight slowdown. Afterwards, a file named `raw.prof` is on disk, which now needs to be analyzed with `pprof`. The most helpful visualization is via Graphviz, which I did not bother to install on Perlmutter. Instead, I used `pprof` to output a `.dot` file, which I then viewed on my own computer with Xdot:
~/go/bin/pprof -tree `which python` raw.prof > prof.dot
Find below the example graph which shows some Python internals, but more importantly the openPMD-api and ADIOS2-internal calls.
Performance analysis by @guj:
- https://docs.google.com/presentation/d/1DnuzBg3KoFlDt_tpuQBoUA9e_BAO4SWml0H3Riu-rRQ/edit#slide=id.g2c1c2ec9ebc_0_19
- https://github.com/guj/misc
Action items:
- [x] `openPMD-api`: fix internal regressions https://github.com/openPMD/openPMD-api/pull/1598
- [ ] `openPMD-api`: enable to `.close()` and reopen an iteration (in both random access and `.readIteration`/linear read) https://github.com/openPMD/openPMD-api/issues/1606
- [ ] `openPMD-viewer`: close out steps (e.g., in `ts.iterate()` or after we opened N further steps in random access mode)