Iteration Formatting: Support Zero-Padding in %T
Update: The documentation should simply state more clearly that %T may also stand for a zero-padded integer. Also, %T must appear in iterationFormat because this is what makes it possible to find other iterations in other files/groups. (Inside a specific file, the %T in iterationFormat must not be replaced with an actual number!)
Many people do zero-padding for iterations, especially for iterationEncoding fileBased.
~~When planning iterationFormat for fileBased we assumed files are named something_150.h5 and not something_000150.h5.~~
~~This is currently ok to find the current iteration, e.g., via data_000%T.h5 for a file data_00010.h5-data_00099.h5 but does not allow to find more iterations in other files above 99 or below 10 generically.~~
~~Currently in iterationFormat the allowed element %T does not cut the padding and a syntax such as printf's %05d will be necessary (e.g., %05T).~~
Update: maybe it is also fine as is if %T stands for 000100, because string-to-int conversions (e.g., atoi) will likely parse that as 100, provided implementors are aware that the value might be padded with zeros. In that case we should just be more explicit about what is allowed here and how to describe it.
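As a quick illustration of the point about string-to-int conversion, here is a minimal Python sketch. The helper name `parse_iteration` is hypothetical and not part of any openPMD tool; it only shows that a zero-padded decimal string converts to the same integer as the unpadded one.

```python
def parse_iteration(text):
    """Convert the text matched by %T to an integer (hypothetical helper).

    Python's int() accepts leading zeros in a decimal string, so padded
    and unpadded spellings of the same iteration compare equal.
    """
    return int(text)


padded = parse_iteration("000100")
unpadded = parse_iteration("100")
print(padded == unpadded)
```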
Here is the way it is currently done in openPMD-viewer:
if filename[-3:] == '.h5' or filename[-5:] == '.hdf5':
    # Extract the iteration, using regular expressions (regex)
    regex_match = re.search(r'(\d+).h[df]*5', filename)
    if regex_match is None:
        print('Ill-formated HDF5 file: %s\n File names should end '
              'with the iteration number, followed by ".h5"' % filename)
    else:
        iteration = int(regex_match.groups()[-1])
Is that fine with you @ax3l ?
Yes, absolutely. I think we could just be clearer about it in the documentation. I totally forgot that matching + str2int handles zero-padding without any problem in most languages.
Regarding your code, a tiny change to make it more general:
If iterationEncoding is fileBased, your regex should be built from iterationFormat, yielding the one you use right now.
read iterationFormat string -> replace %T in it with (\d+) -> compile regex -> match regex
fileBased and no %T found in iterationFormat -> ill-formatted file (one can still derive the current iteration, but one is not able to find other iterations by file name; finding other iterations is an awesome feature)
fileBased and %T found in iterationFormat but no regex match in file name -> ill-formatted file
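The steps above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: the function names are hypothetical, iterationFormat is assumed to contain exactly one %T, and the file name is assumed to be given without its directory part.

```python
import re


def iteration_regex(iteration_format):
    """Build a regex from an iterationFormat such as 'data_%T.h5'."""
    if "%T" not in iteration_format:
        # fileBased without %T: other iterations cannot be located
        raise ValueError("ill-formatted file: no %T in iterationFormat")
    # Assumption: exactly one %T; everything around it is literal text
    prefix, suffix = iteration_format.split("%T", 1)
    return re.compile(re.escape(prefix) + r"(\d+)" + re.escape(suffix))


def iteration_from_filename(filename, iteration_format):
    """Return the iteration encoded in filename, zero-padded or not."""
    match = iteration_regex(iteration_format).fullmatch(filename)
    if match is None:
        raise ValueError("ill-formatted file: name does not match "
                         "iterationFormat")
    return int(match.group(1))


print(iteration_from_filename("data_000150.h5", "data_%T.h5"))
```

Escaping the literal parts with `re.escape` keeps characters such as the dot in `.h5` from acting as regex wildcards, and `fullmatch` rejects names with extra leading or trailing text.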
If iterationEncoding is groupBased, the iteration needs to be derived from the items in /data/. In this case, iterationFormat reflects this by being equal to basePath.
(Side note: since the basePath is fixed to /data/%T/ in version 1.X one can always derive the iteration from here, too. The only difference for now is that fileBased does not allow more than one time step per file and additionally requires an iteration encoding in the file name.)
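For the groupBased case, deriving iterations from the items in /data/ might look like the sketch below. It works on a plain list of group names so that it stays self-contained; the function name is hypothetical, and how the names are obtained (for example from an HDF5 library) is left out as an assumption.

```python
def iterations_from_group_names(names):
    """Return the sorted iteration numbers found among child names of /data/.

    Names made up only of decimal digits are treated as iterations, whether
    zero-padded or not; anything else is ignored.
    """
    return sorted(int(name) for name in names if name.isdecimal())


# Hypothetical listing of /data/ in a groupBased file
print(iterations_from_group_names(["000100", "50", "notes"]))
```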
I can fix this together with #166 by mentioning in its definition that %T can be zero-padded. Maybe also by adding a note to implementors where it is used.