Function SD.datasets returns only one SDS of multiple SDS with the same name in different VGroups
Problem description
The member function SD.datasets of the class SD in `SD.py' returns only one of multiple SDS that share the same name in different VGroups.
Example: I have a file with this layout of VGroups and SDS:
File: /
Group: /MDR
Group: /MDR/Earthshine
Group: /MDR/Earthshine/GEO_EARTH
Group: /MDR/Earthshine/GEO_EARTH/CENTRE
SDS: /MDR/Earthshine/GEO_EARTH/CENTRE/latitude
SDS: /MDR/Earthshine/GEO_EARTH/CENTRE/longitude
Group: /MDR/Earthshine/GEO_EARTH/CORNER
SDS: /MDR/Earthshine/GEO_EARTH/CORNER/latitude
SDS: /MDR/Earthshine/GEO_EARTH/CORNER/longitude
Inspecting this file with pyhdf.SD.datasets leads to:
>>> from pyhdf.SD import *
>>> from pprint import pprint
>>>
>>> filename = 'bug.hdf4'
>>> sd = SD(filename)
>>> dsets = sd.datasets()
>>> pprint(dsets)
{'latitude': (('DIM010-00', 'DIM010:000:012:002-01', 'DIM010:000:012:002-02'),
              (1020, 4, 32),
              6,
              2),
 'longitude': (('DIM010-00', 'DIM010:000:012:002-01', 'DIM010:000:012:002-02'),
               (1020, 4, 32),
               6,
               3)}
>>>
Only one latitude and one longitude SDS is found.
The problem is that SD.datasets() builds a dictionary with the SDS names as keys. If two SDS have the same name, the second one overwrites the first.
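The overwrite can be reproduced without pyhdf. The following is only a minimal sketch in plain Python: the `records` list is a hypothetical stand-in for the name and index number that each SDS in the example file reports, and `by_name` mimics a dictionary keyed by name alone.

```python
# Hypothetical stand-in for the (name, index number) of the four SDS
# in the example file; this is not read from a real HDF4 file.
records = [("latitude", 0), ("longitude", 1), ("latitude", 2), ("longitude", 3)]

by_name = {}
for name, index in records:
    # Keyed by name only: a later SDS with the same name replaces the earlier one.
    by_name[name] = index

print(by_name)  # {'latitude': 2, 'longitude': 3}
```

Only index numbers 2 and 3 survive, which matches the index numbers reported in the output above.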
Proposed solution:
Use a tuple of the dataset name and the dataset index number as the key: (dataset_name, dataset_index_number). This key is always unique because the index number is unique, so the returned dictionary includes all datasets. For the example above, this approach gives:
>>> pprint(dsets)
{('latitude', 0): (('DIM010-00', 'DIM010:000:012:003-01'), (1020, 32), 6, 0),
 ('latitude', 2): (('DIM010-00',
                    'DIM010:000:012:002-01',
                    'DIM010:000:012:002-02'),
                   (1020, 4, 32),
                   6,
                   2),
 ('longitude', 1): (('DIM010-00', 'DIM010:000:012:003-01'), (1020, 32), 6, 1),
 ('longitude', 3): (('DIM010-00',
                     'DIM010:000:012:002-01',
                     'DIM010:000:012:002-02'),
                    (1020, 4, 32),
                    6,
                    3)}
>>>
All 4 datasets are included.
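With tuple keys, callers can no longer look up a dataset by name alone. As a rough sketch of how lookups by name could work on the new return value, the hypothetical helper `indices_for_name` below (not part of pyhdf) collects the index numbers of all datasets that share a name. The `dsets` dictionary is copied from the output above so that the snippet runs without an HDF4 file.

```python
# Copy of the tuple-keyed dictionary shown in the output above.
dsets = {
    ("latitude", 0): (("DIM010-00", "DIM010:000:012:003-01"), (1020, 32), 6, 0),
    ("latitude", 2): (("DIM010-00", "DIM010:000:012:002-01",
                       "DIM010:000:012:002-02"), (1020, 4, 32), 6, 2),
    ("longitude", 1): (("DIM010-00", "DIM010:000:012:003-01"), (1020, 32), 6, 1),
    ("longitude", 3): (("DIM010-00", "DIM010:000:012:002-01",
                        "DIM010:000:012:002-02"), (1020, 4, 32), 6, 3),
}


def indices_for_name(dsets, name):
    """Hypothetical helper: index numbers of all datasets called `name`, ascending."""
    return sorted(index for (ds_name, index) in dsets if ds_name == name)


print(indices_for_name(dsets, "latitude"))   # [0, 2]
print(indices_for_name(dsets, "longitude"))  # [1, 3]
```

Each returned index number can then be passed to sd.select(), which is how the datasets() code itself addresses a dataset.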
Code for solution
Replace datasets() in class SD in file pyhdf/SD.py with this updated version:
def datasets(self):
    """Return a dictionary describing all the file datasets.
    Args::
      no argument
    Returns::
      Empty dictionary if no dataset is defined.
      Otherwise, dictionary whose keys are tuples of
        - the dataset name,
        - the dataset index number.
      Note: if several datasets share the same name (in different VGroups),
      they will all be listed, each with its own index number.
      Values are tuples describing the corresponding datasets.
      Each tuple holds the following elements in order:
        - tuple holding the names of the dimensions defining the
          dataset coordinate axes
        - tuple holding the dataset shape (dimension lengths);
          if a dimension is unlimited, the reported length corresponds
          to the dimension current length
        - dataset type
        - dataset index number
    C library equivalent : no equivalent
    """
    # Get number of datasets
    nDs = self.info()[0]
    # Inquire each var
    res = {}
    for n in range(nDs):
        # Get dataset info.
        v = self.select(n)
        vName, vRank, vLen, vType, vAtt = v.info()
        if vRank < 2:     # need a sequence
            vLen = [vLen]
        # Get dimension info.
        dimNames = []
        dimLengths = []
        for dimNum in range(vRank):
            d = v.dim(dimNum)
            dimNames.append(d.info()[0])
            dimLengths.append(vLen[dimNum])
        # Key on (name, index) so datasets sharing a name do not overwrite each other.
        res[(vName, n)] = (tuple(dimNames), tuple(dimLengths),
                           vType, n)
    return res