Function SD.datasets returns only one SDS of multiple SDS with the same name in different VGroups

Open bramiup opened this issue 2 months ago • 0 comments

Problem description

The method SD.datasets() of the class SD in pyhdf/SD.py returns only one SDS when several SDS share the same name in different VGroups.

Example: I have a file with this layout of VGroups and SDS:

File: /
 Group: /MDR
 Group: /MDR/Earthshine
 Group: /MDR/Earthshine/GEO_EARTH
 Group: /MDR/Earthshine/GEO_EARTH/CENTRE
     SDS: /MDR/Earthshine/GEO_EARTH/CENTRE/latitude
     SDS: /MDR/Earthshine/GEO_EARTH/CENTRE/longitude
 Group: /MDR/Earthshine/GEO_EARTH/CORNER
     SDS: /MDR/Earthshine/GEO_EARTH/CORNER/latitude
     SDS: /MDR/Earthshine/GEO_EARTH/CORNER/longitude

Inspecting this file with SD.datasets() gives:

>>> from pyhdf.SD import SD
>>> from pprint import pprint
>>> 
>>> filename = 'bug.hdf4'
>>> sd = SD(filename)
>>> dsets = sd.datasets()
>>> pprint(dsets)
{'latitude': (('DIM010-00', 'DIM010:000:012:002-01', 'DIM010:000:012:002-02'),
              (1020, 4, 32),
              6,
              2),
 'longitude': (('DIM010-00', 'DIM010:000:012:002-01', 'DIM010:000:012:002-02'),
               (1020, 4, 32),
               6,
               3)}
>>> 

Only one latitude and one longitude SDS are reported, although the file contains two of each.

The problem is that SD.datasets() builds a dictionary keyed by SDS name. If two SDS have the same name, the second one overwrites the first.
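The overwrite can be reproduced without pyhdf. The following is a minimal sketch, assuming a hypothetical list of (name, index) records that mirrors the four SDS in the example file; it is not the library's code, only an illustration of the key collision:

```python
# Hypothetical (name, index) records mirroring the four SDS in the example file.
records = [("latitude", 0), ("longitude", 1), ("latitude", 2), ("longitude", 3)]

by_name = {}
for name, index in records:
    # A later SDS with the same name replaces the earlier entry.
    by_name[name] = index

print(by_name)  # {'latitude': 2, 'longitude': 3}
```

Only indices 2 and 3 survive, which matches the index numbers in the output above.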

Proposed solution:

Use a tuple of the dataset name and the dataset index number as the key: (dataset_name, dataset_index_number). This key is always unique because the index number is unique, so the returned dictionary includes all datasets. For the example above, this approach gives:

>>> pprint(dsets)
{('latitude', 0): (('DIM010-00', 'DIM010:000:012:003-01'), (1020, 32), 6, 0),
 ('latitude', 2): (('DIM010-00',
                    'DIM010:000:012:002-01',
                    'DIM010:000:012:002-02'),
                   (1020, 4, 32),
                   6,
                   2),
 ('longitude', 1): (('DIM010-00', 'DIM010:000:012:003-01'), (1020, 32), 6, 1),
 ('longitude', 3): (('DIM010-00',
                     'DIM010:000:012:002-01',
                     'DIM010:000:012:002-02'),
                    (1020, 4, 32),
                    6,
                    3)}
>>> 

All 4 datasets are included.
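With tuple keys, a caller that still wants to look datasets up by name has to group the keys. The sketch below shows one possible way, assuming a dictionary shaped like the output above; the helper name indices_by_name is hypothetical and not part of pyhdf:

```python
from collections import defaultdict

# Dictionary shaped like the tuple-keyed result shown above.
dsets = {
    ("latitude", 0): (("DIM010-00", "DIM010:000:012:003-01"), (1020, 32), 6, 0),
    ("latitude", 2): (("DIM010-00", "DIM010:000:012:002-01",
                       "DIM010:000:012:002-02"), (1020, 4, 32), 6, 2),
    ("longitude", 1): (("DIM010-00", "DIM010:000:012:003-01"), (1020, 32), 6, 1),
    ("longitude", 3): (("DIM010-00", "DIM010:000:012:002-01",
                        "DIM010:000:012:002-02"), (1020, 4, 32), 6, 3),
}


def indices_by_name(dsets):
    """Map each dataset name to the sorted list of its index numbers.

    Hypothetical helper for illustration only.
    """
    grouped = defaultdict(list)
    for name, index in sorted(dsets):
        grouped[name].append(index)
    return dict(grouped)


print(indices_by_name(dsets))  # {'latitude': [0, 2], 'longitude': [1, 3]}
```

Each index can then be passed to sd.select() to open the specific SDS.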

Code for solution

Replace datasets() in class SD in file pyhdf/SD.py with this updated version:

    def datasets(self):
        """Return a dictionary describing all the file datasets.

        Args::

          no argument

        Returns::

          Empty dictionary if no dataset is defined.
          Otherwise, dictionary whose keys are tuples of
            - the dataset name,
            - the dataset index number.
          Note: if several datasets share the same name (in different VGroups),
                all of them are listed, each under its own index number.
          Values are tuples describing the corresponding datasets.
          Each tuple holds the following elements in order:

          - tuple holding the names of the dimensions defining the
            dataset coordinate axes
          - tuple holding the dataset shape (dimension lengths);
            if a dimension is unlimited, the reported length corresponds
            to the dimension current length
          - dataset type
          - dataset index number

        C library equivalent : no equivalent
        """
        # Get number of datasets
        nDs = self.info()[0]

        # Inquire each var
        res = {}
        for n in range(nDs):
            # Get dataset info.
            v = self.select(n)
            vName, vRank, vLen, vType, vAtt = v.info()
            if vRank < 2:     # need a sequence so vLen[dimNum] works below
                vLen = [vLen]
            # Get dimension info.
            dimNames = []
            dimLengths = []
            for dimNum in range(vRank):
                d = v.dim(dimNum)
                dimNames.append(d.info()[0])
                dimLengths.append(vLen[dimNum])
            # Key on (name, index) so same-named datasets do not collide.
            res[(vName, n)] = (tuple(dimNames), tuple(dimLengths),
                               vType, n)

        return res

bramiup • Nov 04 '25 10:11