NCDatasets.jl

Enum Types

Open timhultberg opened this issue 4 years ago • 4 comments

I would like to use NCDatasets to create files using enumerated types. As far as I can see from the documentation and the code, this is not supported. Would it be possible to add it?

timhultberg · Sep 21 '21 12:09

You are right, enum types are currently not supported. It is certainly within scope and doable. It just takes some time to write and test the code. Here is a start on exposing the low-level functions (https://github.com/Alexander-Barth/NCDatasets.jl/commit/f82c24afe92b327903beba6082b291b181382510).

I am wondering what should be the return type of the higher-level function. Maybe a Julia array of Symbols, or a CategoricalArray/PooledArrays/IndirectArrays... I am not so familiar with these array types.

Alexander-Barth · Sep 26 '21 19:09

Cool, thanks. For now I need to write rather than read the enum type, but this is still very helpful.

"I am wondering what should be the return type of the higher-level function. Maybe a Julia array of Symbols, or a CategoricalArray/PooledArrays/IndirectArrays... I am not so familiar with these array types." Not sure, I have never used them, but I guess it should be possible to use Julia's @enum types.

timhultberg · Sep 27 '21 15:09

In NetCDF, an identifier (Clear in the example below) can appear in different enum types:

netcdf enum2 {
types:
  byte enum cloud_t {Clear = 0, Cumulonimbus = 1, Stratus = 2,
      Stratocumulus = 3, Cumulus = 4, Altostratus = 5, Nimbostratus = 6,
      Altocumulus = 7, Missing = 127} ;
  byte enum cloud2_t {Clear = 10, Cumulonimbus = 11} ;
dimensions:
        time = UNLIMITED ; // (5 currently)
variables:
        cloud_t primary_cloud(time) ;
                cloud_t primary_cloud:_FillValue = Missing ;
}

However, Julia doesn't let me do that, because the identifiers of an @enum are defined as constants in the enclosing scope:

julia> @enum cloud_t Clear=0
julia> @enum cloud_t2 Clear=10
ERROR: invalid redefinition of constant Clear
Stacktrace:
 [1] top-level scope
   @ Enums.jl:198
 [2] top-level scope
   @ REPL[5]:1

Julia keywords can also be a problem:

julia> @enum cloud_t3 end=10
ERROR: syntax: extra token "end" after end of expression
Stacktrace:
 [1] top-level scope
   @ none:1

While Julia's @enum seems natural (after all, it has the same name as NetCDF enums ;-) ), I am not sure it is the best (or a safe) choice here.

I just checked with Python's netCDF4, and it simply returns the numbers:

In [2]: import netCDF4
In [3]: ds = netCDF4.Dataset("enum.nc")

In [6]: ds["primary_cloud"][:]
Out[6]:
masked_array(data=[0, 2, 4, --, 1],
             mask=[False, False, False,  True, False],
       fill_value=127,
            dtype=int8)

In [7]: data = ds["primary_cloud"][:]

In [9]: data[0]
Out[9]: 0

In [10]: data[1]
Out[10]: 2

The same is true for Python's xarray.
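If the library only returns the numbers, the names can still be recovered on the user side with one plain mapping per enum type. Here is a minimal sketch in Python, assuming the name/value pairs are transcribed by hand from the CDL above; the `decode` helper and the `CLOUD_T`/`CLOUD2_T` names are made up for illustration and are not part of netCDF4 or xarray (as far as I know netCDF4 exposes a similar name-to-value dictionary on its enum types, but I have not relied on that here). Because the names are ordinary strings scoped to their own dictionary, `Clear` can appear in both types without a clash, and a name that happens to be a language keyword would not be a problem either.

```python
# Enum definitions transcribed from the CDL above: name -> integer value.
CLOUD_T = {
    "Clear": 0, "Cumulonimbus": 1, "Stratus": 2, "Stratocumulus": 3,
    "Cumulus": 4, "Altostratus": 5, "Nimbostratus": 6, "Altocumulus": 7,
    "Missing": 127,
}
# Second enum type reusing the identifier "Clear" with a different value.
CLOUD2_T = {"Clear": 10, "Cumulonimbus": 11}


def decode(values, enum_dict, fill_name=None):
    """Hypothetical helper: map raw integers to enum names.

    Values equal to the enum member named `fill_name` (if given) are
    returned as None, mimicking the masked entry in the session above.
    """
    names = {value: name for name, value in enum_dict.items()}
    fill = enum_dict[fill_name] if fill_name is not None else None
    return [None if v == fill else names[v] for v in values]


# Raw values as returned for primary_cloud (127 is the _FillValue, Missing).
raw = [0, 2, 4, 127, 1]
decoded = decode(raw, CLOUD_T, fill_name="Missing")
print(decoded)
```

The same helper works unchanged for the second type, e.g. `decode([10, 11], CLOUD2_T)`, since each lookup only consults the dictionary it is given.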

(For your information I updated test_enum.jl)

Alexander-Barth · Sep 29 '21 07:09