Found GLM file that has a higher number of events then current dytpe (int16) can hold for flash_np['n_points'] in io/mimic_lma.py
There exists a GLM file (Goes18 01/12/2025 - 06:33:00) where the number of events (33601) is larger then a signed int16 (as n_points is set to here) can handle (max of 32,767).
LeviDPC$ ncdump OR_GLM-L2-LCFA_G18_s20250120633000_e20250120633200_c20250120633215.nc | head -n 8
netcdf OR_GLM-L2-LCFA_G18_s20250120633000_e20250120633200_c20250120633215 {
dimensions:
number_of_flashes = UNLIMITED ; // (325 currently)
number_of_groups = UNLIMITED ; // (29813 currently)
number_of_events = UNLIMITED ; // (33601 currently)
number_of_time_bounds = 2 ;
number_of_field_of_view_bounds = 2 ;
number_of_wavelength_bounds = 2 ;
When I try to grid that file I get this error:
File "/data/users/lpfantz/gridded-glm/v1.1-testing/rc4/cspp-geo-gridded-glm-1.1/libexec/python_runtime/lib/python3.9/site-packages/glmtools/grid/make_grids.py", line 838, in grid_GLM_flashes
outputs = list(map(this_proc_each_grid, subgrids))
File "/data/users/lpfantz/gridded-glm/v1.1-testing/rc4/cspp-geo-gridded-glm-1.1/libexec/python_runtime/lib/python3.9/site-packages/glmtools/grid/make_grids.py", line 907, in proc_each_grid
gridder.process_flashes(glm, **process_flash_kwargs_ij)
File "/data/users/lpfantz/gridded-glm/v1.1-testing/rc4/cspp-geo-gridded-glm-1.1/libexec/python_runtime/lib/python3.9/site-packages/glmtools/grid/make_grids.py", line 397, in process_flashes
read_flashes(glm, self.framer, base_date=self.t_ref,
File "/data/users/lpfantz/gridded-glm/v1.1-testing/rc4/cspp-geo-gridded-glm-1.1/libexec/python_runtime/lib/python3.9/site-packages/glmtools/io/mimic_lma.py", line 111, in read_flashes
results = list(map(chunk_func,flash_chunks))
File "/data/users/lpfantz/gridded-glm/v1.1-testing/rc4/cspp-geo-gridded-glm-1.1/libexec/python_runtime/lib/python3.9/site-packages/glmtools/io/mimic_lma.py", line 215, in fast_fixed_grid_read_chunk
fake_lma = mimic_lma_dataset_lut(flash_data, base_date,
File "/data/users/lpfantz/gridded-glm/v1.1-testing/rc4/cspp-geo-gridded-glm-1.1/libexec/python_runtime/lib/python3.9/site-packages/glmtools/io/mimic_lma.py", line 970, in mimic_lma_dataset_lut
gr_flashes = _fake_lma_from_glm_lutgroups(flash_data, basedate)
File "/data/users/lpfantz/gridded-glm/v1.1-testing/rc4/cspp-geo-gridded-glm-1.1/libexec/python_runtime/lib/python3.9/site-packages/glmtools/io/mimic_lma.py", line 937, in _fake_lma_from_glm_lutgroups
flash_np['n_points'] = flash_data.number_of_events.shape[0]
OverflowError: Python integer 33601 out of bounds for int16
My commit fixes the issue by changing it to an unsigned int which should be good up to 65,535
To fix this specific issue I only needed to update n_events dtype for [flash_dtype_fixedgridlut] (https://github.com/deeplycloudy/glmtools/blob/master/glmtools/io/mimic_lma.py#L609) but I thought it would make sense to do it for n_events dtype for flash_dtype as well.
I'm happy to change that or switch the dytpe to signed int32 (if it needs to be signed).
One thing that may need to be considered with this is if these files go directly to AWIPS or to any other Java software as Java doesn't have unsigned types. I have no say in this library or the CSPP downstream project using it, but Java limitations were the first thing that came to mind when I saw this PR.
Does it actually affect a datatype in an output file, or is it just an internal counter? If the latter then that should eliminate the Java concern.
Having said that, I would suggest promotion to int32, to head off the same problem happening when there's a case that exceeds 65,535 counts.
Good point. They'd probably be translated to something like another AWIPS-compatible NetCDF file. I do know sometimes data file support is added directly to AWIPS, but maybe GLM is too complicated for that to have been done already (ex. groups and flashes), hence the CSPP/Satpy conversion steps.
Thanks for the input all, (And I didn't realize that Java didn't have unsigned types) I have gone ahead and updated the source branch for this pull request to be signed int32. The branch name is no longer accurate but I guess thats a lesson for me about branch names going forward :sweat_smile: