RAiDER icon indicating copy to clipboard operation
RAiDER copied to clipboard

Discussion about reprojecting rays vs weather model

Open Askaholic opened this issue 5 years ago • 4 comments

After analyzing the performance of delay calculation at query points, I found that currently the time is split fairly evenly between two tasks;

  1. Reproject the rays into weather model coordinates
  2. Interpolate the weather model at ray points

By replacing the scipy interpolator with the c++ implementation I was able to bring the proportion of time spent on the interpolation down to a negligible amount. ~~The results were not exactly the same due to compounding differences in the numbers returned by the interpolators, but they were consistent to about 3 decimal places~~. Edit: The results were exactly the same, I made the mistake of passing in the rays as (x, y, z) when the weather model data is in (y, x, z) format.

This leaves us with almost all of the time being taken by the reprojection step. There are basically 2 ways to solve this as well:

  1. Do the transformation faster e.g.
    • Parallelization (GPU)
    • C implementation (pyproj is written in cython)
  2. Transform fewer points by transforming the weather model into ray coordinates instead of the other way around

The first option is definitely the more difficult of the two. I have looked around a little bit, and as far as I can tell PROJ4 (and by extension pyproj) is basically the defacto library for projecting geospatial data. Writing some custom software like I did with the c++ interpolator is not really an option because its a lot more work to implement.

The second option is very promising in terms of performance benefit, but raises some questions of accuracy as it will likely have a significant impact on the computed delays. The performance benefits would include:

  1. Projection cost would be dependent only on the size of the weather model and not the number of query points. Right now, it increases with the number of query points.
  2. Even for relatively small numbers of query points, the number of weather model nodes is likely to be far less than the number of points on the rays. So the threshold for seeing a performance gain is quite low. For my test on 100x100 query points above, this would reduce the number of reprojected points down from about 40M to 1.5M (a speedup of about 96%).

The tradeoff is of course the question of accuracy, and here is where my knowledge of the problem becomes exhausted. How much do the results change if we reproject the weather model and then do the interpolation instead of interpolating in the weather model coordinates? I discussed this a little bit with @jlmaurer and I guess it may depend on the weather model as each weather model may have it's own projection. For instance HRRR uses a conic projection, so it may actually be the case that doing a linear interpolation through a conic space introduces more errors than if we were to reproject and do the linear interpolation over more of a flat space.

What are your thoughts @dbekaert and @leiyangleon? We can also discuss this at our next meeting.

Askaholic avatar Jul 30 '20 22:07 Askaholic

The y,x,z is very confusing, and i suspect you will not be the only one to make that error.

The choice of handling the model in the native space was done for a specific reason that it would avoid interpolation errors/artifacts. Yes we do currently interpolate the vertical to a regular grid because pressure levels are not at the same hight, but for pressure levels that is not needed. Once that regular space is defined you get the benefit of a regular grid interpolator. So we should about moving to EC frame, and regrading to a regular grid (cost of interpolation artifacts) to then run the proj only for the model nodes. What do you think @piyushrpt?

One item here is that the proj call over chunks is probably not very efficient. I recall @piyushrpt mentioning that you could make a large call at once. We also wrap over it in python while it seems to have C++ too. https://proj.org/development/reference/cpp/common.html?highlight=c#_CPPv4N5osgeo4proj6commonE

@piyushrpt will try to join our next meeting to brainstorm.

dbekaert avatar Jul 31 '20 03:07 dbekaert

The PROJ C/C++ API for bulk transformation can be found here: https://proj.org/development/reference/functions.html#c.proj_trans_array

pyproj.transform itself can handle multiple points at the same time

For interpolation with C++ - there are numerous options - C++ bindings to fitpack (Scipy interpolation is mostly fitpack), Eigen (also used in isce3) etc

piyushrpt avatar Jul 31 '20 03:07 piyushrpt

@piyushrpt @jlmaurer @Askaholic dump of thoughts here:

RAY-tracing in EC or in local weather model grid

Current

  1. weather model is in its own projection, and we make a linear interpolator
  2. long lat grid is provided in own projection
  3. we calculated ray on grid points and sample it every 10 m in EC coordinate frame
  4. we convert ray in EC to weather model projection
  5. we interpolate the weather model parameters for the ray
  6. we integrate along the ray

How linear would the ray be in the local model reference frame? If pretty much linear, could not save us a lot of time by just doing the integration in the local model frame? Suggested

  1. weather model is in its own projection, and we make a linear interpolator
  2. long lat grid is provided in own projection
  3. we calculated ray on grid points in EC coordinate frame
  4. Transfer them to the local weather model grid and discretize the ray here.
  5. Interpolate the ray in the local model grid
  6. integrate along the ray

RAY-tracing at weather model grid nodes:

There is no need for us to do the ray-tracing at the user high resolution specified sampling, we can just make a cube at the model grid nodes.

  1. We should do the ray-tracing at the native water model grid node locations and at on average the weather model grid heights. We should hard-code the level in the code.
  2. Once the cube is calculated we finish with interpolation to the user provided grid. This would be similar to the NISAR approach for delays being stored.

LOS versus orbit file.

To have the ray-tracing at weather model nodes to be working one needs to have the orbit information. This you do not directly have when the user provided an LOS file. However, could we not first reverse engineer this to get a state vector such we again can compute the delay at native weather model grid and various height levels? @piyushrpt ?

Ray-tracing start and end-point

The only reason why we need to compute the ray start and end-point at high posting is because of DEM superposition. The ray start and end-points on fixed heigh levels are smoothly varying. Likely we could just span the domain of the user (i.e. the bounds) and compute the ray start and end point at a coarse (10 km etc TB investigated) multi-level (fixed height) sampling. Once that is done it could be translated back to the native grid model.

In fact if we go ahead and do the ray-tracing first at the native model and fixes heights, we could just inherit the height levels, but likely we could make it on a fixed lateral sampling (e.g. 10 km or even courses TBD). This would avoid models like HRRR to at once scale up to 3 km for ray start end determination which is overkill given its too smooth.

dbekaert avatar Jul 31 '20 20:07 dbekaert

@Askaholic @jlmaurer could you inspect how linear the rays when projecting them from EC back to the local weahter model projection? Would suggest you try a variety of weather model grid projection (MERRA/GMAO WGS84+lonlat at EPSG4326 ; HRRR lambert conformal conic, ECMWF HRES gaussian grid etc.) and use a certain state vector for the orbit.

dbekaert avatar Jul 31 '20 20:07 dbekaert