geo icon indicating copy to clipboard operation
geo copied to clipboard

Add geo-traits crate

Open kylebarron opened this issue 1 year ago • 94 comments

  • [x] I agree to follow the project's code of conduct.
  • [ ] I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

Updated version of #1019 that's clean against the latest main. Aside from having no extra diff from main, it also has:

  • Automatic iterators. See https://github.com/geoarrow/geoarrow-rs/pull/427. This adds a Sized required bound, but means that all sources get iterators out of the box with minimal required impls.

kylebarron avatar Feb 26 '24 04:02 kylebarron

In the medium term (in 2024) I would love to add 3d support to geoarrow, so I want to keep thinking about xyz geometry traits, but I'm pretty happy with these for 2d geometries. They've worked well so far in geoarrow.

kylebarron avatar Feb 27 '24 16:02 kylebarron

Thanks for keeping this up to date WRT to main @kylebarron!

I said in https://github.com/georust/geo/pull/1011 (and still believe) that having some examples of algorithms that use these trait definitions would be key to evaluating them.

Are there examples anywhere that utilize these traits? @frewsxcv - have you tried, or would you be willing to see how this code works in your project?

michaelkirk avatar Feb 27 '24 20:02 michaelkirk

Thanks for keeping this up to date WRT to main @kylebarron!

I said in #1011 (and still believe) that having some examples of algorithms that use these trait definitions would be key to evaluating them.

Are there examples anywhere that utilize these traits? @frewsxcv - have you tried, or would you be willing to see how this code works in your project?

Are the usages in geoarrow-rs sufficient? Or are you looking for something else? Also would love to hear why you find this a blocker to merging these in to the repository.

frewsxcv avatar Feb 27 '24 20:02 frewsxcv

In rgis I would like to be able to read geospatial data and perform operations on the underlying data without having to convert through intermediary representations like geo-types. geo-traits with geozero would make this a lot simpler.

frewsxcv avatar Feb 27 '24 20:02 frewsxcv

Examples of how they're used today in geoarrow-rs:

@JosiahParry has also implemented these traits in serde_esri, and I believe he's tested against geoarrow.

kylebarron avatar Feb 27 '24 20:02 kylebarron

Thanks for the examples @kylebarron! I'll take a look.

And relevant to @frewsxcv question:

Also would love to hear why you find this a blocker to merging these in to the repository.

I continue to think it's reasonable that code in main is intended to be useful, and that seeing code actually used in pursuit of the problem it purports to solve seems like a reasonable bar. I look forward to looking at @kylebarron's examples to that end.

geo-traits with geozero would make this a lot simpler.

I understand that's the theoretical goal, and a worthy one. I'm asking to see if this actually works towards that goal. Care to give it a shot? Is there something that keeps you from building against this branch, for example?

michaelkirk avatar Feb 27 '24 20:02 michaelkirk

@JosiahParry has also implemented these traits in serde_esri, and I believe he's tested against geoarrow.

@JosiahParry - can you elaborate on what these traits enable you to do? Presumably it's useful to you or you wouldn't have done it 😉

michaelkirk avatar Feb 27 '24 20:02 michaelkirk

Nope! There is nothing blocking me from building on-top off a branch.

The point of merging the code into main is to make it easier to iterate on, as I find working with remote git branches to be unnecessarily tedious and hard to reason about when I'm using other geo dependencies in my projects. Another point is to prevent staleness that happened often with the previous separate branch approach, which is a concern I had from the start.

frewsxcv avatar Feb 27 '24 20:02 frewsxcv

@michaelkirk Correct me if this is wrong, but it sounds like you are skeptical of the usefulness of these traits without actually saying that. Am I mind reading correctly? I feel like there's an unusual and surprising amount of resistance to merging this in.

frewsxcv avatar Feb 27 '24 20:02 frewsxcv

Correct me if this is wrong, but it sounds like you are skeptical of the usefulness of these traits without actually saying that. Am I mind reading correctly? I feel like there's an unusual and surprising amount of resistance to merging this in.

I feel a bit attacked! But it's true - I am skeptical of the traits code. Truly though, I am skeptical of all code. I am admittedly bit extra skeptical of the geo-traits code in that it's not just a single isolated algorithm, it's a cross cutting concern. I feel like I've said similar things before, but maybe not clearly enough. I appreciate @kylebarron patience with me.

An implementation of BoundingRect in terms of traits.

This implementation looks good to me! You're using the trait accessors to implement a useful algorithm and it doesn't seem overly ceremonious compared to the implementation on geo-types.

FrechetDistanceLineString / Area

Ugh, these ones on the other hand, seem a bit unfortunate. Is there a way to make these look more like the BoundingRect one? Or is it reasonable to assume that most algorithms implemented on geo-traits will need to have a separate trait definition (DistanceLineString, DistancePolygon, etc.) for each geometry type?

I haven't yet grokked why these implementations can't look more like the BoundingRect one - something about having a dedicated struct for the algorithm (i.e. struct BoundingRect) maybe? Can we do this with our other algorithms?

Also, am I reading it correctly that the "trait based" implementation for FrechetDistance is simply converting to a geo-types and then calling FrechetDistance on that? Like we're not actually implementing the algorithm using the trait here in any meaningful way - or am I misunderstanding? Is there something preventing a geo-types free implementation?


All the conversion API's seem reasonable at first blush, but can you connect the dots for me a little bit?

let line_string_1_wkb: &[u8] = "..."
let line_string_bbox = ???

let line_string_2_wkb: &[u8] = "..."
let frechet_distance = ???

michaelkirk avatar Feb 27 '24 21:02 michaelkirk

Or is it reasonable to assume that most algorithms implemented on geo-traits will need to have a separate trait definition (DistanceLineString, DistancePolygon, etc.) for each geometry type?

All algorithm traits from geo need a separate implementation in geoarrow because geo's traits always return scalar objects while geoarrow's traits need to return arrays or chunked arrays.

I would love to have only one algorithm trait, e.g. just Distance but as mentioned in https://github.com/georust/geo/issues/1113 I don't think it's possible to implement blanket traits for multiple traits without specialization. I think impl<G: LineStringTrait> Distance for G {} and impl<G: PolygonTrait> Distance for G {} will always fail without specialization? So my first thought was to separate the traits to be able to implement impl<G: PolygonTrait> DistancePolygon for G {}. But I'd love for there to be a better way 🙂.

Also, am I reading it correctly that the "trait based" implementation for FrechetDistance is simply converting to a geo-types and then calling FrechetDistance on that? Like we're not actually implementing the algorithm using the trait here in any meaningful way - or am I misunderstanding? Is there something preventing a geo-types free implementation?

In BoundingRect I'm reimplementing the core algorithm in a native way on the traits. I do this for BoundingRect because getting bboxes is common enough that removing overhead is good and it's quite easy to implement. In FrechetDistanceLineString I don't want to touch the core algorithm implementation. In general, geoarrow's role is to provide the glue between high-level APIs in JS and Python to core algorithms implemented in other crates, such as geo.

The copy to a geo object is short-term glue that allows the trait to be called either on geoarrow scalars (e.g. a positional reference in an array) or geo scalar objects. In the long run, the underlying FrechetDistance trait in geo could be implemented in terms of LineStringTrait and the geoarrow glue wouldn't need to copy the input into geo::LineString.


All the conversion API's seem reasonable at first blush, but can you connect the dots for me a little bit?

let line_string_1_wkb: &[u8] = "..."
let line_string_bbox = ???

let line_string_2_wkb: &[u8] = "..."
let frechet_distance = ???

I added a test for the first one in this example PR: https://github.com/geoarrow/geoarrow-rs/pull/542

// A builder for a columnar WKB arrays
let mut wkb_builder = WKBBuilder::<i32>::new();
// Add a geo polygon to the WKB array
// This uses geo-traits to serialize to WKB and adds the binary to the array
wkb_builder.push_polygon(Some(&p0()));

// Finish the builder, creating an array of logical length 1.
let wkb_arr = wkb_builder.finish();

// Access the WKB scalar at position 0
// This is a reference onto the array. At this point the WKB is just a "blob" with no other
// information.
let wkb_scalar = wkb_arr.value(0);

// This is a "parsed" WKB object. The [WKBGeometry] type is an enum over each geometry
// type. WKBGeometry itself implements GeometryTrait but we need to unpack this to a
// WKBPolygon to access the object that has the PolygonTrait impl
let wkb_object = wkb_scalar.to_wkb_object();

// This is a WKBPolygon. It's already been scanned to know where each ring starts and ends,
// so it's O(1) from this point to access any single coordinate.
let wkb_polygon = match wkb_object {
    WKBGeometry::Polygon(wkb_polygon) => wkb_polygon,
    _ => unreachable!(),
};

// Add this wkb object directly into the BoundingRect
let mut bounding_rect = BoundingRect::new();
bounding_rect.add_polygon(&wkb_polygon);

assert_eq!(bounding_rect.minx, -111.);
assert_eq!(bounding_rect.miny, 41.);
assert_eq!(bounding_rect.maxx, -104.);
assert_eq!(bounding_rect.maxy, 45.);

Run with

cargo test --lib -- test2::test_wkb_to_bbox --nocapture

kylebarron avatar Feb 27 '24 21:02 kylebarron

I haven't yet grokked why these implementations can't look more like the BoundingRect one - something about having a dedicated struct for the algorithm (i.e. struct BoundingRect) maybe? Can we do this with our other algorithms?

I think part of why these implementations look the way they do is because I want to have compile-time safety across geometry types. E.g. I don't want to have fn frechet_distance(array: GeometryArray) -> Float64Array because frechet distance is not implemented for every geometry type. In this way, I think I've adopted the same use of traits as geo itself has done. The trait impl describes what operations are allowed on different combinations of objects. But this idea, that the trait describes what is allowed, ends up significantly more complex when you want to handle broadcasting across scalars, arrays, and chunked arrays.

(The chunked array abstraction, where you might have 10 million geometries in 100 separate chunks where each of those chunks is a contiguous buffer holding 100,000 geometries, is quite useful and is currently my unit of "automatic parallelism". I.e. a rayon par_map is called over each chunk.)

But the largest area of complexity in geoarrow-rs is handling these combinations. I think it's useful for UX to be able to call an operation like Distance against either a geoarrow scalar or a geo scalar, and thus it's necessary either to copy my impls twice, which is a lot of duplicated code, or implement them against a shared blanket trait.

kylebarron avatar Feb 27 '24 21:02 kylebarron

I would quite like this! I think it would be really neat if the geo-type traits also had a new() method that would allow you to write algorithms entirely with trait methods.

Take this super derived example (i adjusted @kylebarron's geo-traits/src/line_string.rs file) https://gist.github.com/JosiahParry/f3469f8df4706a4d36f8a47db9ae3e09)


  trait StartEnd where Self: LineStringTrait {
      fn start_end(&self) -> Self::Owned {
          let start = self.coord(0).unwrap();
          let end = self.coord(self.num_coords() - 1).unwrap();
          Self::new(vec![start, end])
      }
  }

impl<T: CoordNum> StartEnd for LineString<T> {}

 #[test]
    fn test_start_end() {
        let coords = vec![
            Coord {x: 1.0, y: 2.0},
            Coord {x: 3.0, y: 4.0},
            Coord {x: 5.0, y: 6.0},
        ];

        let line_string = LineString::new(coords);
        let start_end = line_string.start_end();

        println!("{:?}", start_end);
        assert_eq!(start_end.num_coords(), 2);
    }
running 1 test
LineString([Coord { x: 1.0, y: 2.0 }, Coord { x: 5.0, y: 6.0 }])
test line_string::tests::test_start_end ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.00s

JosiahParry avatar Feb 29 '24 18:02 JosiahParry

I think that trait definition has the same downsides as mentioned elsewhere, where you'll need a different trait definition for each geometry type. (You'd need a StartEndPolygon that's implemented on PolygonTrait).

I also think that it's just extra unnecessary complexity, because I argue that the implementation of the algorithm should choose the most appropriate return type, as long as the geometry trait is also implemented on that return type. That means, for any geo-implemented algorithms, the internal, temporary data objects will always be geo objects, and so the lowest-cost return object will always be a geo object. Instead of always incurring the cost to convert back to the original trait representation, that conversion should be deferred to the called for when it's optimal for them. For many libraries, calling convert_to_my_repr(line_string.start_end()) where convert_to_my_repr is defined as

fn convert_to_my_repr(geom: &LineStringTrait) -> MyLineStringStruct {}

But for other libraries, especially in the context of libraries like geoarrow where you want to optimize the vectorized execution and not the scalar execution, it's much better to defer the data transformation until you have an array of geometries.

kylebarron avatar Feb 29 '24 19:02 kylebarron

I need something like this for Geodesy, but obviously only need the CoordTrait there, so my comments may be a bit narrow-sighted.

First, given your comments, @kylebarron, about wanting to extend towards 3D, I find the name of the x_y method unfortunate, and would prefer xy.

Both because for my TeX-eyes, x_y looks like x with subscript y, and because with 3D, 4D and/or Measure extensions, it will be too much of an eyecatcher to spot x_y_z_t_m (and its truncated siblings) in my source code. xy, xyz, xyzt, etc. would suffice.

Second, I find it reasonable to extract the CoordTrait in a separate crate, since coordinates and features (in the OGC Simple Features sense) are two very different things. Then the geo-traits crate could start by implementing CoordTrait for the Point feature, and build the world from there.

Also, if I understand correctly, most of the reservations expressed above would not apply to CoordTrait in itself, so it could probably be merged separately. Which (full disclosure) would be extremely useful for me, especially if somewhat extended.

Third, as a general remark of the Georust naming conventions x and y are rather unfortunate names for something that may most likely be longitudes or eastings in the former, latitudes or northings in the latter case.

Geodetically speaking, coordinates are ordered, and can be referred to as 1st, 2nd, 3rd and 4th, and as you never know what is the CRS specific convention regarding (N,E) vs (E,N), any kind of implied (mis)understanding is unfortunate. I suggest to let the trait implementer provide x(), y(), z(), t(), and let the trait autoprovide first(), second(), third(), fourth() (or preferably the other way round), and then just postulate that the naming is an internal convention, not to be confused with any external conventions.

Evidently, trying to extend this to the entire georust ecosystem would neither be valuable or reasonable - wherever the culture descends from an "earth-is-flat-as-my-monitor" computer graphics world, x and y are obviously the more useful convention. But in a CoordTrait it would make much sense to support a more geodetic world view - it wouldn't harm, and it would not make implementation harder or more verbose: It's just a few autoprovided aliases in the trait definition.

EDIT 2: On second thoughts, I actually think the Third item can be ignored: It doesn't matter how the "this is the internal convention" is communicated, and first()...fourth() is probably a unnecessary complication, that impedes communication.

Fourth, I have implemented an extended version of CoordTrait over at Geodesy, confined to the Coordinate module (with impls here), and the Geodesics functionality of the Ellipsoid implementation.

As you can see, I took the liberty to extend to 4D+M, but probably not in a very elegant way. Also, I need the xy_as_f64 etc. functionality (cf. geodesic_fwd) and geodesic_inv), but have probably implemented it in an idiotic, rather than idiomatic way, so any comments and/or improvements would be much appreciated.

I would find it awesome to be able to implement a tighter integration between Geodesy and the Georust universe. And a somewhat geodesy-aware CoordTrait would be a straightforward and sensible way of doing it.

I know, however, that I am stomping on your grass, but please consider my suggestions - and I would be very happy for any comments regarding how I am wrong, and how I could work further towards a Georust<->Geodesy bridge.

busstoptaktik avatar Mar 19 '24 18:03 busstoptaktik

I'm interested in multi-dimensional geometry support, and think that handling it in traits could be a good way to incrementally add support. But I think it would probably be ideal if the dimension could be a generic, so that you'd statically know what type of coordinate you have, and which type of operations could be done on it.

I'm happy to switch x_y to xy.

Then the geo-traits crate could start by implementing CoordTrait for the Point feature, and build the world from there.

geo has had some discussion about whether to merge Point and Coord, and so it's not 100% clear that we do want both PointTrait and CoordTrait (in this PR I implemented PointTrait on Coord and CoordTrait on Point for simplicity). In any case CoordTrait/PointTrait is intricately linked to the rest of the traits here. A LineStringTrait will yield objects that implement CoordTrait, a PolygonTrait will yield objects that implement LineStringTrait and so on. So it's not clear to me that stabilizing only CoordTrait is the best, if we have to modify that later to support geometries.

kylebarron avatar Mar 19 '24 22:03 kylebarron

But I think it would probably be ideal if the dimension could be a generic, so that you'd statically know what type of coordinate you have, and which type of operations could be done on it.

That would require a good deal more genericity than just the dimension.

In PROJ, we check the pipelines upon construction, such that the output type of step n fits with the input type of step n+1, but that's static features of each operator, not of the coordinate (the coordinate types are all just untagged unions), so the coordinate is supposed to fit the pipeline its fed into, while the pipeline is checked for internal consistency. And the internal consistency is still just at a very general level (any kind of projected coordinate will be accepted between a step producing kind A and a step expecting kind B)

That's why I entirely dropped the "different coordinate types" in Rust Geodesy, and only support different dimensionalities: It just doesn't make sense - coordinates are interpreted by operators which expect a specific input type, and the gamut of possible types is infinite.

No reason to try to make the type system reject having me feeding a utm zone 32 coordinate into a pipeline expecting some other type of projected coordinate (or even geographical or geocentric cartesians): The output becomes garbage, but its consistent garbage :-) and there are just too many kinds of parametrizations, each of which would lead to principally different gamuts for valid operations.

So it's a tough job!

busstoptaktik avatar Mar 19 '24 23:03 busstoptaktik

I'm interested in multi-dimensional geometry support, and think that handling it in traits could be a good way to incrementally add support. But I think it would probably be ideal if the dimension could be a generic

@kylebarron My current, and much improved version, takes off from a much less modified version of your original CoordTrait, but with support for up to 4 dimensions, and a M(easure). Is it something like this ↓ you intend? That would fit very well with my plans

pub trait CoordTrait {
    type T: CoordNum;
    const DIMENSION: usize;
    const MEASURE: bool;

    /// Accessors for the coordinate tuple components
    fn x(&self) -> Self::T;
    fn y(&self) -> Self::T;
    fn z(&self) -> Self::T;
    fn t(&self) -> Self::T;
    fn m(&self) -> Self::T;

    /// Returns a tuple that contains the two first components of the coord.
    fn x_y(&self) -> (Self::T, Self::T) {
        (self.x(), self.y())
    }
}

busstoptaktik avatar Mar 21 '24 00:03 busstoptaktik

I had been thinking more along the lines of the proposal in this PR, where the struct is generic over a CoordNum that can be NoValue.

In particular, when DIMENSION is 2, what should z and t return? T::default()? I was thinking it would be lovely if the CoordTrait could somehow be parameterized so that z and t didn't exist when DIMENSION was 2.

Or maybe the better way of looking at this is that z() would return NoValue. That I suppose would be the best way of linking your proposal and that PR.

kylebarron avatar Mar 21 '24 15:03 kylebarron

when DIMENSION is 2, what should z and t return?

Essentially, I believe it is up to the person implementing the trait for a concrete data type to decide: It's your trait, but their data!

In Geodesy, I use NaN for t and 0 for z, because that fits well with geodetical reasoning: If you have no height information, your coordinate is probably assumed to be placed directly on the ellipsoid, whereas if you have no time coordinate, but accidentally call a time dependent operation on the coordinate, you actually want the NaN value to spill out all over your coordinate, to indicate its invalidity ("stomping on it with the x-large NaN boots").

In a sense, NaN is the IEEE 754 equivalent of None, and we want to be reminded when doing something stupid.

busstoptaktik avatar Mar 21 '24 16:03 busstoptaktik

when DIMENSION is 2, what should z and t return?

Essentially, I believe it is up to the person implementing the trait for a concrete data type to decide: It's your trait, but their data!

In general, I disagree because I think it's important to have a well-specified data contract for these traits. If geo or geodesy or any other crate implements algorithms based on these traits, they need to know what they'll receive.

In Geodesy, I use NaN for t and 0 for z, because that fits well with geodetical reasoning

In this case I think it would be better to have z, t, and m return Option<Self::T> and then in geodesy you could use coord.t().unwrap_or(T::NaN) and coord.z().unwrap_or(0). That way the default data values are informed by the consumer and not the producer.

kylebarron avatar Mar 21 '24 17:03 kylebarron

whereas if you have no time coordinate, but accidentally call a time dependent operation on the coordinate, you actually want the NaN value to spill out all over your coordinate, to indicate its invalidity ("stomping on it with the x-large NaN boots").

I think the dream is that rather than NaN spilling out all over at runtime, you have the type system refuse to even compile the code, saying "hey dummy, you're trying to use a time-dependent algorithm with non-time coords".

Maybe in practice it's not always possible when you instantiate your coordinates to know a priori what dimensions they'll need to have, but it seems like it would be possible some of the time.

michaelkirk avatar Mar 21 '24 17:03 michaelkirk

I think the dream is that rather than NaN spilling out all over at runtime, you have the type system refuse to even compile the code, saying "hey dummy, you're trying to use a time-dependent algorithm with non-time coords".

💯

Maybe in practice it's not always possible when you instantiate your coordinates to know a priori what dimensions they'll need to have, but it seems like it would be possible some of the time.

In geoarrow this is where dynamic downcasting comes in, and where there's both geometry-specific types like LineStringArray but also the trait object Arc<dyn GeometryArrayTrait>. If a user already has a LineStringArray, they'll see only traits implemented on LineStringArray available to them. If they have an &dyn GeometryArrayTrait, then they can either downcast it themselves to a LineStringArray and catch that error, or pass the &dyn GeometryArrayTrait into an algorithm directly, which will fail if the algorithm isn't implemented for the underlying type.

kylebarron avatar Mar 21 '24 17:03 kylebarron

@kylebarron said:

In general, I disagree because I think it's important to have a well-specified data contract for these traits. If geo or geodesy or any other crate implements algorithms based on these traits, they need to know what they'll receive.

I, on the contrary, stipulate, that it is the implementer of the trait for a concrete type, who knows how best to adapt the concrete type to the (well documented, but not necessarily perfectly obtainable) expectations of the trait

In Geodesy, I provide implementations of the CoordinateSet trait for the built-in coordinate types, Coor2D, Coor32, Coor3D, Coor4D, providing z=0 and t=NaN, but (as you know) for user supplied data types, this is entirely user selectable

In this case I think it would be better to have z, t, and m return Option<Self::T> and then in geodesy you could use coord.t().unwrap_or(T::NaN) and coord.z().unwrap_or(0). That way the default data values are informed by the consumer and not the producer.

In that case, I believe all should be optional. Partially for consistency, partially to support "only height" or "only M" systems.

My unfortunate knack for premature optimization is also concerned by the fact that making things optional will introduce branching right in the inner coordinate loop. In general, it will probably be of minor influence, though

busstoptaktik avatar Mar 22 '24 11:03 busstoptaktik

@michaelkirk said

I think the dream is that rather than NaN spilling out all over at runtime, you have the type system refuse to even compile the code, saying "hey dummy, you're trying to use a time-dependent algorithm with non-time coords".

Maybe in practice it's not always possible when you instantiate your coordinates to know a priori what dimensions they'll need to have, but it seems like it would be possible some of the time.

As I also indicated wrt. PROJ pipelines in a previous message, that will be virtually impossible: You can never know what run-time pipelines a user may enter into the system, so compile-time checks will at best be spotty.

Also note that in both PROJ and Geodesy, the data flow model is such that conversions are done in place, so the actual memory location holding the coordinate, will change coordinate type between the steps of a transformation pipeline. So the only thing you can reasonably say about a coordinate tuple is that it is a collection of 0 or more numbers. Their interpretation is entirely up to the associated CRS.

While geoinformatics (and hence Geo) is, broadly speaking, a branch of mathematics, where a CRS is interpreted as a mathematical platonic ideal, geodesy (and hence Geodesy, and PROJ) is a branch of geophysics, and a CRS is an entirely empirical contraption, encompassing all the noise and messy realities of the physical world.

busstoptaktik avatar Mar 22 '24 11:03 busstoptaktik

My unfortunate knack for premature optimization is also concerned by the fact that making things optional will introduce branching right in the inner coordinate loop

I do worry about this, though I'd love to see some numbers/data on the ballpark overhead to expect here.

In general, I also wonder whether there exists a trait that matches up to the expectations of geo and geodesy. It seems to me that geo tries to follow the Simple Features spec as closely as possible, which I believe, for example, stipulates that x and y always exist. It seems that geodesy needs a fundamentally different model of coordinates?

I suggest to let the trait implementer provide x(), y(), z(), t(), and let the trait autoprovide first(), second(), third(), fourth() (or preferably the other way round), and then just postulate that the naming is an internal convention, not to be confused with any external conventions.

I've been coming around to this. Some people will want to use three-dimensional coords to mean XYM instead of XYZ, and maybe that's ok to allow in the trait? Maybe the trait should define structure more than intent?

I've been thinking more about additional dimensions in geoarrow-rs, because I want to potentially use geoarrow-rs as a dependency in some Python projects where 3D coordinates are allowed. I started outlining some thoughts but to support more dimensions I think a const generic for the structural size is the way to go. Then the interpretation of XYM vs XYZ and XYZM vs XYZT are a function of the associated metadata, not the array types.

kylebarron avatar Apr 03 '24 04:04 kylebarron

In particular, thinking about something like

pub trait CoordTrait<const DIM: usize> {
    type T: CoordNum;

    /// x component of this coord
    fn x(&self) -> Self::T;

    /// y component of this coord
    fn y(&self) -> Self::T;

    fn third_unchecked(&self) -> Self::T;

    fn third_opt(&self) -> Option<Self::T> {
        if DIM >= 3 {
            Some(self.third_unchecked())
        } else {
            None
        }
    }

    fn fourth_unchecked(&self) -> Self::T;

    fn fourth_opt(&self) -> Option<Self::T> {
        if DIM >= 4 {
            Some(self.fourth_unchecked())
        } else {
            None
        }
    }

    /// Returns a tuple that contains the x/horizontal & y/vertical component of the coord.
    fn xy(&self) -> (Self::T, Self::T) {
        (self.x(), self.y())
    }
}

then e.g. a LineStringTrait would become

pub trait LineStringTrait<const DIM: usize>: Sized {
    type T: CoordNum;
    type ItemType<'a>: 'a + CoordTrait<DIM, T = Self::T>
    where
        Self: 'a;

The unchecked variants allow for no-overhead access. I'm not sure if there's any way to make that smarter. Maybe having a method like

    fn array(&self) -> [Self::T; DIM];

makes sense. (Though I don't know when the compiler will catch an out of bounds access on that array).

kylebarron avatar Apr 03 '24 04:04 kylebarron

If we were to introduce a trait Coord, does that mean we could change our algorithms from being parameterized by the scalar type (i.e the type of X/Y, geo::GeoNum) to being parameterized by the trait Coord?

Something like:

trait Coord {
    type Scalar: GeoNum;

    fn x(&self) -> Self::Scalar;
    fn y(&self) -> Self::Scalar;
}

pub trait Area<C: Coord> {
    fn signed_area(&self) -> Coord::Scalar;
    fn unsigned_area(&self) -> Coord::Scalar;
}

frewsxcv avatar Apr 09 '24 15:04 frewsxcv

Won't you have to be generic over the Scalar type? Not all traits will be able to use GeoNum, right?

urschrei avatar Apr 09 '24 15:04 urschrei

Something like this: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=4ade74994f39a0e5fd995f2e237670e6

Won't you have to be generic over the Scalar type? Not all traits will be able to use GeoNum, right?

Does this code answer your question? The Coord trait is generic over the associated type Scalar, but the Scalar generic doesn't need to be specified in the algorithm trait implementation

frewsxcv avatar Apr 09 '24 15:04 frewsxcv