LasIO.jl
LasIO.jl copied to clipboard
API for returning scaled and offset point cloud
I was messing around and ran into a situation where I wanted to get the scaled and offset float point cloud to use with NearestNeighbors.jl
.
I can use the the convert
function provided by the package, but then I have an Array{Point{3,Float64},1}
and lose access to all the metadata and other convenience functions like classification()
.
using FileIO, LasIO
header, points = load("points.las")
using GeometryTypes
pc_scaled = map(x-> Base.convert(Point{3, Float64},x, header), points) #Array{Point{3,Float64},1}
The code above works fine for my toy problem, but I wanted to ask about design thoughts for adding this.
The ability to expose scaled and offset coords to users was discussed briefly in this LasIO issue.
After reading through src
a bit, it looks like it would at least require a new type at least since all the x
,y
,z
fields are ::Int
? Not sure if any of the work over in GeometryBasics
/GeometryTypes
can be leveraged to simplify anything.
Maybe related issue in PointCloudRasterizers? and I also know about GeoAcceleratedArrays.
There's also some mention of incorporating scaling/offset into the type at https://github.com/visr/LasIO.jl/issues/4. Ideally I think the las positions should includes the affine transformation in their type. You might be able to get away with something as "simple" (ok, at least short) as the following:
using StaticArrays, CoordinateTransformations, LinearAlgebra
struct LasPoint{Trans} <: StaticVector{3,Float64}
data::NTuple{3,Int32}
end
transform_of(p::LasPoint{Trans}) where Trans = Trans
function Base.getindex(p::LasPoint, i::Int)
transform_of(p)(SVector(p.data))[i]
end
example usage:
julia> trans = AffineMap(Diagonal(SA_F64[1 0 0; 0 2 0; 0 0 3]), SA_F64[10, 20, 30])
AffineMap([1.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 3.0], [10.0, 20.0, 30.0])
julia> p = LasPoint{trans}(1,2,3)
3-element LasPoint{AffineMap([1.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 3.0], [10.0, 20.0, 30.0])} with indices SOneTo(3):
11.0
24.0
39.0
The Julia compiler generates astonishingly good code with this version of getindex. For example, the first component of the diagonal scaling is 1
in this case, so the multiplication is optimized away!
julia> foo(p) = p[1]
foo (generic function with 1 method)
julia> @code_native debuginfo=:none foo(p)
.text
vcvtsi2sdl (%rdi), %xmm0, %xmm0
movabsq $140660719895864, %rax # imm = 0x7FEE203E4538
vaddsd (%rax), %xmm0, %xmm0
retq
nopw %cs:(%rax,%rax)
nopl (%rax)
To get the underlying Int32
components you always have reinterpret:
julia> points = LasPoint{trans}[(1,2,3), (4,5,6)]
2-element Array{LasPoint{AffineMap([1.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 3.0], [10.0, 20.0, 30.0])},1}:
[11.0, 24.0, 39.0]
[14.0, 30.0, 48.0]
julia> reinterpret(NTuple{3,Int32}, points)
2-element reinterpret(Tuple{Int32,Int32,Int32}, ::Array{LasPoint{AffineMap([1.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 3.0], [10.0, 20.0, 30.0])},1}):
(1, 2, 3)
(4, 5, 6)
Actually most of the above LasPoint
is completely generic and in principle could be defined in CoordinateTransformations
. The main difficulty for a fully generic version is the eltype of the point (it depends on both Trans
and the eltype of the wrapped data
in general and that would need to be explicitly written in the type signature which is slightly annoying.)
Thinking more about this, I suspect we don't need to go inventing new things because the pieces already exist in CoordinateTransformations
and MappedArrays
:
julia> points_int32 = SVector{3,Int32}[(1,2,3), (4,5,6)]
2-element Array{SArray{Tuple{3},Int32,1,3},1}:
[1, 2, 3]
[4, 5, 6]
julia> trans = AffineMap(Diagonal(SA_F64[1 0 0; 0 2 0; 0 0 3]), SA_F64[10, 20, 30])
AffineMap([1.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 3.0], [10.0, 20.0, 30.0])
julia> points = mappedarray(trans, points_int32)
2-element mappedarray(AffineMap([1.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 3.0], [10.0, 20.0, 30.0]), ::Array{SArray{Tuple{3},Int32,1,3},1}) with eltype SArray{Tuple{3},Float64,1,3}:
[0.0, 0.0, 0.0]
[14.0, 30.0, 48.0]
MappedArrays
is just what you need: you can even assign into the array of fixed-point coordinates by supplying an inverse transformation:
julia> points = mappedarray(trans, (x->round.(Int32,x)) ∘ inv(trans), points_int32)
2-element mappedarray(AffineMap([1.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 3.0], [10.0, 20.0, 30.0]), Base.var"#64#65"{var"#9#10",AffineMap{Diagonal{Float64,SArray{Tuple{3},Float64,1,3}},SArray{Tuple{3},Float64,1,3}}}(var"#9#10"(), AffineMap([1.0 0.0 0.0; 0.0 0.5 0.0; 0.0 0.0 0.3333333333333333], [-10.0, -10.0, -10.0])), ::Array{SArray{Tuple{3},Int32,1,3},1}) with eltype SArray{Tuple{3},Float64,1,3}:
[11.0, 24.0, 39.0]
[14.0, 30.0, 48.0]
julia> points[1] = SA_F64[0,0,0]
3-element SArray{Tuple{3},Float64,1,3} with indices SOneTo(3):
0.0
0.0
0.0
julia> points_int32
2-element Array{SArray{Tuple{3},Int32,1,3},1}:
[-10, -10, -10]
[4, 5, 6]
That's pretty cool and seems to consistent in spirit with some of the other work using AffineMaps
.
All the work in JuliaGeometry
seems to be using some form of StaticArray
under the hood for point-like things so that should all play nicely with what you've outlined above right?
My original thought was to modify the load
/open
functions to add a flag to let users specify if they want the Int
or Float
version when reading rather than read and then transform.
The thing I am still thinking about is trickiness between versions of LAS spec versions. Some discussion here
Is a LasPoint be something special or is it just a point with some extra metadata fields(dependent on version)? Is there value in some generic supertype like PointClouds.jl?
EDIT: played around with it over the weekend, here is an example using the test data from LasIO
using FileIO, LasIO, MappedArrays, StaticArrays, LinearAlgebra, CoordinateTransformations, GeometryTypes
lasfn = joinpath(dirname(pathof(LasIO)), "..", "test/libLAS_1.2.las")
header, points = load(lasfn)
translation = AffineMap(Diagonal(SA_F64[header.x_scale 0 0; 0 header.y_scale 0; 0 0 header.z_scale]), SA_F64[header.x_offset, header.y_offset, header.z_offset])
extract(p::LasIO.LasPoint) = [p.x, p.y, p.z]
pts_int = [extract(i) for i in points]
ma_pts = mappedarray(translation,pts_int)
#test to make sure it matches convert(points)
pc_scaled = map(x-> Base.convert(Point{3, Float64},x, header), points)
ma_pts == pc_scaled # true
I believe with MappedArrays
we can simply map RawLasPoint
(fictional name) to LasPoint
, and have LasPoint
contain a SVector{3,Float64}
in the correct geospatial coordinate system. I don't think any of the other fields will need to be mapped (?) Both RawLasPoint
and LasPoint
would share most metadata fields so those would be a simple copy and the compiler is quite good at optimizing away such copying code. We'd need to check efficiency.
So there's two rather different approaches to this stuff:
- Have the transformation included in the point type itself (https://github.com/visr/LasIO.jl/issues/34#issuecomment-617574054) and use a normal
Base.Array
for storage - Use an array wrapper liked
MappedArray
and have the transformation included with wrapper (https://github.com/visr/LasIO.jl/issues/34#issuecomment-618184418)
The MappedArray
is more flexible, because it allows you to have more complex transformations which are not immutable structs. Sure you can get away with putting the transformation in the point type if it's a simple scale and offset. But sometimes this is not the case, for example
- Complex geospatial reprojections via
Proj4.jl
which require talking to a C API and sometimes require large amounts of raster data - Survey georeferencing pipelines, where you may, for example, be taking points from static scanner coordinates, and pasting them onto the landscape via the flight path of an aircraft (then you need the whole flight path and the time field in the LasPoint is critically part of the transformation). Though granted LAS may not be an ideal source format in this case...
The MappedArray
is also easier to implement. So I'd say that as long as it's efficient in practice it may be the better option to return from load
by default.
I do believe that Images.jl got it completely correct when they used types like Array{ColorTypes.RGB{FixedPointNumbers.Normed{UInt8,8}}
to represent image channel data semantically as Real
numbers rather than integers. And we should do the same for geospatial data, even though the mechanism may be slightly different.
Is there value in some generic supertype like PointClouds.jl?
PointClouds.jl is quite abandoned now, but it's a good question.
For completely generic processing of point cloud data (not specific to LAS), I think presenting a tabular data structure of some form is the way to go. The various LAS formats are inherently an array-of-structs on disk, so it might be best to have a TypedTables.Table
-like thing which overlays an internal row storage mechanism rather than the column storage used in TypedTables
. Then have that implement the Tables.jl
interface in an efficient way so that people can write generic processing code which works on both things like TypedTables.Table
and on row-based LAS data.
I have seen a few packages in the python world take the "points as a table" stance and seems to work well (geopandas also seems to be getting a reasonable amount of attention/love recently?).
Based on what you describe, at what point does it become a task of essentially re-creating RoamesGeometry/ Roames ecosystem?
There seems to be some convergence in terms of design ideas:
- Points as table <->
LazIO
supports theTables
interface - Use of AcceleratedArrays for indexing <->
GeoAcceleratedArrays,jl
- SpatialGrids <->
PointCloudRasterizers
(personal favorite julia package...) - Others?
Also, thanks for engaging with me. It's nice to talk things out.
Based on what you describe, at what point does it become a task of essentially re-creating RoamesGeometry/ Roames ecosystem?
Interesting question though I'm not sure I know how to answer it.
I'd say that we didn't fully realized these ideas while I was still at Roames — by the stage we had enough experience to want the above design, we had a lot of ad-hoc processing code in C++ and other languages, plus systems/people dedicated to working on that. Julia point cloud infrastructure was a relative latecomer to this mix. So I'm not sure there's exactly anything to "recreate" here; pretty much all the generically applicable Roames tooling is openly available. The bits which work and are more maintained (StaticArrays, TypedTables, AcceleratedArrays, Displaz ......) can be used. The bits which are somewhat incomplete and unmaintained like PointClouds.jl never really got critical mass in the first place.
I haven't been closely followed developments in the geo/point cloud ecosystem for a while, so at this stage I'm not very qualified to know where the ecosystem is heading! But I do know that arrays of structs are a bit of a disappointing abstraction in general.
The various LAS formats are inherently an array-of-structs on disk, so it might be best to have a TypedTables.Table-like thing which overlays an internal row storage mechanism rather than the column storage used in TypedTables
Would something like that be able to be added to TypedTables
as it currently exists or would that be an entirely new API/package?
I clicked around for a few days looking for other rowwise projects and there doesn't seem to be much.
I was able to find RowTables that builds on normal DataFrames
but it doesn't seem to be type stable.
I also saw a few mentions to IndexedTables as a way to get good performance with row-wise. It would be cool if we could load straight to JuliaDB and have all the out of core stuff work out of the box
Yeah I found RowTables.jl as well, but it doesn't look applicable. I also asked on slack, but to no avail so I think you'd have to build a new table type. Probably in a new package, though TypedTables could possibly be a home for it if there's really a lot of commonality.
I'm a bit late to the party, but my 2 cents:
It seems that JuliaGeo is moving towards traits instead of (super)types. See https://github.com/yeesian/GeoInterfaceRFC.jl/. So I would vote against a new pointcloud supertype. I've dabbled with having special Pointcloud traits (has_intensity, etc), for https://github.com/Deltares/PointCloudFilters.jl (very much a WIP), as there are many different sources (LasIO, LazIO, random csv or HDF5). I'll be hacking away at this approach this summer (August mostly).
With regards to the scaling/offset, I hardcoded that in an old PR, see #13, both for reading/writing. But I very much like the mappedarray approach, although I'm still in doubt if I want that as one "point" column or as three separate columns. But that's a minor detail, approach seems fine. It's always great when you find, even with a small ecosystem, that the tooling already exists and can be easily applied.
I'm still in doubt if I want that as one "point" column or as three separate columns
In my experience I want it as a point column for work within julia, but for interop with other systems, you may need it as separate columns. (Essentially because other systems don't have powerful enough type systems to talk about points in their own right.)
So having a neat way to wrap several columns into one poin at input and unwrap them into several components for output is good.