
simplification using a %-like value

Open mattijn opened this issue 4 years ago • 5 comments

Thanks for your work on these simplification algorithms!

Line simplification can be done using the Douglas-Peucker or Visvalingam-Whyatt algorithm. Both operate through an epsilon/tolerance parameter, which represents a distance (DP) or an area (VW) depending on the algorithm. The required tolerance is therefore hard to decide in each situation, since it also depends implicitly on the projection (e.g. meters or degrees).

Mapshaper provides the option to reduce a linestring to a percentage of its points. It would be nice to have this possibility for these algorithms as well.

(You will probably also like this blog, which incorporates your package: http://martinfleischmann.net/line-simplification-algorithms/.)
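To make the projection-dependence concrete, here is a small pure-Python sketch (not code from this package) showing why a tolerance tuned in one unit is useless in another: scaling coordinates by a factor k scales the DP criterion (perpendicular distance) by k, but the VW criterion (triangle area) by k squared.

```python
import math

def perp_distance(p, a, b):
    # Perpendicular distance from p to the line through a and b
    # (the Douglas-Peucker criterion).
    num = abs((b[0] - a[0]) * (a[1] - p[1]) - (a[0] - p[0]) * (b[1] - a[1]))
    return num / math.hypot(b[0] - a[0], b[1] - a[1])

def tri_area(a, b, c):
    # Area of the triangle a point forms with its neighbours
    # (the Visvalingam-Whyatt criterion).
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

pts = [(0.0, 0.0), (1.0, 0.3), (2.0, 0.0)]
k = 111_000  # rough meters-per-degree factor, purely illustrative
scaled = [(x * k, y * k) for x, y in pts]

# DP distances scale linearly with k, VW areas quadratically:
print(perp_distance(scaled[1], scaled[0], scaled[2]) / perp_distance(pts[1], pts[0], pts[2]))  # ~k
print(tri_area(*scaled) / tri_area(*pts))  # ~k**2
```

A percentage-of-points (or absolute point count) target would sidestep this entirely, since it is unit-free.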

mattijn avatar Apr 28 '20 18:04 mattijn

I second this; for our use case of simplifying time series data down to a given number of points, this would be an extremely useful option to have (either as a percentage or an absolute number of points). Currently we execute simplify_coords_vw_idx repeatedly with increasing tolerance, and either need very many steps or end up oversimplifying considerably.

I had a quick glance at the underlying code and believe it might be possible to "just" add an additional break criterion based on the length of the remaining heap, but to be fair I have no idea about Rust.

If it helps, I wouldn't mind if the implementation is approximate in either number of points or order of the removed points.
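For illustration, a rough pure-Python sketch of what such a break criterion might look like (this is not the crate's Rust implementation, and all names are made up): Visvalingam-Whyatt maintains a min-heap of triangle areas, so one can simply stop popping once only the desired number of points survives.

```python
import heapq

def tri_area(a, b, c):
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def vw_max_points(coords, max_points):
    """Visvalingam-Whyatt, stopping when only max_points points remain."""
    n = len(coords)
    if n <= max_points or n < 3:
        return list(range(n))
    prev = list(range(-1, n - 1))  # doubly linked list over indices
    nxt = list(range(1, n + 1))
    areas = {}
    heap = []
    for i in range(1, n - 1):
        areas[i] = tri_area(coords[i - 1], coords[i], coords[i + 1])
        heapq.heappush(heap, (areas[i], i))
    alive = n
    removed = [False] * n
    # the proposed criterion: break out as soon as few enough points survive
    while alive > max_points and heap:
        area, i = heapq.heappop(heap)
        if removed[i] or areas.get(i) != area:
            continue  # stale heap entry (lazy deletion)
        removed[i] = True
        alive -= 1
        p, q = prev[i], nxt[i]
        nxt[p], prev[q] = q, p
        # the two neighbours' triangles changed; recompute and re-push
        for j in (p, q):
            if 0 < j < n - 1 and not removed[j]:
                areas[j] = tri_area(coords[prev[j]], coords[j], coords[nxt[j]])
                heapq.heappush(heap, (areas[j], j))
    return [i for i in range(n) if not removed[i]]

print(vw_max_points([(0, 0), (1, 0.1), (2, 0), (3, 5), (4, 0), (5, 0.1), (6, 0)], 3))
# keeps the endpoints and the spike: [0, 3, 6]
```

The result is exact in point count but, as noted above, the removal order near the stopping point can differ slightly from a pure tolerance-based run because neighbour areas are recomputed as points are removed.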

Either way, thank you for your work! :)

mariuswallraff avatar Apr 15 '24 18:04 mariuswallraff

I too have a similar requirement!

mayuri-kulkarni avatar May 01 '24 11:05 mayuri-kulkarni

For those landing here for a similar reason, it might interest you that while this feature would still be very useful, in our use case, running simplify_coords_vw_idx consecutively with increasing tolerance (and applying the index to filter out unneeded data points before the next call) is actually faster than calling the function just once with the final tolerance.

I assume this is because finding the triangles that need to be recomputed in the stack becomes easier when the stack is smaller, but that's just a guess.

mariuswallraff avatar May 01 '24 20:05 mariuswallraff

> For those landing here for a similar reason, it might interest you that while this feature would still be very useful, in our use case, running simplify_coords_vw_idx consecutively with increasing tolerance (and applying the index to filter out unneeded data points before the next call) is actually faster than calling the function just once with the final tolerance.
>
> I assume this is because finding the triangles that need to be recomputed in the stack becomes easier when the stack is smaller, but that's just a guess.

That's very interesting. I'll have to dig into it in geo and see whether we can verify that guess. Do you have some example parameters?

urschrei avatar May 01 '24 20:05 urschrei

Sure, here's a small example, although it's not as good as with our real-life data; below I attached a slide illustrating my previous results.

```python
from timeit import Timer

import numpy as np
from simplification.cutil import simplify_coords_vw_idx

np.random.seed(1337)

a = np.ascontiguousarray(
    np.vstack(
        [
            np.arange(10000),
            np.concatenate([np.random.normal(0.0, 0.1, 5000), np.random.normal(1.0, 0.1, 5000)]),
        ]
    ).T
)

# target: less than or equal to 100 points
max_points = 100

epsilon = 20.48
out = simplify_coords_vw_idx(a, epsilon)  # => 91 points

print(Timer("simplify_coords_vw_idx(a, epsilon)", globals=locals()).repeat(3, number=1000))
# [4.4360130999994, 4.323799300007522, 4.425885300006485]


def iteratively(input_array, epsilon, max_points):
    selected_idx = np.arange(0, len(input_array))
    while True:
        current_idx = simplify_coords_vw_idx(input_array, epsilon)
        selected_idx = selected_idx[current_idx]

        if not max_points or len(selected_idx) <= max_points:
            break

        input_array = input_array[current_idx]
        epsilon *= 2.0
    return selected_idx


print(Timer("iteratively(a, 20.48, max_points)", globals=locals()).repeat(3, number=1000))
# [4.509853900002781, 4.511391199979698, 4.359233400027733]
print(Timer("iteratively(a, 10.24, max_points)", globals=locals()).repeat(3, number=1000))
# [4.469542200007709, 4.420092500018654, 4.369932599976892]
print(Timer("iteratively(a, 5.12, max_points)", globals=locals()).repeat(3, number=1000))
# [4.414737000013702, 4.3983672999893315, 4.3968801999872085]
print(Timer("iteratively(a, 2.56, max_points)", globals=locals()).repeat(3, number=1000))
# [4.21051350000198, 4.002777400019113, 3.9561154000111856]
print(Timer("iteratively(a, 1.28, max_points)", globals=locals()).repeat(3, number=1000))
# [3.8792780000076164, 3.972585600015009, 4.081758899992565]
print(Timer("iteratively(a, 0.64, max_points)", globals=locals()).repeat(3, number=1000))
# [3.955509699997492, 3.804218100005528, 3.783603399991989]
print(Timer("iteratively(a, 0.32, max_points)", globals=locals()).repeat(3, number=1000))
# [3.7186095000070054, 4.0281457999954, 3.7554183999891393]
print(Timer("iteratively(a, 0.16, max_points)", globals=locals()).repeat(3, number=1000))
# [3.755888199986657, 3.7749703000008594, 3.7538082000100985]
print(Timer("iteratively(a, 0.08, max_points)", globals=locals()).repeat(3, number=1000))
# [3.7066072000015993, 3.6726426999957766, 3.708958800008986]
print(Timer("iteratively(a, 0.04, max_points)", globals=locals()).repeat(3, number=1000))
# [3.9454101999872364, 3.809387200017227, 3.8627665000094566]

print(len(out))
print(len(iteratively(a, 0.04, max_points)))
print()
```

[attached image: slide with benchmark results from real-life data]
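Besides tolerance-doubling, another workaround for hitting a target point count is to bisect the tolerance. The sketch below is hypothetical: `toy_simplify_idx` is a one-pass stand-in so the example is self-contained, but in practice you would pass simplify_coords_vw_idx (whose survivor count is likewise non-increasing in the tolerance).

```python
def toy_simplify_idx(coords, eps):
    # Toy stand-in for simplify_coords_vw_idx: keeps the endpoints and any
    # interior point whose triangle with its immediate neighbours exceeds eps.
    keep = [0]
    for i in range(1, len(coords) - 1):
        a, b, c = coords[i - 1], coords[i], coords[i + 1]
        area = abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0
        if area > eps:
            keep.append(i)
    keep.append(len(coords) - 1)
    return keep

def idx_for_max_points(simplify_idx, coords, max_points, hi=1e9, iters=60):
    # Bisect the tolerance until at most max_points indices survive;
    # assumes the survivor count is monotonically non-increasing in eps.
    lo = 0.0
    best = simplify_idx(coords, hi)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        idx = simplify_idx(coords, mid)
        if len(idx) <= max_points:
            hi, best = mid, idx
        else:
            lo = mid
    return best
```

Unlike the doubling loop above, this calls the simplifier on the full input each time, so it trades the speed benefit of shrinking inputs for getting as close to the target count as the tolerance allows.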

mariuswallraff avatar May 02 '24 19:05 mariuswallraff