pcl
pcl copied to clipboard
[filters] Revise filters which might be out of date due to `functor_filter`
Is your feature request related to a problem? Please describe.
Some older implementations of filters (at least PassThrough and ConditionalRemoval) are very slow compared to what can be achieved by FunctorFilter (introduced by #3890)
Context
Source Code
#include <benchmark/benchmark.h>
#include <pcl/filters/conditional_removal.h>
#include <pcl/filters/functor_filter.h>
#include <pcl/filters/passthrough.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/types.h>
#include <functional>
#include <random>
struct PointCloudFixture : public benchmark::Fixture
{
void SetUp(const ::benchmark::State& state)
{
cloud = pcl::make_shared<pcl::PointCloud<pcl::PointXYZ>>();
cloud->resize(state.range(0));
for (auto& pt : *cloud) {
pt = pcl::PointXYZ{0., 0., 0.};
}
}
pcl::shared_ptr<pcl::PointCloud<pcl::PointXYZ>> cloud;
};
BENCHMARK_DEFINE_F(PointCloudFixture, FunctionTest)(benchmark::State& st)
{
const auto lambda = [](const pcl::PointCloud<pcl::PointXYZ>& cloud,
pcl::index_t idx) {
const auto pt = cloud[idx];
return (pt.getArray3fMap() < 10).all() &&
(pt.getArray3fMap() > -10).all();
};
const std::function<bool(const pcl::PointCloud<pcl::PointXYZ>&,
pcl::index_t)>
functor = lambda;
pcl::FunctorFilter<pcl::PointXYZ, decltype(functor)> filter{functor};
filter.setInputCloud(cloud);
auto out_cloud = pcl::make_shared<pcl::PointCloud<pcl::PointXYZ>>();
for (auto _ : st) {
filter.filter(*out_cloud);
}
}
BENCHMARK_DEFINE_F(PointCloudFixture, LambdaTest)(benchmark::State& st)
{
const auto lambda = [](const pcl::PointCloud<pcl::PointXYZ>& cloud,
pcl::index_t idx) {
const auto pt = cloud[idx];
return (pt.getArray3fMap() < 10).all() &&
(pt.getArray3fMap() > -10).all();
};
pcl::FunctorFilter<pcl::PointXYZ, decltype(lambda)> filter{lambda};
filter.setInputCloud(cloud);
auto out_cloud = pcl::make_shared<pcl::PointCloud<pcl::PointXYZ>>();
for (auto _ : st) {
filter.filter(*out_cloud);
}
}
BENCHMARK_DEFINE_F(PointCloudFixture, PassthroughTest)(benchmark::State& st)
{
pcl::PassThrough<pcl::PointXYZ> filter_x, filter_y, filter_z;
auto out_cloud = pcl::make_shared<pcl::PointCloud<pcl::PointXYZ>>();
filter_x.setFilterLimits(-10., 10.);
filter_x.setInputCloud(cloud);
filter_x.setFilterFieldName("x");
filter_y.setFilterLimits(-10., 10.);
filter_y.setInputCloud(out_cloud);
filter_y.setFilterFieldName("y");
filter_z.setFilterLimits(-10., 10.);
filter_z.setInputCloud(out_cloud);
filter_z.setFilterFieldName("z");
for (auto _ : st) {
filter_x.filter(*out_cloud);
filter_y.filter(*out_cloud);
filter_z.filter(*out_cloud);
}
}
BENCHMARK_DEFINE_F(PointCloudFixture, ConditionalTest)(benchmark::State& st)
{
pcl::ConditionalRemoval<pcl::PointXYZ> filter;
auto box_cond = pcl::make_shared<pcl::ConditionAnd<pcl::PointXYZ>>();
box_cond->addComparison(
pcl::make_shared<pcl::FieldComparison<pcl::PointXYZ>>(
"x", pcl::ComparisonOps::GT, -10.0));
box_cond->addComparison(
pcl::make_shared<pcl::FieldComparison<pcl::PointXYZ>>(
"x", pcl::ComparisonOps::LT, 10.0));
box_cond->addComparison(
pcl::make_shared<pcl::FieldComparison<pcl::PointXYZ>>(
"y", pcl::ComparisonOps::GT, -10.0));
box_cond->addComparison(
pcl::make_shared<pcl::FieldComparison<pcl::PointXYZ>>(
"y", pcl::ComparisonOps::LT, 10.0));
box_cond->addComparison(
pcl::make_shared<pcl::FieldComparison<pcl::PointXYZ>>(
"z", pcl::ComparisonOps::GT, -10.0));
box_cond->addComparison(
pcl::make_shared<pcl::FieldComparison<pcl::PointXYZ>>(
"z", pcl::ComparisonOps::LT, 10.0));
filter.setCondition(box_cond);
filter.setInputCloud(cloud);
auto out_cloud = pcl::make_shared<pcl::PointCloud<pcl::PointXYZ>>();
for (auto _ : st) {
filter.filter(*out_cloud);
}
}
BENCHMARK_REGISTER_F(PointCloudFixture, FunctionTest)->Range(8, 8 << 10);
BENCHMARK_REGISTER_F(PointCloudFixture, LambdaTest)->Range(8, 8 << 10);
BENCHMARK_REGISTER_F(PointCloudFixture, PassthroughTest)->Range(8, 8 << 10);
BENCHMARK_REGISTER_F(PointCloudFixture, ConditionalTest)->Range(8, 8 << 10);
BENCHMARK_MAIN();
Run on (8 X 1700 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 256 KiB (x4)
L3 Unified 8192 KiB (x1)
Load Average: 2.74, 3.08, 3.17
---------------------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------------------
PointCloudFixture/FunctionTest/8 131 ns 130 ns 4159808
PointCloudFixture/FunctionTest/64 501 ns 490 ns 1121417
PointCloudFixture/FunctionTest/512 2428 ns 2382 ns 293105
PointCloudFixture/FunctionTest/4096 18383 ns 18337 ns 37591
PointCloudFixture/FunctionTest/8192 36809 ns 36739 ns 19173
PointCloudFixture/LambdaTest/8 110 ns 110 ns 6064439
PointCloudFixture/LambdaTest/64 337 ns 336 ns 2080516
PointCloudFixture/LambdaTest/512 1580 ns 1577 ns 414660
PointCloudFixture/LambdaTest/4096 12228 ns 12196 ns 55128
PointCloudFixture/LambdaTest/8192 25384 ns 25313 ns 27361
PointCloudFixture/PassthroughTest/8 2042 ns 2038 ns 334526
PointCloudFixture/PassthroughTest/64 2631 ns 2623 ns 266375
PointCloudFixture/PassthroughTest/512 8524 ns 8505 ns 81159
PointCloudFixture/PassthroughTest/4096 57221 ns 57036 ns 10940
PointCloudFixture/PassthroughTest/8192 111504 ns 111357 ns 5865
PointCloudFixture/ConditionalTest/8 291 ns 290 ns 2404366
PointCloudFixture/ConditionalTest/64 2150 ns 2145 ns 330248
PointCloudFixture/ConditionalTest/512 17053 ns 17020 ns 41152
PointCloudFixture/ConditionalTest/4096 134140 ns 133865 ns 5070
PointCloudFixture/ConditionalTest/8192 271490 ns 270909 ns 2580
Expected behavior
PCL should not offer duplicate features, specially when the one that is easier to use, search and stumble upon also provides much worse performance. In this case,
- Using lambda is 1.45x faster than
std::function😄 - Using lambda is 4.39x faster than
PassThrough😱 - Using lambda is 10.70x faster than
ConditionalRemoval🙀
Describe the solution you'd like
For simple filters (such as PassThrough and ConditionalRemoval, etc.), we should either
- reimplement them (after benchmarking) OR
- deprecate them
This allows downstream users to get performance improvements (or notifications about a faster way) and also prevents new users from misusing the library. Depending on the decision, we would need to modify the docs and highlight the new feature
Marking this as stale due to 30 days of inactivity. It will be closed in 7 days if no further activity occurs.
Hi, can I work on this issue? I can see a merged and an open branch referring to this issue right now, so if this performance improvement is still desirable I'd like to help with this.
Hi, can I work on this issue? I can see a merged and an open branch referring to this issue right now, so if this performance improvement is still desirable I'd like to help with this.
Great that you want to work on this, however I am not sure how to proceed here. I think we first have to review the open (draft) pull request above, and see whether we have to make any design decisions. Until then, maybe you find another open issue that interests you?