Open3D
open3d.utility.Vector3dVector() sometimes slow
Checklist
- [X] I have searched for similar issues.
- [X] For Python issues, I have tested with the latest development wheel.
- [X] I have checked the release documentation and the latest documentation (for master branch).
Describe the issue
I have a strange performance issue with open3d.utility.Vector3dVector().
I use PCD files to visualize 3D point cloud data with open3d and I read these files with pandas because I need additional attributes like intensity from the PCD file.
open3d.utility.Vector3dVector() is much slower when I pass the numpy array that comes from a file than when I pass a random numpy array of the same shape and dtype.
Vector3dVector data_pd: 86.57ms
Vector3dVector data_np: 0.51ms
Steps to reproduce the bug
import open3d as o3d
import numpy as np
import pandas as pd
from timeit import default_timer as timer
file = 'frame.csv'
data_np = 400 * np.random.rand(131072, 3) - 200
df_1 = pd.DataFrame(data_np, columns=list('xyz'))
df_1.to_csv(file)
df_2 = pd.read_csv(file, names=list('xyz'), skiprows=1)
data_pd = df_2.loc[:, ['x', 'y', 'z']].to_numpy()
assert data_np.dtype == data_pd.dtype
assert np.allclose(data_np, data_pd)
t2 = timer()
pc_points1 = o3d.utility.Vector3dVector(data_pd)
t3 = timer()
pc_points2 = o3d.utility.Vector3dVector(data_np)
t4 = timer()
print(f'Vector3dVector data_pd: \t\t {((t3 - t2) * 1e3):.2f}ms')
print(f'Vector3dVector data_np: \t\t {((t4 - t3) * 1e3):.2f}ms')
Error message
No response
Expected behavior
I expect comparable speed.
Open3D, Python and System information
- Operating system: Ubuntu 20.04
- Python version: 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
- Open3D version: output from python: 0.15.2+55ded67
- System architecture: x86
- Is this a remote workstation?: no
- How did you install Open3D?: pip
Additional information
No response
Maybe related to #4116
We will look into the issue with Vector3dVector. Meanwhile, you might be interested to know that the new tensor-based module supports custom attributes such as intensity: open3d.t.io.read_point_cloud reads them, so you may not need pandas.
Thank you!
I tried out open3d.t.geometry.PointCloud a couple of weeks ago but it could not replace open3d.geometry.PointCloud entirely as plotting in the standard visualizer, plane fitting, and some other features were not yet available in the release 0.15.2.
The issue
The main reason is that data_pd is not C-contiguous. C-contiguous (row-major) is the default layout for NumPy arrays, but an array becomes F-contiguous (column-major) when, for example, it is transposed. Open3D uses C-contiguous arrays internally for Eigen. When an array is not C-contiguous, Vector3dVector does the conversion row by row, which is slow.
ipdb> data_pd.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
ipdb> data_np.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
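The layout difference can be reproduced without pandas. This is a minimal sketch that assumes only NumPy; the transposed array stands in for the column-major block that to_numpy() returned in the report, and the names data_c and data_f are illustrative:

```python
import numpy as np

# Freshly created arrays are C-contiguous (row-major) by default.
data_c = np.random.rand(131072, 3)

# Transposing a C-contiguous (3, N) array yields an (N, 3) view that is
# F-contiguous (column-major) and does not own its data, which mirrors
# the flags printed for data_pd above.
data_f = np.random.rand(3, 131072).T

for name, arr in (("data_c", data_c), ("data_f", data_f)):
    print(name, arr.shape,
          "C:", arr.flags["C_CONTIGUOUS"],
          "F:", arr.flags["F_CONTIGUOUS"],
          "OWNDATA:", arr.flags["OWNDATA"])
```

Checking arr.flags before calling Vector3dVector is a cheap way to see which path an array is likely to take.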
Quick fix
The quick fix is to convert the array to a contiguous array before calling Vector3dVector. The conversion still has overhead, but it is already much faster than before.
t2 = timer()
data_pd = np.ascontiguousarray(data_pd)
pc_points1 = o3d.utility.Vector3dVector(data_pd)
t3 = timer()
pc_points2 = o3d.utility.Vector3dVector(data_np)
t4 = timer()
On my machine
# Before
Vector3dVector data_pd: 35.64ms
Vector3dVector data_np: 0.20ms
# After quick fix
Vector3dVector data_pd: 0.47ms # time for both ascontiguousarray and Vector3dVector
Vector3dVector data_np: 0.15ms
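The quick fix can be wrapped in a small helper. This is only a sketch: the function name as_c_contiguous_float64 is hypothetical, and the float64 and (N, 3) checks are assumptions based on the documented input of Vector3dVector rather than something measured in this thread. It uses only NumPy so it runs without Open3D installed:

```python
import numpy as np

def as_c_contiguous_float64(points):
    """Return an (N, 3) float64, C-contiguous array.

    np.ascontiguousarray copies only when the layout or dtype differs,
    so arrays that are already suitable are passed through unchanged.
    """
    arr = np.ascontiguousarray(points, dtype=np.float64)
    if arr.ndim != 2 or arr.shape[1] != 3:
        raise ValueError(f"expected shape (N, 3), got {arr.shape}")
    return arr

# Column-major input, similar to data_pd in the report.
f_ordered = np.asfortranarray(np.random.rand(1000, 3))
fixed = as_c_contiguous_float64(f_ordered)
print(fixed.flags["C_CONTIGUOUS"])

# The result could then be handed to Open3D, e.g.:
# pc_points = o3d.utility.Vector3dVector(fixed)
```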
Future fix
We'll look into Vector3dVector to see whether the C-contiguous conversion can be handled internally in a more efficient way.
Thank you @yxlao!
We will look into the issue with Vector3dVector. Meanwhile, you might be interested to know that the new tensor-based module supports custom attributes such as intensity: open3d.t.io.read_point_cloud reads them, so you may not need pandas.
Maybe interesting in this context: I also kept using pandas when I tested Open3D Tensors because it was way faster to read my PCD file.
open3d.t.io.read_point_cloud: 250ms
pd.read_csv: 70ms
I use PCD files with 131072 points, which have nine attributes with different data types.
@marcelbrucker is it possible to share the PCD file (or a truncated version of the PCD file), and the code you use? We can do some debugging to improve the t.io.read_point_cloud speed.
import open3d as o3d
import numpy as np
import pandas as pd
from timeit import default_timer as timer
device = o3d.core.Device("CPU:0")
input_file = "Example_PCD.pcd"
# Read PCD file with various attributes
t1 = timer()
pcd = o3d.t.io.read_point_cloud(input_file)
xyz = pcd.point["positions"].numpy()
intensity = pcd.point["intensity"].flatten().numpy()
ring = pcd.point["ring"].flatten().numpy()
ambient = pcd.point["ambient"].flatten().numpy()
additional_attributes = np.hstack((intensity[:, None], ring[:, None], ambient[:, None]))
pc = o3d.geometry.PointCloud()
pc.points = o3d.utility.Vector3dVector(np.ascontiguousarray(xyz))
pc.normals = o3d.utility.Vector3dVector(np.ascontiguousarray(additional_attributes))
t2 = timer()
# Read PCD file with pandas
# pcd_pd = pd.read_csv(input_file, sep=" ", header=0, names=["x", "y", "z", "intensity", "t", "reflectivity", "ring", "ambient", "range"], skiprows=10, dtype={"x": np.float32, "y": np.float32, "z": np.float32, "intensity": np.float32, "t": np.uint32, "reflectivity": np.uint16, "ring": np.uint8, "ambient": np.uint16, "range": np.uint32})
pcd_pd = pd.read_csv(input_file, sep=" ", header=0, names=["x", "y", "z", "intensity", "t", "reflectivity", "ring", "ambient", "range"], skiprows=10, dtype=float)
xyz_pd = pcd_pd.loc[:, ["x", "y", "z"]].to_numpy()
additional_attributes_pd = pcd_pd.loc[:, ["intensity", "ring", "ambient"]].to_numpy()
pc_pd = o3d.geometry.PointCloud()
pc_pd.points = o3d.utility.Vector3dVector(np.ascontiguousarray(xyz_pd))
pc_pd.normals = o3d.utility.Vector3dVector(np.ascontiguousarray(additional_attributes_pd))
t3 = timer()
print(f"PCD with open3d.t.io: \t\t {((t2 - t1) * 1e3):.2f}ms")
print(f"PCD with pandas: \t\t {((t3 - t2) * 1e3):.2f}ms")
PCD with open3d.t.io: 465.82ms
PCD with pandas: 81.87ms
Specifying the per-column dtypes (the commented-out read_csv call) triples the time pandas needs, but it is still faster.
@marcelbrucker thank you, that is helpful. We'll look into that.
I have a similar problem, but I'm not sure whether the cause is the same. The code below takes about 1 second. The function 'remove_zeros' deletes the [0,0,0] entries from the points; without it the conversion is even slower.
The pcdmap_im numpy array comes from my pybind11 function, in which I convert a cv::Mat to a py::array_t.
open3d 0.15.2
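import time
import numpy as np
import open3d as o3d

# pcdmap_im comes from the pybind11 function below;
# remove_zeros drops the [0, 0, 0] points.
points = remove_zeros(pcdmap_im)
points = np.ascontiguousarray(points)
start = time.time()
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
print("convert pcd to o3d takes: ", time.time()-start)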
cv::Mat image = self.GetPcdMap(); // here return a cv::Mat with 3 channels
py::array_t<float> result = py::array_t<float>(image.rows * image.cols * image.channels());
auto buf = result.request();
float *ptr = (float *) buf.ptr;
for (int i = 0; i < image.rows; i++) {
    for (int j = 0; j < image.cols; j++) {
        for (int k = 0; k < image.channels(); k++) {
            *ptr++ = image.at<cv::Vec3f>(i, j)[k] * 0.001;
        }
    }
}
return result;
0 frames were cleared from the cyclic buffer
Frame was triggered, Frame Id: 270
convert pcd to o3d takes: 0.8412952423095703
Frame was triggered, Frame Id: 271
convert pcd to o3d takes: 0.9536771774291992
Frame was triggered, Frame Id: 272
convert pcd to o3d takes: 0.8492419719696045
Frame was triggered, Frame Id: 273
convert pcd to o3d takes: 0.9604189395904541
Frame was triggered, Frame Id: 274
convert pcd to o3d takes: 0.8200759887695312
disconnect camera
Could you please tell me why the conversion from numpy to o3d points can be so slow? Thanks a lot!
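One thing that may be worth checking here is the dtype, since the pybind11 snippet builds a py::array_t<float>, which arrives in Python as float32, while Vector3dVector is documented as taking a float64 array of shape (n, 3). Whether that explains the one-second conversion is an assumption and is not confirmed in this thread. The sketch below uses a zero-filled stand-in for the real buffer, and describe is a hypothetical helper:

```python
import numpy as np

def describe(arr):
    """Summarise the properties that matter for the conversion."""
    return {
        "dtype": str(arr.dtype),
        "shape": arr.shape,
        "c_contiguous": bool(arr.flags["C_CONTIGUOUS"]),
    }

# Stand-in for the flat float32 buffer returned by the pybind11 function
# (assumed image size 640 x 480 with 3 channels).
flat = np.zeros(640 * 480 * 3, dtype=np.float32)
points = flat.reshape(-1, 3)
print(describe(points))

# Convert once to float64 before handing the array to Open3D.
points64 = points.astype(np.float64)
print(describe(points64))

# pcd.points = o3d.utility.Vector3dVector(points64)
```

If describe reports float32 for the real array, timing the conversion again after astype(np.float64) would show whether the dtype is the bottleneck.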