[pcl::io::loadPCDFile] The issue of thread stopping

Open sharknow opened this issue 10 months ago • 8 comments
Occasionally, the thread hangs while reading a PCD file.

The following is the stack information when the thread stops:

#0 0x0000ffff98218b44 in std::istream::sentry::sentry(std::istream&, bool) () from /lib/aarch64-linux-gnu/libstdc++.so.6
#1 0x0000ffff981c4420 in std::basic_istream<char, std::char_traits<char> >& std::getline<char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, char) () from /lib/aarch64-linux-gnu/libstdc++.so.6
#2 0x0000ffff742e1798 in pcl::PCDReader::readHeader(std::istream&, pcl::PCLPointCloud2&, Eigen::Matrix<float, 4, 1, 0, 4, 1>&, Eigen::Quaternion<float, 0>&, int&, int&, unsigned int&) () from /lib/aarch64-linux-gnu/libpcl_io.so.1.10
#3 0x0000ffff742e31a4 in pcl::PCDReader::readHeader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, pcl::PCLPointCloud2&, Eigen::Matrix<float, 4, 1, 0, 4, 1>&, Eigen::Quaternion<float, 0>&, int&, int&, unsigned int&, int) () from /lib/aarch64-linux-gnu/libpcl_io.so.1.10
#4 0x0000ffff742e0ea8 in pcl::PCDReader::read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, pcl::PCLPointCloud2&, Eigen::Matrix<float, 4, 1, 0, 4, 1>&, Eigen::Quaternion<float, 0>&, int&, int) () from /lib/aarch64-linux-gnu/libpcl_io.so.1.10
#7 0x0000ffff981d7f9c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
#8 0x0000ffff98c51624 in start_thread () from /lib/aarch64-linux-gnu/libpthread.so.0
#9 0x0000ffff9804562c in ?? () from /lib/aarch64-linux-gnu/libc.so.6

sharknow avatar Jan 02 '25 07:01 sharknow

@sharknow Does this always happen when you try to load a specific PCD file? If yes, please upload that file (zipped). In addition to the stack information you posted, is there anything else printed? An error message, or an exception? Do you try to load PCD files in parallel? Please show the relevant part of your code (where the error happens).

mvieth avatar Jan 02 '25 09:01 mvieth

It doesn't always happen. With the same PCD file, the next load works normally.

There is no other stack information or output. The thread is simply stuck, with its CPU usage at 99% the whole time.

I have not tried loading PCD files in parallel, but I can try that next.

The relevant code is shown below:

pcl::PointCloud<pcl::PointXYZ>::Ptr load_points(new pcl::PointCloud<pcl::PointXYZ>());

for (const auto& pcd_path : pcd_file_list) {
  pcl::PointCloud<pcl::PointXYZ>::Ptr points_in(new pcl::PointCloud<pcl::PointXYZ>());

  if (pcl::io::loadPCDFile<pcl::PointXYZ>(pcd_path, *points_in) != 0) {
    break;
  }
  *load_points = (*load_points) + (*points_in);
}

sharknow avatar Jan 02 '25 11:01 sharknow

The computer's memory usage was also at a normal level at that time.

sharknow avatar Jan 02 '25 11:01 sharknow

It doesn't always happen, using the same PCD file, the next load will be normal.

Still, please upload a PCD file (zipped) where the error has happened so that I can inspect it and try to reproduce the problem on my computer. Which OS do you use, and which compiler (version)? Which PCL version exactly? What I did not understand yet: does the program end/crash on its own, or do you terminate it (e.g. with Ctrl+C) because it is stuck in an endless loop?

mvieth avatar Jan 02 '25 13:01 mvieth

pcd_file.zip

I am using a Linux system.

pcl: 1.10.

compiler (version): gcc 9.4.0

I terminated the program myself. If I don't terminate it, it stays stuck, with CPU usage around 98% the whole time.

sharknow avatar Jan 03 '25 02:01 sharknow

I am using a Linux system.

Which kind? Ubuntu, or Debian, or something else? Which version?

pcl: 1.10.

PCL 1.10.0 or PCL 1.10.1? I assume you installed PCL via a package manager? Can you instead build the same version from source, in debug mode (CMAKE_BUILD_TYPE=Debug), and post another stack print? I am hoping to see in which lines in sentry the program is stuck.

Can you say how many PCD files are successfully loaded, before the program gets stuck?

I did not attempt to load the PCD file in parallel, but I can give it a try next.

No need to try that if the program did not do that before. But are you using multiple threads in your program? I am asking because I saw start_thread in the stack information, and I am wondering if that could be related to the problem.

Also, what kind of computer do you use? Is it a Raspberry Pi by chance? If yes, which model? I found this which might be related: https://forums.raspberrypi.com/viewtopic.php?t=281333

mvieth avatar Jan 03 '25 10:01 mvieth

Thank you very much for your reply!

Linux Ubuntu system.

pcl: 1.10.0

I have created a separate thread to load the map. It runs approximately once a minute and loads about 40 PCD files each time. Sometimes it gets stuck on the 10th file, and sometimes on the 7th. What I can confirm is that these PCD files are fine: when I restart the program, the same map files load successfully. The problem is infrequent. The program runs every day, and the thread blocks once or twice a day. When the program starts in the morning, it usually has no problems. They appear after about 3 to 5 hours of running, which I find very strange.

Computer: NVIDIA Orin NX Developer Kit

Operating System: Ubuntu 20.04.5 LTS
Kernel: Linux 5.10.104-tegra
Architecture: arm64

sharknow avatar Jan 06 '25 06:01 sharknow

Unfortunately, I do not have a promising idea what could be causing this problem.

Inside PCL, we do not do anything extraordinary. First the file stream is opened: https://github.com/PointCloudLibrary/pcl/blob/af3ce2530b7ae8ed083a3515168626c587a5bbcd/io/src/pcd_io.cpp#L391-L392

Then it reads the PCD file line by line: https://github.com/PointCloudLibrary/pcl/blob/af3ce2530b7ae8ed083a3515168626c587a5bbcd/io/src/pcd_io.cpp#L141-L143

I see nothing that could explain the problem.
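For readers unfamiliar with the pattern, the line-by-line header read described above can be sketched with the standard library alone. This is a minimal sketch, not PCL's implementation: `readHeaderLines` and `countHeaderLines` are hypothetical names, and the header handling is reduced to skipping comments and stopping at the `DATA` line. It tests the result of `std::getline` directly, so the loop ends on any failed stream state and not only at end of file.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not PCL's actual code: collects header lines from a
// PCD-like text stream up to and including the DATA line.
std::vector<std::string> readHeaderLines(std::istream& in) {
  std::vector<std::string> header;
  std::string line;
  // std::getline returns the stream, which converts to false at EOF and in
  // any failed state. A loop that tested only in.eof() would keep spinning if
  // the stream failed without reaching EOF, because each getline call would
  // then return immediately without consuming anything.
  while (std::getline(in, line)) {
    if (line.empty() || line[0] == '#') continue;  // skip blank and comment lines
    header.push_back(line);
    if (line.rfind("DATA", 0) == 0) break;  // the header ends at the DATA line
  }
  return header;
}

// Convenience wrapper for trying the helper on in-memory text.
std::size_t countHeaderLines(const std::string& text) {
  std::istringstream in(text);
  return readHeaderLines(in).size();
}
```

Whether a failed stream state plays any role in the hang reported here is not established. The sketch only illustrates the reading pattern and one defensive way to write its loop condition.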

The stack information you posted at the top indicates that the thread gets stuck in the sentry constructor. The source code of the constructor is here: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/istream.tcc#L52 Again, I do not see how it could get stuck there. Can you try to get a stack trace with exact line numbers? For example gdb prints line numbers in the stack trace. It would be good to know in which line exactly the thread gets stuck in the sentry constructor.

Your description makes it sound like a race condition (loads files successfully many times, then fails seemingly randomly). The fact that the files are loaded in a separate thread could also hint in that direction. The solution to a race condition is a mutex lock, but since I don't know the overall structure of your program, you will have to check for yourself whether a mutex lock could make any sense there. And I am not even sure which resource would have to be protected by the mutex lock.
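If a mutex lock does turn out to make sense, one possible shape is to serialize every map load behind a single lock. This is a sketch under assumptions: `GuardedLoader` and its members are hypothetical names, the structure of the real program is unknown, and it is not established that a lock resolves the hang. The loader callable stands in for a call such as `pcl::io::loadPCDFile<pcl::PointXYZ>(path, cloud)`, returning 0 on success as in the loop posted earlier in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper: only one thread at a time may run a batch of loads.
class GuardedLoader {
 public:
  // The callable returns 0 on success and non-zero on failure.
  using LoadFn = std::function<int(const std::string&)>;

  explicit GuardedLoader(LoadFn load) : load_(std::move(load)) {
    assert(load_ != nullptr);  // an empty callable would throw when invoked
  }

  // Loads the paths in order and stops at the first failure.
  // Returns the number of files loaded successfully.
  std::size_t loadAll(const std::vector<std::string>& paths) {
    std::lock_guard<std::mutex> lock(mutex_);  // held for the whole batch
    std::size_t loaded = 0;
    for (const auto& path : paths) {
      if (load_(path) != 0) break;
      ++loaded;
    }
    return loaded;
  }

 private:
  std::mutex mutex_;
  LoadFn load_;
};
```

Holding the lock for the whole batch keeps two map reloads from overlapping. It protects nothing outside the loader, so any other thread that touches the same files or the resulting cloud would need to take part in the same locking scheme.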

I assume that these 40 PCD files you load do not change while the program is running?

mvieth avatar Jan 18 '25 15:01 mvieth