
PCDWriter::writeBinaryCompressed cannot write large pcd file due to 32-bit limitation


PCDWriter::writeBinaryCompressed cannot write large pcd file due to 32-bit size limitation:

(gdb) l 380
375	  }
376	
377	  char* temp_buf = static_cast<char*> (malloc (static_cast<size_t> (static_cast<float> (data_size) * 1.5f + 8.0f)));
378	  // Compress the valid data
379	  unsigned int compressed_size = pcl::lzfCompress (only_valid_data, 
380	                                                   static_cast<uint32_t> (data_size), 
381	                                                   &temp_buf[8], 
382	                                                   static_cast<uint32_t> (static_cast<float>(data_size) * 1.5f));
383	  unsigned int compressed_final_size = 0;
384	  // Was the compression successful?
(gdb) p data_size
$13 = 2935264192
(gdb) p static_cast<uint32_t> (static_cast<float>(data_size) * 1.5f)
$14 = 107928576

(Note that 2935264192 * 1.5 = 4402896288, which overflows a uint32_t, whose maximum is 4294967295.)

... and this generates an error message like:

[pcl::lzf_compress] Attempting to write data outside the output buffer!
terminate called after throwing an instance of 'pcl::IOException'
what(): : [pcl::PCDWriter::writeBinaryCompressed] Error during compression!
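To make the overflow concrete, here is a tiny standalone check of the same arithmetic (illustrative only, not PCL code):

#include <cstdint>
#include <cstdio>

int main ()
{
  std::uint64_t data_size = 2935264192u;             // value from the gdb session above
  std::uint64_t needed = data_size + data_size / 2;  // out_len that lzfCompress would need
  std::printf ("needed out_len: %llu, UINT32_MAX: %u\n",
               static_cast<unsigned long long> (needed), UINT32_MAX);
  // pcd_io.cpp computes this via float and casts back to uint32_t; the cast is
  // out of range, and the gdb session above observed the truncated value 107928576.
  return 0;
}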

Your Environment

  • Operating System and version: Ubuntu 14.04
  • Compiler: gcc 4.8.4
  • PCL Version: 1.8.1

Expected Behavior

Write the large .pcd file correctly.

Current Behavior

Generates the following exception and crashes:

[pcl::lzf_compress] Attempting to write data outside the output buffer!
terminate called after throwing an instance of 'pcl::IOException'
what(): : [pcl::PCDWriter::writeBinaryCompressed] Error during compression!

Possible Solution

Replace the 32-bit size values passed to lzfCompress() with 64-bit ones (size_t or uint64_t).
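For illustration, the scratch-buffer sizing could be done in 64-bit integer arithmetic, avoiding both the float round-trip and the overflow. This is only a sketch; for it to help, lzfCompress's 32-bit length parameters would also have to be widened:

#include <cstdint>
#include <cstdlib>

// Hypothetical 64-bit-safe replacement for the sizing in writeBinaryCompressed:
char* allocateScratch (std::uint64_t data_size)
{
  // integer equivalent of "data_size * 1.5 + 8", with no overflow and no float rounding
  std::uint64_t buf_len = data_size + data_size / 2 + 8;
  return static_cast<char*> (std::malloc (static_cast<std::size_t> (buf_len)));
}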

weimzh avatar Dec 14 '17 15:12 weimzh

The data size in the file header is also limited to 32 bits, so maybe there is no way to fix this without changing the file format :(

weimzh avatar Dec 14 '17 17:12 weimzh

This is a valid feature request/need, so we'll definitely have a look at how to properly support it.

SergioRAgostinho avatar Dec 15 '17 10:12 SergioRAgostinho

I don't see a safe way to fix it. The PCD file format has a concept of version, so in theory we can introduce a new revision .8 where the data size in binary compressed format is stored in 64 bits instead of 32. In practice, however, loaders don't seem to care about the version number when reading PCD files (PCL itself does not care, and neither does this Python loader: https://github.com/dimatura/pypcd). I would speculate that there are a few more loaders for other languages that also just ignore the version number. All of them would be silently broken.

taketwo avatar Dec 15 '17 10:12 taketwo

I would publicly announce the format change on the mailing lists and just let the downstream projects deal with it.

It might be worth keeping a wiki article logging the changes between different PCD versions.

SergioRAgostinho avatar Dec 15 '17 11:12 SergioRAgostinho

Our "official" "specification" does not even talk about binary_compressed storage format :disappointed:

taketwo avatar Dec 15 '17 11:12 taketwo

An unintrusive yet hackish way to fix this might be:

  1. If the data size is no more than 4 GB, write the file the same way as before.
  2. Otherwise, write an "invalid" (yet identifiable) value to the header (such as zero and/or 0xFFFFFFFF), followed by the 'real' 64-bit size (see the sketch below).

This way, everything that currently works would keep working.
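A minimal sketch of that sentinel scheme, assuming the binary_compressed layout of two 32-bit values (compressed size, then uncompressed size) after the header; hypothetical code, not anything merged into PCL:

#include <cstdint>
#include <ostream>

// Writes the size fields: the legacy 32-bit pair when both fit, otherwise a
// 0xFFFFFFFF sentinel pair followed by the real 64-bit sizes. Like the rest
// of the PCD binary format, this assumes a little-endian host.
void
writeCompressedSizes (std::ostream& os, std::uint64_t compressed, std::uint64_t uncompressed)
{
  if (compressed <= 0xFFFFFFFFull && uncompressed <= 0xFFFFFFFFull)
  {
    std::uint32_t c32 = static_cast<std::uint32_t> (compressed);
    std::uint32_t u32 = static_cast<std::uint32_t> (uncompressed);
    os.write (reinterpret_cast<const char*> (&c32), 4);
    os.write (reinterpret_cast<const char*> (&u32), 4);
  }
  else
  {
    const std::uint32_t sentinel = 0xFFFFFFFFu;  // identifiably invalid in the old format
    os.write (reinterpret_cast<const char*> (&sentinel), 4);
    os.write (reinterpret_cast<const char*> (&sentinel), 4);
    os.write (reinterpret_cast<const char*> (&compressed), 8);
    os.write (reinterpret_cast<const char*> (&uncompressed), 8);
  }
}

Old readers hitting the sentinel would fail loudly rather than silently misread the sizes, while files under 4 GB would remain bit-identical to today's output.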

weimzh avatar Dec 17 '17 05:12 weimzh

Indeed too hackish for my taste :sweat_smile:.

The version numbers exist exactly for these situations. They're just not being handled because so far there was no need for it. IMO you brought in a case which justifies a revision.

Edit: [...] might justify a revision [...]. I still have to check whether there's an easy way out of it.

SergioRAgostinho avatar Dec 19 '17 01:12 SergioRAgostinho

Has there been any progress on this within the last months? We are using the compressed PCD file format a lot and deal with large PCD files, so any solution to this would be very useful to us.

If a change of the file format does not seem feasible in the near future, would an alternative be to check the size of the input data when saving a binary compressed PCD, and to print a warning and save an uncompressed PCD if the input data is too large for the binary compressed format? While not fixing the problem of large compressed PCDs, it would allow saving larger PCDs without the software using PCL having to guess why the compressed PCD didn't save.

floriankramer avatar May 18 '18 12:05 floriankramer

I agree that PCL should fail gracefully when the file is too large to be saved in binary compressed format. However, I don't think that automatically saving in uncompressed format instead is a good behavior. The software using PCL can check the return value of the writing function and act accordingly.

So what we can do is add a size check here, print an error, and return an error code (e.g. -2).
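A minimal sketch of such a guard, assuming the scratch buffer stays at roughly 1.5x the input size (the helper name and placement are illustrative, not actual PCL code):

#include <cstdint>
#include <limits>

// Largest input that still fits when the scratch buffer is sized at ~1.5x the input:
inline bool
fitsBinaryCompressed (std::uint64_t data_size)
{
  const std::uint64_t max_input = std::numeric_limits<std::uint32_t>::max () / 3 * 2;
  return data_size <= max_input;
}

// writeBinaryCompressed could then bail out early:
//   if (!fitsBinaryCompressed (data_size)) { PCL_ERROR ("..."); return (-2); }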

taketwo avatar May 20 '18 19:05 taketwo

Thanks to #2323 we now have a warning and a dedicated error code. However, the issue remains, so keeping this open.

taketwo avatar May 30 '18 13:05 taketwo

I am getting the same error, is there any way to fix it? I am using the pcl recorder in carla-ros-bridge.

kanakab avatar Apr 16 '21 22:04 kanakab

@kanakab What is the error message you receive, exactly? I had a quick look at the pcl recorder but only saw uncompressed writes (e.g. here); this issue is about compressed writes. Also: what kind of points are used in your case (with colour, with intensity, ...), and how many points are in the cloud?

mvieth avatar Apr 17 '21 14:04 mvieth

@mvieth I was able to solve this by creating two different pointclouds and concatenating them. The error is resolved. Thanks for your help.

kanakab avatar Apr 20 '21 08:04 kanakab

The binary data should be compressed in chunks, not in one big block. Every chunk can carry its own compressed and uncompressed sizes, prefixed as two 32-bit uints. An implementation can be found here: http://dist.schmorp.de/liblzf/

#define BLOCKSIZE (1024 * 64 - 1)
// ... (other parts of lzf.c elided; the definitions below are reconstructed
// from liblzf for readability: rread/wwrite are its retrying wrappers around
// read(2)/write(2), nr_read/nr_written its global byte counters, and
// blocksize is the runtime chunk size, at most BLOCKSIZE)
#define MAX_BLOCKSIZE BLOCKSIZE
#define TYPE0_HDR_SIZE 5   // "ZV" + type byte + 2-byte uncompressed size
#define TYPE1_HDR_SIZE 7   // "ZV" + type byte + 2-byte compressed + 2-byte uncompressed size
#define MAX_HDR_SIZE 8

typedef unsigned char u8;

static int
compress_fd (int from, int to)
{
  ssize_t us, cs, len;
  u8 buf1[MAX_BLOCKSIZE + MAX_HDR_SIZE + 16];
  u8 buf2[MAX_BLOCKSIZE + MAX_HDR_SIZE + 16];
  u8 *header;

  nr_read = nr_written = 0;
  // read and compress one block at a time
  while ((us = rread (from, &buf1[MAX_HDR_SIZE], blocksize)) > 0)
    {
      // keep the compressed form only if it saves at least 4 bytes
      cs = lzf_compress (&buf1[MAX_HDR_SIZE], us, &buf2[MAX_HDR_SIZE], us > 4 ? us - 4 : us);
      if (cs)
        {
          // type-1 chunk: compressed; header stores compressed and uncompressed sizes
          header = &buf2[MAX_HDR_SIZE - TYPE1_HDR_SIZE];
          header[0] = 'Z';
          header[1] = 'V';
          header[2] = 1;
          header[3] = cs >> 8;
          header[4] = cs & 0xff;
          header[5] = us >> 8;
          header[6] = us & 0xff;
          len = cs + TYPE1_HDR_SIZE;
        }
      else
        {                       // type-0 chunk: stored uncompressed
          header = &buf1[MAX_HDR_SIZE - TYPE0_HDR_SIZE];
          header[0] = 'Z';
          header[1] = 'V';
          header[2] = 0;
          header[3] = us >> 8;
          header[4] = us & 0xff;
          len = us + TYPE0_HDR_SIZE;
        }

      if (wwrite (to, header, len) == -1)
        return -1;
    }

  return 0;
}

One chunk should contain a whole number of points, so one can read big PCD files "chunkwise".
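For completeness, a hedged sketch of the matching reader loop in C++, using pcl::lzfDecompress from pcl/io/lzf.h; the chunk framing follows the "ZV" headers written above, and error handling is trimmed:

#include <cstdio>
#include <vector>
#include <pcl/io/lzf.h>   // pcl::lzfDecompress

// Reads one "ZV" chunk into 'out'; returns false at end of stream or on error.
bool
readChunk (std::FILE* f, std::vector<char>& out)
{
  unsigned char hdr[7];
  if (std::fread (hdr, 1, 3, f) != 3 || hdr[0] != 'Z' || hdr[1] != 'V')
    return false;
  if (hdr[2] == 1)                                 // type-1: compressed chunk
  {
    if (std::fread (hdr + 3, 1, 4, f) != 4)
      return false;
    unsigned cs = (hdr[3] << 8) | hdr[4];          // compressed size
    unsigned us = (hdr[5] << 8) | hdr[6];          // uncompressed size
    std::vector<char> in (cs);
    out.resize (us);
    if (std::fread (in.data (), 1, cs, f) != cs)
      return false;
    return pcl::lzfDecompress (in.data (), cs, out.data (), us) == us;
  }
  if (std::fread (hdr + 3, 1, 2, f) != 2)          // type-0: stored uncompressed
    return false;
  unsigned us = (hdr[3] << 8) | hdr[4];
  out.resize (us);
  return std::fread (out.data (), 1, us, f) == us;
}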

thohoj avatar May 13 '22 13:05 thohoj