Collecting ideas to improve the PCD file format

Open mvieth opened this issue 11 months ago • 3 comments

Currently, the PCD file format is at version 0.7 (and has been for many years). This issue is a collection of ideas for improving the file format. Everyone is welcome to suggest ideas or comment on ideas. It would also be helpful if people comment on which ideas are most important in their opinion. These ideas (or some of them) could some day be implemented in a version 0.8 of the PCD file format.

  • binary compressed data: currently, this uses two 32-bit unsigned binary numbers to specify how large the compressed and uncompressed data is. This should instead be two 64-bit unsigned binary numbers, to allow very large datasets. It might make sense to add a new binary_compressed64 mode for this. Additionally, it could be advantageous if the data can be stored in several sections, where the data in each section is spatially close and can be decompressed independently. Then the reader does not have to decompress all the data at once, which might not fit into memory (in case of very large clouds) (related: https://github.com/PointCloudLibrary/pcl/issues/2152)
  • additional header entries: currently, there is no possibility to store sequence number, timestamp, and frame id (related: https://github.com/PointCloudLibrary/pcl/issues/6152)
  • it could make sense to make some header entries optional, meaning that it is allowed that the entry does not appear, in which case a certain default value is assumed
  • officially declare lines starting with # as comments?
  • maybe make 0.8 a superset of 0.7, so that every valid PCD 0.7 file is also a valid PCD 0.8 file?
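To illustrate the size-prefix limit from the first bullet, here is a minimal Python sketch. It assumes the two sizes are stored little-endian, and the 64-bit layout is only a guess at what a binary_compressed64 mode could look like; neither the helper names nor the wide layout come from any specification.

```python
import struct
from typing import Tuple

# Layout described in the first bullet: two unsigned 32-bit sizes.
# Little-endian byte order is an assumption made for this sketch.
SIZES_32 = struct.Struct("<II")
# Hypothetical layout for a "binary_compressed64" mode (not part of any spec).
SIZES_64 = struct.Struct("<QQ")


def pack_sizes(compressed: int, uncompressed: int, wide: bool = False) -> bytes:
    """Serialize the (compressed, uncompressed) size prefix."""
    layout = SIZES_64 if wide else SIZES_32
    return layout.pack(compressed, uncompressed)


def unpack_sizes(prefix: bytes, wide: bool = False) -> Tuple[int, int]:
    """Read the (compressed, uncompressed) size prefix back."""
    layout = SIZES_64 if wide else SIZES_32
    return layout.unpack_from(prefix)


five_gib = 5 * 1024**3  # larger than 2**32 - 1 bytes

try:
    pack_sizes(five_gib, five_gib)
    fits_32 = True
except struct.error:
    # A 32-bit field cannot represent sizes of 4 GiB or more.
    fits_32 = False

round_trip = unpack_sizes(pack_sizes(five_gib, five_gib, wide=True), wide=True)
```

With the 32-bit layout the prefix occupies 8 bytes and cannot describe the 5 GiB payload; the wide variant needs 16 bytes and round-trips it unchanged.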

It would be good if we can create a list of software that can read or write PCD files, so that we can notify them in case of a PCD version 0.8:

  • https://github.com/PDAL/PDAL
  • https://github.com/CloudCompare/CloudCompare

mvieth avatar Dec 23 '24 18:12 mvieth

Hi,

I found this thread while Googling whether the PCD format supports comments. I haven’t found a definitive answer yet. I assume it does: comments aren't mentioned in the spec, but there are many comments in the example files provided in the specification. I guess this thread kind of addresses the question. Anyway, here are my thoughts on the topic:

  • The PCD format lacks a bounding box, like almost every other non-LAS format. In my opinion, having access to the file’s extent without reading the entire file is critical.
  • Explicitly state how coordinates should be named. The specification mentions x y z. Is X Y Z valid? Is x y z mandatory?
  • Explicitly state the position of the coordinates. You can put coordinates after attributes or attributes after coordinates, but that changes a lot for a reader. Having x y z always come first, in this order, greatly reduces the number of cases a reader must handle.

For the list of software supporting PCD you can add CloudCompare

lidR and lasR will hopefully support PCD soon (80% completed for lasR)

Jean-Romain avatar Jun 01 '25 10:06 Jean-Romain

@Jean-Romain Thanks a lot for your comment!

Regarding # comments in PCD files: I can only say that the PCD reader in PCL treats lines in the header that start with # as comments and ignores them. I am actually not sure how it would treat lines that have a # somewhere other than at the beginning (e.g. is FIELDS x y z # comment here treated the same as FIELDS x y z?). I am not aware of any real-world PCD files or any PCD writer that uses this kind of partial-line comment.

Regarding the bounding box: this sounds like an interesting idea, at least as an optional header entry. I assume you are referring to an axis-aligned bounding box? For the sake of generality, it could make sense to extend this idea to other data fields as well (in addition to xyz), that is, storing the minimum and maximum value of each field.

Regarding naming and order of fields: mandating anything in this regard would violate the idea that every valid PCD 0.7 file should also be a valid PCD 0.8 file, so I am not sure that is a good idea. I can see that it would simplify the code of a PCD reader (although any reader that wants to read 0.7 files would still have to handle the potential edge cases). An option might be to make x y z-first a recommendation for PCD writers, to increase compatibility between different software and converge towards this convention. For comparison, the PLY format does not require any specific order of the fields/properties and does not forbid e.g. X Y Z.
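On the open question about partial-line comments, the following Python sketch shows why the two readings differ. It is not PCL's implementation: the first helper only mirrors the behaviour described above (lines starting with # are skipped), and the second is a hypothetical stricter rule that a 0.8 specification could choose to define.

```python
def strip_full_line_comments(header_lines):
    """Drop header lines that start with '#'."""
    return [line for line in header_lines if not line.startswith("#")]


def strip_trailing_comment(line):
    """Hypothetical rule: everything from the first '#' onwards is a comment."""
    return line.split("#", 1)[0].rstrip()


header = [
    "# .PCD v0.7 - Point Cloud Data file format",
    "VERSION 0.7",
    "FIELDS x y z # comment here",
]

kept = strip_full_line_comments(header)

# Without a rule for partial-line comments, the comment text becomes field names.
naive_fields = kept[1].split()[1:]
# With the hypothetical trailing-comment rule, only the real field names remain.
strict_fields = strip_trailing_comment(kept[1]).split()[1:]
```

Here naive_fields comes out as six tokens (x, y, z, #, comment, here), while strict_fields holds only x, y, z, so a reader and a writer that disagree on the rule would disagree on the number of fields.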

mvieth avatar Jun 03 '25 14:06 mvieth

> For the sake of generality, it could make sense to extend this idea to other data fields as well (in addition to xyz), that is, storing the minimum and maximum value of each field.

Yes, this would be more general and therefore simpler.

Regarding the other comments, I’d say that overall, the PCD specifications are not specific enough in my opinion. They allow too many variations. For example, take the x y z issue: how am I supposed to know which numbers represent the point coordinates if neither the position nor the field names are enforced by the specifications? I mean, I can assume x y z, and I guess that’s what most writers will use. I actually used X Y Z, which is not forbidden by the spec, but then CloudCompare was no longer able to read my files because it assumes x y z is the standard.

On my side, my reader checks for both x and X, but the spec does not disallow names like Xcoord... My reader also assumes the coordinates are the first three numbers. When parsing binary PCD, that makes things much simpler, since we can just memcpy the underlying data, which fits my memory layout. But CloudCompare, for example, writes PCD files with coordinates at the end of the list.
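As a rough illustration of what a reader has to do when neither names nor positions are fixed, here is a Python sketch that derives byte offsets from the FIELDS, SIZE, and COUNT header entries and looks up x, y, z case-insensitively. The function names and the sample layouts are made up for this example; it is not the code of any of the readers mentioned here.

```python
def field_offsets(fields, sizes, counts):
    """Byte offset of each field within one binary point, plus the point size."""
    offsets, position = {}, 0
    for name, size, count in zip(fields, sizes, counts):
        offsets[name] = position
        position += size * count
    return offsets, position


def locate_xyz(fields, sizes, counts):
    """Return ([x_off, y_off, z_off], point_size), or None if x/y/z are absent.

    Names are matched case-insensitively, so both "x" and "X" are accepted.
    """
    offsets, point_size = field_offsets(fields, sizes, counts)
    by_lower = {name.lower(): name for name in fields}
    try:
        xyz = [offsets[by_lower[axis]] for axis in ("x", "y", "z")]
    except KeyError:
        return None
    return xyz, point_size


# Coordinates first: the three floats can be copied as one contiguous block.
leading = locate_xyz(["x", "y", "z", "intensity"], [4, 4, 4, 4], [1, 1, 1, 1])
# Upper-case coordinates placed after an attribute (illustrative layout).
trailing = locate_xyz(["intensity", "X", "Y", "Z"], [4, 4, 4, 4], [1, 1, 1, 1])
# Names the 0.7 spec does not forbid but this lookup cannot recognize.
unknown = locate_xyz(["Xcoord", "Ycoord", "Zcoord"], [4, 4, 4], [1, 1, 1])

xyz_is_leading = leading is not None and leading[0] == [0, 4, 8]
```

Only the first layout allows the plain block copy; the second needs per-point offsets, and the third cannot be interpreted without guessing.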

Regarding comments: as you mentioned, PCL skips lines starting with #, but this is not part of the official specification. CloudCompare seems to do the same (I still need to verify), but I don’t, because it wasn’t mentioned in the spec.

So in my opinion, PCD 0.8 would benefit from having a stricter specification that clearly defines what is allowed and what is not. This, plus an optional value range for each attribute. But if you ask me, I think the range for x y z should be mandatory.
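To make the per-attribute range idea concrete, here is a small Python sketch that computes the minimum and maximum of every field and renders them as header lines. The entry names MIN and MAX are invented for illustration only; no such entries exist in PCD 0.7 and nothing here has been agreed for 0.8.

```python
def field_ranges(fields, points):
    """Return {field: (min, max)} for points given as rows of numbers."""
    columns = list(zip(*points))
    return {name: (min(col), max(col)) for name, col in zip(fields, columns)}


def range_header_lines(fields, ranges):
    """Render ranges as two header lines (MIN/MAX are hypothetical entry names)."""
    mins = " ".join(str(ranges[name][0]) for name in fields)
    maxs = " ".join(str(ranges[name][1]) for name in fields)
    return [f"MIN {mins}", f"MAX {maxs}"]


fields = ["x", "y", "z", "intensity"]
points = [
    (0.5, 2.0, -1.0, 10),
    (3.0, -4.0, 0.25, 200),
    (1.5, 0.0, 7.0, 35),
]

ranges = field_ranges(fields, points)
lines = range_header_lines(fields, ranges)
```

The x y z part of such entries is exactly the axis-aligned bounding box, so a reader could get the file's extent from the header alone.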

Jean-Romain avatar Jun 03 '25 15:06 Jean-Romain