PyntCloud.from_file for multiple files?
Perhaps this is a rare use case, but I often have folders full of .ply files to load into one cloud. I am currently reading them in manually (just the points; adding the mesh and other structures is probably much harder), but it could be helpful to have a feature for this built in (or an easy way to concatenate two PyntCloud objects?)
import pandas as pd
from pyntcloud.io import read_ply

# Accumulate the points of every .ply file into a single DataFrame.
last_df = None
for c_file in all_files:
    cur_df = read_ply(c_file)["points"]
    if last_df is None:
        last_df = cur_df
    else:
        last_df = pd.concat([last_df, cur_df])
Hi @kmader! Thanks for the comment :)
I don't think it is a rare use case at all. However, it looks like something that should be carefully implemented, and I'm not sure about the best way to do it right now; opinions are welcome.
Concatenating two PyntCloud objects might be non-trivial because, as you mentioned, there are some structures which could complicate the operation. Maybe a simple points-points concatenation would be enough for most use cases?
In addition to this, some questions arise for me:
Should redundant points (i.e. points with the exact same coordinates) be removed as part of this concatenation? Should information about scale / coordinate system / sensor position (where applicable) be added to the PyntCloud class in order to facilitate this operation, and maybe others?
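Just to illustrate, a points-only merge could look something like this (an untested sketch; the helper name and the drop_duplicates flag are hypothetical, assuming the PyntCloud constructor accepts a points DataFrame):

import pandas as pd
from pyntcloud import PyntCloud

def concat_point_clouds(cloud_a, cloud_b, drop_duplicates=False):
    # Merge only the `points` DataFrames, ignoring mesh and any
    # other structures attached to either cloud.
    points = pd.concat([cloud_a.points, cloud_b.points], ignore_index=True)
    if drop_duplicates:
        # Optionally remove redundant points with identical coordinates.
        points = points.drop_duplicates(subset=["x", "y", "z"]).reset_index(drop=True)
    return PyntCloud(points)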
I would add the concatenation operation to the never-ending TODO list, but not with the highest priority because, as you show, it is possible to hack together a solution for some use cases with a few pandas tricks.
P.S.
I'm not sure if this will work, but maybe you could compact the code as follows:
merged_points = pd.concat([read_ply(x)["points"] for x in all_files])
@daavoo I had actually not considered the more general case of transforming coordinates and correcting for sensor information. That makes the problem considerably less tractable. The pandas 'hacks' are fine for now and make it easy to use tools like dask.dataframe to make loading quicker / parallel.
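For reference, a rough sketch of that idea (assuming all_files is the list of paths from above): dask.delayed wraps each read as a lazy task, so the files can be loaded in parallel before concatenating.

import dask
import pandas as pd
from pyntcloud.io import read_ply

# Build one lazy task per file; nothing is read yet.
delayed_frames = [dask.delayed(read_ply)(f)["points"] for f in all_files]
# Execute the reads in parallel, then concatenate the results.
merged_points = pd.concat(dask.compute(*delayed_frames))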
That snippet is much nicer (though it uses more memory, since all the pieces are held in memory and then concatenated). Python, unfortunately, has a horrible trade-off between elegance and efficiency.
The dask.dataframe approach looks very interesting as an optional (activated when the user wants it) drop-in replacement for the pandas DataFrame; I will definitely take a look at that.
Regarding the memory usage, how about replacing the list comprehension with a generator?
merged_points = pd.concat((read_ply(x)["points"] for x in all_files))
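(Worth noting: pandas' concat appears to convert its input to a list internally, so the peak-memory savings from passing a generator here may be modest.)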