pyntcloud
perf: read_ply replace pandas.read_csv engine=python with c; improve read_off header-parsing robustness
UPDATE: I have rebased this PR on top of the latest commit. The revised changes are:
- perf: Speed up reading of ASCII PLY files.
- feat: improve robustness for OFF headers on e.g. ModelNet40
- perf: reuse already open file for reading instead of opening it twice
- style: renamed variables for clarity (e.g. color -> has_color; and count -> n_header)
In particular, ModelNet40 has faulty headers:
$ head -n 1 ModelNet40/chair/train/chair_0856.off
OFF6586 5534 0
For reference, the correct format is:
OFF
6586 5534 0
Nonetheless, it is still valuable to parse the faulty header.
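A minimal sketch of how such a tolerant header parse could look, so that both the fused and the well-formed header give the same counts. The function name parse_off_header and its return tuple are hypothetical and only illustrate the idea; this is not the code in this PR.

```python
import io


def parse_off_header(f):
    """Read the OFF header from an open text file (hypothetical helper).

    Returns (n_points, n_faces, n_edges, has_color) and leaves the file
    positioned at the first data line.
    """
    first = f.readline().strip()
    if not first.startswith(("OFF", "COFF")):
        raise ValueError("not an OFF file")
    has_color = first.startswith("COFF")

    # Whatever follows the magic word on the same line. For the faulty
    # ModelNet40 header "OFF6586 5534 0" this already holds the counts.
    rest = first[4:].strip() if has_color else first[3:].strip()

    # Well-formed files put the counts on a later line; skip blank lines
    # and '#' comments until something is found.
    while not rest:
        line = f.readline()
        if not line:
            raise ValueError("unexpected end of file in OFF header")
        rest = line.split("#", 1)[0].strip()

    n_points, n_faces, n_edges = (int(x) for x in rest.split()[:3])
    return n_points, n_faces, n_edges, has_color


# Both header variants give the same counts.
faulty = io.StringIO("OFF6586 5534 0\n")
correct = io.StringIO("OFF\n6586 5534 0\n")
print(parse_off_header(faulty))
print(parse_off_header(correct))
```

Because the helper takes an already open file object, the caller can keep reading the body from the same handle instead of opening the file a second time.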
(Original text before #353 was merged)
Big performance improvement: the sliced file is now read from an in-memory StringIO buffer, which removes the need for the slow engine="python".
Also fixes a bug where, for OFF files containing more lines than num_points + num_faces, the reader tried to parse potential edges as faces!
As Wikipedia says, the OFF file may contain:
- points
- faces (optional)
- edges (optional)
Of course, this still does not encompass all possible OFF file variants described by Wikipedia, but it's an improvement.
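The approach described above can be sketched roughly as follows: slice exactly n_points and n_faces lines out of the already open file, hand each slice to pandas through a StringIO buffer with engine="c", and leave any trailing edge lines unread. The function name read_off_body, the column names, and the assumption of triangular faces without color are illustrative only and not taken from this PR.

```python
import io

import pandas as pd


def read_off_body(f, n_points, n_faces):
    """Read points and faces from an open OFF file (hypothetical helper).

    `f` must be positioned just after the header. Assumes whitespace
    separated values, triangular faces and no color columns.
    """
    # Slice the file by line count so pandas never sees the edge lines.
    point_lines = [f.readline() for _ in range(n_points)]
    face_lines = [f.readline() for _ in range(n_faces)]

    points = pd.read_csv(
        io.StringIO("".join(point_lines)),
        sep=r"\s+",
        header=None,
        names=["x", "y", "z"],
        engine="c",
    )

    if n_faces == 0:
        # read_csv raises on an empty buffer, so build the frame directly.
        faces = pd.DataFrame(columns=["v1", "v2", "v3"])
    else:
        faces = pd.read_csv(
            io.StringIO("".join(face_lines)),
            sep=r"\s+",
            header=None,
            names=["n", "v1", "v2", "v3"],
            engine="c",
        ).drop(columns="n")  # "n" is the per-face vertex count

    return points, faces


# Three points, one face, then one extra line standing in for an edge.
body = io.StringIO("0 0 0\n1 0 0\n0 1 0\n3 0 1 2\n0 1\n")
points, faces = read_off_body(body, n_points=3, n_faces=1)
print(points)
print(faces)
```

Slicing by line count is what keeps edge lines from being parsed as faces: anything after the first n_points + n_faces lines simply stays in the file.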
Both this PR and #353 improved pandas performance for *.OFF files with engine=c. Therefore, I rebased this PR on top of #353. This PR still contains some other useful changes, listed above.
Future work:
Once this is reviewed/accepted, I can look into improving compatibility with Wikipedia's description of the *.OFF file format. Of course, perfect compatibility would be too slow, but there are still some missing features:
- "C" in the header should not be needed to detect the presence of color (see Wikipedia's example).
- Edges, and edge colors.