robot-grasp
Try different model architectures for combining RGB and depth (D).
Because of how they are acquired, RGB and D are not naturally aligned.
TODO
- [x] concat RGB+D
- [x] process RGB and D independently and merge near the last layer
- [x] attach a depth prediction head at an intermediate layer: RGB -> h [-> D; -> grasp valuation]
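
The three variants above can be sketched roughly as follows. This is a minimal PyTorch sketch, not the code in this repo: the tiny `conv_encoder`, the layer widths, the class names, and the single-logit grasp output are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


def conv_encoder(in_channels: int, width: int = 16) -> nn.Sequential:
    """Tiny stand-in encoder (hypothetical; the real backbone is not given here)."""
    return nn.Sequential(
        nn.Conv2d(in_channels, width, kernel_size=3, stride=2, padding=1),
        nn.ReLU(),
        nn.Conv2d(width, width * 2, kernel_size=3, stride=2, padding=1),
        nn.ReLU(),
    )


def pooled_head(in_features: int) -> nn.Sequential:
    """Global average pool followed by one linear layer giving a grasp logit."""
    return nn.Sequential(
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_features, 1)
    )


class EarlyFusion(nn.Module):
    """concat RGB+D: one encoder over a 4-channel input."""

    def __init__(self):
        super().__init__()
        self.encoder = conv_encoder(4)
        self.head = pooled_head(32)

    def forward(self, rgb, depth):
        return self.head(self.encoder(torch.cat([rgb, depth], dim=1)))


class LateFusion(nn.Module):
    """Separate RGB and D encoders, merged just before the last layer."""

    def __init__(self):
        super().__init__()
        self.rgb_encoder = conv_encoder(3)
        self.depth_encoder = conv_encoder(1)
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(64, 1)

    def forward(self, rgb, depth):
        f_rgb = self.pool(self.rgb_encoder(rgb))
        f_depth = self.pool(self.depth_encoder(depth))
        return self.head(torch.cat([f_rgb, f_depth], dim=1))


class RGBWithDepthHead(nn.Module):
    """RGB -> h, then h -> predicted D and h -> grasp valuation."""

    def __init__(self):
        super().__init__()
        self.encoder = conv_encoder(3)
        # The encoder downsamples by 4, so the depth head upsamples by 4.
        self.depth_head = nn.Sequential(
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )
        self.grasp_head = pooled_head(32)

    def forward(self, rgb):
        h = self.encoder(rgb)
        return self.grasp_head(h), self.depth_head(h)
```

In this sketch the third variant needs depth only as a training target, so at inference time it takes RGB alone.
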
Having depth regression as extra supervision during training improves accuracy (RGB only: 90% -> 91%).
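
One way to read "extra supervision" is a training loss with an auxiliary depth term added to the grasp loss. The sketch below writes that arithmetic in plain Python for a single example. It assumes a binary grasp label, a mean-squared-error depth term, and a weight `depth_weight`; none of these choices or names come from this repo.

```python
import math
from typing import Sequence


def grasp_bce(logit: float, label: int) -> float:
    """Binary cross-entropy for one grasp logit (label is 0 or 1)."""
    # softplus(x) = log(1 + exp(x)), written to stay finite for large |x|.
    softplus = max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    return softplus - label * logit


def depth_mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and measured depth values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def multitask_loss(
    logit: float,
    label: int,
    depth_pred: Sequence[float],
    depth_target: Sequence[float],
    depth_weight: float = 0.5,  # hypothetical value, not taken from this repo
) -> float:
    """Grasp loss plus a weighted auxiliary depth-regression term."""
    return grasp_bce(logit, label) + depth_weight * depth_mse(depth_pred, depth_target)
```

Setting `depth_weight` to 0 recovers the plain grasp loss, which makes the auxiliary term easy to ablate.
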
- concat RGB+D -> 92% acc
- late concat -> 89%
- RGB only -> 90%
- depth only -> 73%
- RGB only with extra depth supervision -> 91%