vit-pytorch
Using SimpleViT to estimate odometry
I'm investigating how vision transformer models perform on visual odometry. As a starting point, I am using your implementation of SimpleViT. I am quite new to the field, so I don't understand everything yet, and I am getting some strange results.

I only get lines as my output, and I can't figure out why. Is it because the last layer of the transformer is a linear layer? The model has 6 outputs (x, y, z, yaw, roll, pitch), and I am using a custom loss function similar to the one in DeepVO.
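For context, a DeepVO-style loss is commonly written as the mean squared error on the translation components plus a weighted mean squared error on the rotation components. The sketch below is illustrative only: the function name `pose_loss`, the weight `kappa = 100`, and the output ordering (x, y, z, yaw, roll, pitch) are assumptions, not the exact code behind the results described here.

```python
import torch


def pose_loss(pred: torch.Tensor, target: torch.Tensor, kappa: float = 100.0) -> torch.Tensor:
    """Weighted pose loss for (batch, 6) tensors.

    Assumed column order: x, y, z, yaw, roll, pitch.
    kappa scales the rotation error so it is not dominated by translation.
    """
    # Mean squared error over the three translation components.
    trans_err = torch.mean((pred[:, :3] - target[:, :3]) ** 2)
    # Mean squared error over the three rotation components.
    rot_err = torch.mean((pred[:, 3:] - target[:, 3:]) ** 2)
    return trans_err + kappa * rot_err


# Dummy batch of 4 poses: predictions all zero, targets all one.
pred = torch.zeros(4, 6)
target = torch.ones(4, 6)
loss = pose_loss(pred, target)
```

With a loss of this shape, the balance between the two terms depends heavily on the units and scale of the targets, so it can be worth checking each term separately during training.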
As I said, this is all very experimental, so I don't mind bad results. I just want to understand why I can't get any meaningful results at all.
Thanks in advance!