drhead

73 comments by drhead

Cosine similarities of the intermediate hidden states on a test image (pre-resized to 384x384, also made sure both are running on 576 seqlen) are roughly 0.985-0.99, and the pooled outputs...
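For reference, a comparison like this can be sketched as below. This is only a minimal illustration, not the code actually used for the numbers above: it assumes the hidden states from each implementation have already been dumped to nested Python lists (layers, then tokens, then floats), and the helper names `cosine_similarity` and `per_layer_similarity` are hypothetical. With real tensors you would more likely use your framework's built-in cosine similarity.

```py
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def per_layer_similarity(states_a, states_b):
    """Mean per-token cosine similarity for each layer.

    Assumed layout (hypothetical): states[layer][token] is a list of floats,
    and both inputs have the same number of layers and tokens.
    """
    result = []
    for layer_a, layer_b in zip(states_a, states_b):
        sims = [cosine_similarity(t_a, t_b) for t_a, t_b in zip(layer_a, layer_b)]
        result.append(sum(sims) / len(sims))
    return result
```

Averaging per token and then per layer keeps a single badly mismatched token visible in the layer score, whereas flattening the whole layer into one vector would let large-norm tokens dominate.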

[WslLogs-2025-03-29_17-13-26.zip](https://github.com/user-attachments/files/19523013/WslLogs-2025-03-29_17-13-26.zip)

Another possible instance of this bug is happening whenever I initialize a CUDA context using PyTorch. Simply loading an interactive Python shell and using: ```py >>> import torch >>> x...