Awni Hannun
@madroidmaq sorry for the delay! I think we can merge this. But could you first rebase to resolve conflicts?
There still isn't much standardization around thinking / chains-of-thought around tokenizers and how to respond with them. But I do think it's worth making it easier for downstream applications if...
It seems like there are some links or code missing in order to reproduce the issue. You posted a bunch of steps and the one script `compare_dac_outputs.py`, but a lot...
Closing as inactive. Feel free to comment with steps to repro and we can reopen if needed.
Emulating FP64 on the GPU is going to be quite slow and there's a good chance it will wipe out any speed improvements you might expect from running on the...
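For intuition on why emulation is slow: float-float ("double-double") schemes rebuild each high-precision operation out of several native single-precision ones. A minimal NumPy sketch of Knuth's error-free two-sum, the basic building block of such schemes (illustrative only, not actual GPU kernel code):

```python
import numpy as np

def two_sum(a, b):
    """Knuth's two-sum: returns (s, e) where s = fl(a + b) in float32
    and e is the exact rounding error, so a + b == s + e exactly."""
    a = np.float32(a)
    b = np.float32(b)
    s = a + b            # rounded single-precision sum
    bb = s - a           # the part of b that made it into s
    e = (a - (s - bb)) + (b - bb)  # recover what rounding discarded
    return s, e

# 1e-8 is far below float32's precision near 1.0, so a plain float32
# add drops it entirely; two_sum captures it in the error term.
s, e = two_sum(1.0, 1e-8)
```

A single emulated add already costs six dependent floating-point operations (and emulated multiplies cost more), which is where the speed advantage goes.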
What I meant by that is that any framework using an Apple GPU will have the same problem, including PyTorch's MPS back-end (which does not support double for the same...
Is the reference code you posted working as expected? If not, what's the issue with it?
What you have looks right to me... though for fast inference it probably makes sense to precompute the normalized weight since it's not changing.
That is also fine... the hook will get called every time. If you want to precompute the weight for inference, you would want to set the weight to the normalized weights just...
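A minimal NumPy sketch of the precompute idea (the class and method names here are illustrative, not the actual module's API): compute the weight-normalized matrix once and cache it, since it never changes at inference time.

```python
import numpy as np

def normalized_weight(v, g):
    # Weight normalization: w = g * v / ||v||, with one norm per output row.
    norm = np.linalg.norm(v, axis=1, keepdims=True)
    return g[:, None] * v / norm

class WeightNormLinear:
    """Toy weight-normalized linear layer (hypothetical, for illustration)."""
    def __init__(self, v, g):
        self.v = v
        self.g = g
        self._w = None  # cached normalized weight for inference

    def fuse(self):
        # Precompute once; at inference v and g are frozen so w never changes.
        self._w = normalized_weight(self.v, self.g)

    def __call__(self, x):
        w = self._w if self._w is not None else normalized_weight(self.v, self.g)
        return x @ w.T
```

After calling `fuse()`, each forward pass skips the norm and division entirely and is just a matmul.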
This is a tricky one to fix, but it would be good to work towards being able to catch errors from another thread. I think one thing we could think...
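One common pattern for surfacing errors from another thread (a general Python sketch, not a proposal for how the library itself should do it): capture the exception in the worker and re-raise it when the caller joins.

```python
import threading

class PropagatingThread(threading.Thread):
    """Thread that records any exception raised in its target so the
    caller can re-raise it on join() (illustrative pattern only)."""
    def __init__(self, target, args=()):
        super().__init__()
        self._target_fn = target
        self._args = args
        self.exc = None

    def run(self):
        try:
            self._target_fn(*self._args)
        except BaseException as e:
            # Stash the error instead of letting it die with the thread.
            self.exc = e

    def join(self, timeout=None):
        super().join(timeout)
        if self.exc is not None:
            raise self.exc
```

The same effect falls out of `concurrent.futures`, where `Future.result()` re-raises a worker's exception in the calling thread.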