Thorin Farnsworth
I've not checked, but is the MLP following a concatenation of sin-cos values? I think `c_in` scaling does make sense to use. `x_t` is going to be very large towards...
I think this might just make it a score-based model. Like Yang Song's earlier work, or very much like [EDM](https://arxiv.org/pdf/2206.00364.pdf). The `+1` wouldn't really matter, I feel.
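For context, EDM's preconditioning is exactly the mechanism that keeps the network input well-scaled when `x_t` grows large: `c_in` normalises the input to roughly unit variance at every noise level. A minimal sketch of the coefficients from the EDM paper (σ_data = 0.5 is their default for image data; the function name here is just for illustration):

```python
import math

def edm_preconditioning(sigma: float, sigma_data: float = 0.5):
    """EDM (Karras et al., 2022) preconditioning coefficients.

    c_in scales the noisy input, c_skip/c_out mix the skip connection
    and network output so the training target stays unit-variance.
    """
    c_in = 1.0 / math.sqrt(sigma ** 2 + sigma_data ** 2)
    c_skip = sigma_data ** 2 / (sigma ** 2 + sigma_data ** 2)
    c_out = sigma * sigma_data / math.sqrt(sigma ** 2 + sigma_data ** 2)
    return c_in, c_skip, c_out
```

Note that for large σ, `c_in * sigma ≈ 1`, so the scaled input never blows up.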
> loss(x0 + t_{n+1}*z, x0) can only enforce the model to learn the expectation of x0, not the target point of the current trajectory. It is a wrong loss term for one...
```python
import torch
import torch.distributed as dist


def sync_params(params) -> None:
    # Broadcast rank 0's parameters to all other ranks in-place.
    with torch.no_grad():
        for p in params:
            p_copy = p.detach()
            dist.broadcast(p_copy, 0)
            p.copy_(p_copy)
```
> You can also collect byte offsets for each line in a large file and store it in a dictionary.
>
> ```python
> offset_dict = {}
> with open(large_file_path,...
> ```
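The quoted snippet is cut off, so here is a hypothetical reconstruction of the same idea (the names `build_offset_index` and `read_line` are mine, not from the original): index the byte offset of every line once, then `seek` to any line without reading the file into memory.

```python
def build_offset_index(path: str) -> dict[int, int]:
    # Map line number -> byte offset of that line's start.
    offsets = {}
    pos = 0
    with open(path, "rb") as f:
        for i, line in enumerate(f):
            offsets[i] = pos
            pos += len(line)
    return offsets


def read_line(path: str, offsets: dict[int, int], i: int) -> str:
    # O(1) random access to line i via the precomputed offset.
    with open(path, "rb") as f:
        f.seek(offsets[i])
        return f.readline().decode("utf-8")
```

This pairs naturally with a `Dataset.__getitem__` that fetches one line per sample.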
Move the tokenisation to `__getitem__`
this runs the tokenizer in `__getitem__`, saving a bunch of memory. then the length cropping should also work, such that you can assign to `tokens[i, :length]` correctly if you have long...
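A minimal sketch of the suggestion, assuming a per-item variant (class and parameter names are hypothetical): keep only raw strings in memory and tokenize lazily, cropping to a fixed length as each sample is fetched.

```python
import torch
from torch.utils.data import Dataset


class LazyTokenDataset(Dataset):
    """Tokenize in __getitem__ instead of pre-tokenizing the whole corpus."""

    def __init__(self, texts, tokenizer, max_len: int = 512):
        self.texts = texts          # raw strings only: small memory footprint
        self.tokenizer = tokenizer  # any callable str -> list[int]
        self.max_len = max_len

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, i):
        ids = self.tokenizer(self.texts[i])[: self.max_len]  # crop long sequences
        tokens = torch.zeros(self.max_len, dtype=torch.long)  # zero-padded
        tokens[: len(ids)] = torch.tensor(ids, dtype=torch.long)
        return tokens, len(ids)
```

With a `DataLoader` the per-item padding plays the role of the `tokens[i, :length]` assignment from the comment, just one row at a time.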
I'm keen to try supporting a generic mask case, like [B, Q, K] bool, and doing conditional execution. Ideally this covers quite a lot of masking cases, but I guess...
What I mean is that for a structured mask you don't necessarily have to create a bool tensor. In the causal case it can be hardcoded in the kernel to...
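A toy illustration of the point in plain PyTorch (not kernel code): for the causal case the predicate `q_idx >= k_idx` is computed from indices, so no `[B, Q, K]` bool tensor ever needs to be materialized or loaded, mirroring what a fused kernel can hardcode per tile.

```python
import torch


def causal_scores(scores: torch.Tensor) -> torch.Tensor:
    # scores: [B, Q, K] attention logits.
    # Apply the causal constraint from index arithmetic alone:
    # position q may attend to k only when q_idx >= k_idx.
    Q, K = scores.shape[-2], scores.shape[-1]
    q_idx = torch.arange(Q).unsqueeze(-1)  # [Q, 1]
    k_idx = torch.arange(K).unsqueeze(0)   # [1, K]
    return scores.masked_fill(q_idx < k_idx, float("-inf"))
```

The generic `[B, Q, K]` bool path would still be needed for unstructured masks; this only covers masks with a closed-form predicate.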
https://github.com/ARM-software/CMSIS-NN/issues/145 I've made a similar request here. Would be great to see the most barebones example possible within the example folder of this repo.