Thorin Farnsworth
I've not checked, but is the MLP following a concatenation of sin-cos values? I think `c_in` scaling does make sense to use. `x_t` is going to be very large towards...
I think this might just make it a score-based model. Like Yang Song's earlier work, or very much like [EDM](https://arxiv.org/pdf/2206.00364.pdf). The `+1` wouldn't really matter, I feel.
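For context, EDM's preconditioning is exactly the mechanism that keeps the network input well-scaled when `x_t` grows large: `c_in` normalises the input to roughly unit variance at every noise level. A minimal sketch of the coefficients from the EDM paper (σ_data = 0.5 is their default for image data; the function name here is just for illustration):

```python
import math

def edm_preconditioning(sigma: float, sigma_data: float = 0.5):
    """EDM (Karras et al., 2022) preconditioning coefficients.

    c_in scales the noisy input, c_skip/c_out mix the skip connection
    and network output so the training target stays unit-variance.
    """
    c_in = 1.0 / math.sqrt(sigma ** 2 + sigma_data ** 2)
    c_skip = sigma_data ** 2 / (sigma ** 2 + sigma_data ** 2)
    c_out = sigma * sigma_data / math.sqrt(sigma ** 2 + sigma_data ** 2)
    return c_in, c_skip, c_out
```

Note that for large σ, `c_in * sigma ≈ 1`, so the scaled input never blows up.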
> loss(x0 + t_{n+1}*z, x0) can only enforce the model to learn the expectation of x0, not the target point of the current trajectory. It is a wrong loss term for one...
```python
import torch
import torch.distributed as dist


def sync_params(params) -> None:
    # Broadcast rank 0's parameters to all other ranks in-place.
    with torch.no_grad():
        for p in params:
            p_copy = p.detach()
            dist.broadcast(p_copy, 0)
            p.copy_(p_copy)
```
> You can also collect byte offsets for each line in a large file and store it in a dictionary.
>
> ```python
> offset_dict = {}
> with open(large_file_path,...
> ```
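The quoted snippet is cut off, so here is a hypothetical reconstruction of the same idea (the names `build_offset_index` and `read_line` are mine, not from the original): index the byte offset of every line once, then `seek` to any line without reading the file into memory.

```python
def build_offset_index(path: str) -> dict[int, int]:
    # Map line number -> byte offset of that line's start.
    offsets = {}
    pos = 0
    with open(path, "rb") as f:
        for i, line in enumerate(f):
            offsets[i] = pos
            pos += len(line)
    return offsets


def read_line(path: str, offsets: dict[int, int], i: int) -> str:
    # O(1) random access to line i via the precomputed offset.
    with open(path, "rb") as f:
        f.seek(offsets[i])
        return f.readline().decode("utf-8")
```

This pairs naturally with a `Dataset.__getitem__` that fetches one line per sample.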
Move the tokenisation to `__getitem__`
this runs the tokenizer in `__getitem__`, saving a bunch of memory. then the length cropping should also work, such that you can assign to `tokens[i, :length]` correctly if you have long...
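A minimal sketch of the suggestion, assuming a per-item variant (class and parameter names are hypothetical): keep only raw strings in memory and tokenize lazily, cropping to a fixed length as each sample is fetched.

```python
import torch
from torch.utils.data import Dataset


class LazyTokenDataset(Dataset):
    """Tokenize in __getitem__ instead of pre-tokenizing the whole corpus."""

    def __init__(self, texts, tokenizer, max_len: int = 512):
        self.texts = texts          # raw strings only: small memory footprint
        self.tokenizer = tokenizer  # any callable str -> list[int]
        self.max_len = max_len

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, i):
        ids = self.tokenizer(self.texts[i])[: self.max_len]  # crop long sequences
        tokens = torch.zeros(self.max_len, dtype=torch.long)  # zero-padded
        tokens[: len(ids)] = torch.tensor(ids, dtype=torch.long)
        return tokens, len(ids)
```

With a `DataLoader` the per-item padding plays the role of the `tokens[i, :length]` assignment from the comment, just one row at a time.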
I'm keen to try supporting a generic mask case, like [B, Q, K] bool, and doing conditional execution. Ideally this covers quite a lot of masking cases, but I guess...
What I mean is that for a structured mask you don't necessarily have to create a bool tensor. In the causal case it can be hardcoded in the kernel to...
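A toy illustration of the point in plain PyTorch (not kernel code): for the causal case the predicate `q_idx >= k_idx` is computed from indices, so no `[B, Q, K]` bool tensor ever needs to be materialized or loaded, mirroring what a fused kernel can hardcode per tile.

```python
import torch


def causal_scores(scores: torch.Tensor) -> torch.Tensor:
    # scores: [B, Q, K] attention logits.
    # Apply the causal constraint from index arithmetic alone:
    # position q may attend to k only when q_idx >= k_idx.
    Q, K = scores.shape[-2], scores.shape[-1]
    q_idx = torch.arange(Q).unsqueeze(-1)  # [Q, 1]
    k_idx = torch.arange(K).unsqueeze(0)   # [1, K]
    return scores.masked_fill(q_idx < k_idx, float("-inf"))
```

The generic `[B, Q, K]` bool path would still be needed for unstructured masks; this only covers masks with a closed-form predicate.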
https://github.com/ARM-software/CMSIS-NN/issues/145 I've made a similar request here. Would be great to see the most barebones example possible within the example folder of this repo.