Results: 25 issues by neverix

Labels: Implementation, Hacktoberfest, lang: kotlin

It seems like the tokenizer ignores the first letter when it's uppercase; chaining `encode_text` + `decode_text` shows this. What could be the source of this bug? Is this the intended...
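
For reference, here is a minimal self-contained sketch of one way this failure mode can arise (hypothetical code, not the project's tokenizer): a lowercase-only token pattern silently drops a leading capital letter, which an `encode_text` → `decode_text` round trip exposes.

```python
import re

# Hypothetical stand-ins for the project's encode_text/decode_text, showing
# how a lowercase-only token pattern loses a leading uppercase letter.
def encode_text(text: str) -> list[str]:
    # Only lowercase runs match, so "Hello" is tokenized as "ello".
    return re.findall(r"[a-z]+", text)

def decode_text(tokens: list[str]) -> str:
    return " ".join(tokens)

print(decode_text(encode_text("Hello world")))  # -> "ello world"
```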

Currently, the inference code materializes the entire attention matrix and then masks it. Sparse attention implementations, such as Triton-based kernels, are more efficient. Does the pre-training code support sparse attention? Will it...
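
To make the pattern concrete, here is a hedged PyTorch sketch (shapes and names are assumptions, not the repo's inference code) of dense attention that builds the full score matrix and only then applies the causal mask; a sparse kernel would avoid computing the masked-out entries entirely.

```python
import torch

# Dense causal attention: the full (L, L) score matrix is materialized
# and then masked, which is the inefficiency described above.
def dense_causal_attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5  # full L x L scores
    mask = torch.triu(torch.ones(scores.shape[-2:], dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))       # masked after the fact
    return scores.softmax(-1) @ v

q = k = v = torch.randn(2, 8, 128, 64)  # (batch, heads, seq, head_dim)
out = dense_causal_attention(q, k, v)
```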

Fixes #976. Implemented using a plain Makefile. It might be a better idea to write a small shell script or Go program to avoid the GNU Make-specific hacks.

The model's decoder currently supports only sequential decoding, because of the way `attn_state` is implemented. A parallel ~~generation~~ forward pass can be implemented by setting `attn_state` to `None`...
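
A hedged sketch of the contrast (the stand-in module below is illustrative, not the repo's decoder): sequential decoding threads a growing `attn_state` key/value cache through one-token calls, while passing `attn_state=None` runs a single causally masked forward pass over the whole sequence.

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Illustrative stand-in, not the project's decoder."""
    def __init__(self, dim=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, x, attn_state=None):
        if attn_state is None:
            # Parallel forward pass: all positions at once, under a causal mask.
            L = x.shape[1]
            mask = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
            out, _ = self.attn(x, x, x, attn_mask=mask)
            return out, None
        # Sequential decoding: append this step to the cache, attend to the prefix.
        kv = torch.cat([attn_state, x], dim=1)
        out, _ = self.attn(x, kv, kv)
        return out, kv

x = torch.randn(1, 16, 64)
dec = TinyDecoder()
parallel_out, _ = dec(x)                      # one call over the whole sequence
state = x[:, :0]                              # empty cache
for t in range(x.shape[1]):                   # one token per call
    step_out, state = dec(x[:, t:t + 1], state)
```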

Label: enhancement

First off, great work. Will information about the training be published anywhere? I'm specifically interested in the number of training epochs and the learning rate used.

Computing Inception currently takes up to half a minute at the start of the program, so caching it seems obvious. The only thing I'm not sure about is...
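
Assuming "Inception" here means the Inception activations (e.g., for FID), a minimal disk-caching sketch might look like this; the cache path and input shape are assumptions:

```python
import os
import torch
from torchvision.models import inception_v3

CACHE = "inception_feats.pt"  # hypothetical cache location

def inception_features(images: torch.Tensor) -> torch.Tensor:
    # Reuse cached activations if present; otherwise pay the slow
    # start-up cost once and persist the result to disk.
    if os.path.exists(CACHE):
        return torch.load(CACHE)
    model = inception_v3(weights="DEFAULT").eval()
    with torch.no_grad():
        feats = model(images)  # images: (N, 3, 299, 299)
    torch.save(feats, CACHE)
    return feats
```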

Hi, how useful would a script for inpainting and image modification with a pre-trained model be at this stage? Do K-models need some modification to use the basic DDIM inpainting...
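
For context, "basic DDIM inpainting" usually means re-noising the known region at each denoising step so that only the masked-out region is actually generated; here is a hedged sketch of one such step (`model`, `alphas`, and the mask convention are hypothetical stand-ins, and K-models may indeed need adaptation to this parameterization).

```python
import torch

# One deterministic (eta = 0) DDIM step with the inpainting trick:
# mask == 1 marks known pixels, alphas holds cumulative alpha-bar values,
# and model(x_t, t) predicts the noise eps. All three are stand-ins.
def ddim_inpaint_step(x_t, x0_known, mask, t, model, alphas):
    a_t = alphas[t]
    # Keep the known region consistent with noise level t.
    noised_known = a_t.sqrt() * x0_known + (1 - a_t).sqrt() * torch.randn_like(x0_known)
    x_t = mask * noised_known + (1 - mask) * x_t
    eps = model(x_t, t)
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    a_prev = alphas[t - 1] if t > 0 else torch.tensor(1.0)
    return a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
```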