MEGABYTE-pytorch

Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in PyTorch

5 MEGABYTE-pytorch issues

Hi there, the MEGABYTE paper uses bits-per-byte in Table 2 as its evaluation metric. This seems to differ from byte-level perplexity, since their number in arXiv and Code...
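For reference, the two metrics differ only by a change of logarithm base. The sketch below assumes the reported loss is the mean cross-entropy per byte in nats (which is what PyTorch's `F.cross_entropy` returns by default); under that assumption, bits-per-byte is the loss divided by ln 2, byte-level perplexity is its exponential, and so perplexity equals 2 raised to the bits-per-byte. The helper names are hypothetical.

```python
import math

LN2 = math.log(2)


def bits_per_byte(mean_loss_nats: float) -> float:
    """Convert a mean per-byte cross-entropy in nats to bits per byte."""
    return mean_loss_nats / LN2


def byte_perplexity(mean_loss_nats: float) -> float:
    """Byte-level perplexity is exp of the mean per-byte loss in nats."""
    return math.exp(mean_loss_nats)


loss = 1.0  # illustrative value, not a reported result
bpb = bits_per_byte(loss)
ppl = byte_perplexity(loss)

# Both metrics carry the same information: perplexity == 2 ** bits_per_byte
assert math.isclose(ppl, 2 ** bpb)
```

Because MEGABYTE predicts raw bytes, its per-token and per-byte losses coincide; a subword model would first need its total loss renormalized by the number of bytes before the two could be compared.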

```python
# in the repository
self.to_kv = nn.Linear(dim, dim_head * 2, bias = False)
# expected
self.to_kv = nn.Linear(dim, inner_dim * 2, bias = False)
```

Is this a trick or a bug?
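One plausible explanation, offered as an assumption rather than the maintainer's confirmed answer, is multi-query attention: every query head gets its own projection, but all heads share a single key head and a single value head, so `to_kv` only needs `dim_head * 2` output features instead of `inner_dim * 2`. The class below is a hypothetical, minimal causal sketch of that pattern, not the repository's code.

```python
import torch
from torch import nn


class MultiQueryAttention(nn.Module):
    """Hypothetical sketch: many query heads share one key/value head."""

    def __init__(self, dim: int, dim_head: int = 64, heads: int = 8):
        super().__init__()
        self.heads = heads
        self.dim_head = dim_head
        inner_dim = dim_head * heads
        self.to_q = nn.Linear(dim, inner_dim, bias=False)      # one query per head
        self.to_kv = nn.Linear(dim, dim_head * 2, bias=False)  # one shared key + value
        self.to_out = nn.Linear(inner_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, _ = x.shape
        # queries: (b, heads, n, dim_head)
        q = self.to_q(x).view(b, n, self.heads, self.dim_head).transpose(1, 2)
        # shared keys/values: (b, 1, n, dim_head), broadcast across all heads
        k, v = self.to_kv(x).chunk(2, dim=-1)
        k, v = k.unsqueeze(1), v.unsqueeze(1)
        scores = (q @ k.transpose(-2, -1)) * self.dim_head ** -0.5
        # causal mask: position i may not attend to positions j > i
        future = torch.ones(n, n, dtype=torch.bool, device=x.device).triu(1)
        scores = scores.masked_fill(future, float("-inf"))
        out = scores.softmax(dim=-1) @ v                 # (b, heads, n, dim_head)
        out = out.transpose(1, 2).reshape(b, n, -1)      # (b, n, inner_dim)
        return self.to_out(out)
```

The trade-off is fewer key/value parameters and a smaller key/value cache during decoding, at the cost of less per-head flexibility than standard multi-head attention.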

Hi there. I’ve run the training code in this repository for 25k out of the 100k batches and achieved a validation loss of around 1.28, or perplexity of 3.59. After...
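The two reported numbers are consistent with each other, since exp(1.28) is about 3.60. When averaging over several validation batches, perplexity should be computed from the byte-weighted mean loss rather than by averaging per-batch perplexities, because the exponential is convex and the average of exponentials overstates the result. The sketch below assumes each batch reports its summed loss in nats and its byte count; the function name is hypothetical.

```python
import math
from typing import Iterable, Tuple


def corpus_perplexity(batches: Iterable[Tuple[float, int]]) -> float:
    """batches yields (summed_loss_nats, num_bytes) for each validation batch."""
    total_loss = 0.0
    total_bytes = 0
    for summed_loss, num_bytes in batches:
        total_loss += summed_loss
        total_bytes += num_bytes
    # Exponentiate the byte-weighted mean loss, not the mean of per-batch perplexities
    return math.exp(total_loss / total_bytes)
```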

Thank you so much for taking the time to share your code with me! I appreciate your generosity in helping me better understand the paper. I noticed that your code...

How do we translate the model-size parameters given in the paper into the `max_seq_len` and `depth` tuple arguments when constructing the model?
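In the paper, a context of T bytes is split into patches of size P, so the global model sees T/P patch positions and the local model sees P bytes per patch. A minimal sketch of the mapping, assuming the constructor takes two-stage tuples ordered global first (as the README example's comments describe) and that the product of the `max_seq_len` entries must equal the total context length, might look like the hypothetical helper below. The numbers in the example are illustrative, not values from the paper.

```python
from typing import Dict, Tuple


def megabyte_size_kwargs(
    context_bytes: int,
    patch_size: int,
    global_layers: int,
    local_layers: int,
) -> Dict[str, Tuple[int, int]]:
    """Hypothetical helper mapping paper-style sizes to (global, local) tuples."""
    if context_bytes % patch_size != 0:
        raise ValueError("context length must be divisible by the patch size")
    num_patches = context_bytes // patch_size
    return {
        "max_seq_len": (num_patches, patch_size),  # (global: patches, local: bytes per patch)
        "depth": (global_layers, local_layers),    # (global layers, local layers)
    }


kwargs = megabyte_size_kwargs(context_bytes=8192, patch_size=8, global_layers=12, local_layers=6)
print(kwargs)  # {'max_seq_len': (1024, 8), 'depth': (12, 6)}
```

These values would then be passed alongside the other constructor arguments; check the README of the installed version, since argument names and ordering may change between releases.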