Szymon Tworkowski

Results: 21 comments by Szymon Tworkowski

This is a well-known problem that occurs on basically all setups (local, Colab, GPU cluster). It is not a big issue for running long experiments; however, it does make...

I've had a similar issue running the Reformer on a Colab TPU with this gin config: https://github.com/google/trax/blob/master/trax/supervised/configs/reformer_imagenet64.gin, which also seems to use `n_hashes` > 1. It seems to...
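
To isolate the `n_hashes` factor, a quick sanity check is to rerun with a single hash round. Below is a minimal sketch; the binding name `LSHSelfAttention.n_hashes` is an assumption based on the linked config, so check the gin file for the exact name:

```python
import gin

# Load the linked Reformer config, then override the (assumed) binding so
# LSH attention uses a single hash round. If the issue disappears, it likely
# only triggers with n_hashes > 1.
gin.parse_config_file("reformer_imagenet64.gin")
gin.parse_config("LSHSelfAttention.n_hashes = 1")
```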

It turns out that the function `trax.data.tf_inputs.download_and_prepare` won't download the dataset in the case of imagenet64; it has to be downloaded manually, per the t2t documentation in the imagenet.py data generator: ```...
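
For reference, a minimal sketch of the resulting workflow: fetch the downsampled-ImageNet archives by hand first, then run preparation. The paths are placeholders and the dataset name is an assumption taken from the reformer_imagenet64.gin config:

```python
import os
import trax

# Placeholder path: the downsampled-ImageNet archives must already have been
# downloaded manually (see the t2t imagenet.py data generator for the expected
# filenames), since download_and_prepare will not fetch them for imagenet64.
data_dir = os.path.expanduser("~/tensorflow_datasets")

# Dataset name is an assumption based on the reformer_imagenet64.gin config;
# this call only *prepares* the data that is already in place.
trax.data.tf_inputs.download_and_prepare("t2t_image_imagenet64_gen_flat_rev", data_dir)
```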

Hi, thanks for your interest in our work! In our paper, the only results we report on arXiv are language-modeling perplexity numbers for small models. We do not evaluate LongLLaMA...

Hi, thanks for the excellent question and for suggesting the dataset! We are planning to provide an example of fine-tuning our models using the Hugging Face API, which would include instruction...
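
In the meantime, loading the released checkpoint through `transformers` already works. A minimal sketch, assuming the checkpoint is published on the Hub as `syzymon/long_llama_3b` (the prompt and generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub checkpoint name; trust_remote_code is needed because the repo
# ships custom modeling code for the FoT memory layers.
tokenizer = AutoTokenizer.from_pretrained("syzymon/long_llama_3b")
model = AutoModelForCausalLM.from_pretrained(
    "syzymon/long_llama_3b",
    torch_dtype=torch.float32,
    trust_remote_code=True,
)

prompt = "Focused Transformer is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0]))
```

Fine-tuning would build on the same loading code, e.g. with the standard `Trainer` API on an instruction dataset.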

We are planning to release the instruction tuning code in PyTorch, along with checkpoints and examples, early next week. Stay tuned!

In case you haven't seen it, the instruction tuning code is already there! See https://twitter.com/s_tworkowski/status/1687620785379360768 and the READMEs in this repo for more details.

In terms of instruction fine-tuning, I personally don't think it makes much sense to do SFT on models below 3B; I mean, 3B models are not very capable...

I see; I think I now understand your question. There is no requirement for the model to be pre-trained with FoT. Both in the paper and in LongLLaMA, we...

> So in order to get an off-the-shelf model to work like the LongLLaMA models y'all released, I need to pretrain it for longer using y'all's pre-training...