Szymon Tworkowski

Results: 21 comments by Szymon Tworkowski

This is a well-known problem that occurs on basically all setups (local, Colab, GPU cluster). It is not a big issue for running long experiments; however, it does make...

I've had a similar issue running the Reformer on a Colab TPU with this gin config: https://github.com/google/trax/blob/master/trax/supervised/configs/reformer_imagenet64.gin, which also seems to use `n_hashes` > 1. It seems to...
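
To isolate the `n_hashes` factor, a quick sanity check is to rerun with a single hash round. Below is a minimal sketch; the binding name `LSHSelfAttention.n_hashes` is an assumption based on the linked config, so check the gin file for the exact name:

```python
import gin

# Load the linked Reformer config, then override the (assumed) binding so
# LSH attention uses a single hash round. If the issue disappears, it likely
# only triggers with n_hashes > 1.
gin.parse_config_file("reformer_imagenet64.gin")
gin.parse_config("LSHSelfAttention.n_hashes = 1")
```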

It turns out that the function `trax.data.tf_inputs.download_and_prepare` won't download the dataset in the case of imagenet64; it has to be downloaded manually, per the t2t documentation in the imagenet.py data generator: ```...
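
For reference, a minimal sketch of the resulting workflow: fetch the downsampled-ImageNet archives by hand first, then run preparation. The paths are placeholders and the dataset name is an assumption taken from the reformer_imagenet64.gin config:

```python
import os
import trax

# Placeholder path: the downsampled-ImageNet archives must already have been
# downloaded manually (see the t2t imagenet.py data generator for the expected
# filenames), since download_and_prepare will not fetch them for imagenet64.
data_dir = os.path.expanduser("~/tensorflow_datasets")

# Dataset name is an assumption based on the reformer_imagenet64.gin config;
# this call only *prepares* the data that is already in place.
trax.data.tf_inputs.download_and_prepare("t2t_image_imagenet64_gen_flat_rev", data_dir)
```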

Hi, thanks for your interest in our work! In our paper, the only results we report on arXiv are language-modeling perplexity numbers for small models. We do not evaluate LongLLaMA...

Hi, thanks for the excellent question and for suggesting the dataset! We are planning to provide an example of fine-tuning our models using the Hugging Face API, which would include instruction...
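
In the meantime, loading the released checkpoint through `transformers` already works. A minimal sketch, assuming the checkpoint is published on the Hub as `syzymon/long_llama_3b` (the prompt and generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub checkpoint name; trust_remote_code is needed because the repo
# ships custom modeling code for the FoT memory layers.
tokenizer = AutoTokenizer.from_pretrained("syzymon/long_llama_3b")
model = AutoModelForCausalLM.from_pretrained(
    "syzymon/long_llama_3b",
    torch_dtype=torch.float32,
    trust_remote_code=True,
)

prompt = "Focused Transformer is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0]))
```

Fine-tuning would build on the same loading code, e.g. with the standard `Trainer` API on an instruction dataset.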

We are planning to release the instruction tuning code in PyTorch, along with checkpoints and examples, early next week. Stay tuned!

In case you haven't seen it, the instruction tuning code is already there! See https://twitter.com/s_tworkowski/status/1687620785379360768 and the READMEs in this repo for more details.

In terms of instruction fine-tuning, I personally don't think it makes much sense to do SFT on models below 3B; I mean, 3B models are not very capable...

I see; I think I now understand your question. There is no requirement for the model to be pre-trained with FoT. Both in the paper and in LongLLaMA, we...

> So in order to get an off-the-shelf model to work like the LongLLaMA models y'all released, I need to pretrain it for longer using y'all's pre-training...