Alex McKinney

Results: 71 comments by Alex McKinney

Hi, what I meant was: is there any advantage to using your pretrained distilled model as an assistant model for the original large model on non-English inputs?

Just tested this and there seems to be no speedup, but that is expected given the difference in training distribution between the base and distilled models. Might try my hand at distilling my own...
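The lack of speedup follows from how assisted (speculative) decoding works: the assistant proposes several tokens, the large model checks them, and only the prefix it agrees with is kept. The toy, greedy-only sketch below uses hypothetical integer "models" (functions from a token prefix to the next token) to show that when the draft never agrees, each verification pass still yields just one token, so there is nothing to gain over plain decoding and the draft's own cost is added on top. It is an illustration of the acceptance logic under these assumptions, not the Transformers implementation.

```python
from typing import Callable, List, Tuple

# A toy "model" maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def assisted_greedy_decode(target: Model, draft: Model, prompt: List[int],
                           max_new_tokens: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (new tokens, target verification passes)."""
    tokens = list(prompt)
    passes = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The cheap draft model proposes k tokens one at a time.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target model scores all proposed positions. In a real network
        #    this is a single batched forward pass, so it is counted once.
        passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                # First disagreement: keep the target's token and discard the rest.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the same pass yields one extra token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[len(prompt):len(prompt) + max_new_tokens], passes


if __name__ == "__main__":
    def target(p: List[int]) -> int:      # stand-in for the large model
        return (p[-1] + 1) % 10

    def mismatched(p: List[int]) -> int:  # draft whose predictions never match
        return (p[-1] + 2) % 10

    print(assisted_greedy_decode(target, target, [0], 20)[1])      # prints 4
    print(assisted_greedy_decode(target, mismatched, [0], 20)[1])  # prints 20
```

Both runs produce the same 20 tokens, because greedy verification always keeps the target's choice; only the number of large-model passes changes.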

@sanchit-gandhi That's a good idea (to both points) actually. Thanks for the suggestions.

You will need to train a new distilled model for it to work with v3. The current one won't work out of the box.

Using the squared gradients each step isn't too dissimilar to Adam, no? In my experiments I get pretty similar convergence to Adam with the GNB estimator. It's nice to include,...
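Assuming the estimator discussed here is the Gauss-Newton-Bartlett (GNB) estimator from the Sophia optimizer, the comparison can be sketched on a toy quadratic. Both updates keep an exponential moving average of squared gradients; Adam divides the momentum by its square root, while the Sophia-style rule divides by the average itself (scaled by `gamma`, value chosen arbitrarily here) and clips each coordinate to [-1, 1]. As a simplification, this sketch feeds ordinary squared gradients into the curvature estimate every step; the actual GNB estimator squares gradients of the loss on labels sampled from the model, and Sophia refreshes it only every few steps.

```python
import math
from typing import List


def loss(x: List[float], a: List[float]) -> float:
    # Toy quadratic f(x) = 0.5 * sum(a_i * x_i**2) with per-coordinate curvature a_i.
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))


def grad(x: List[float], a: List[float]) -> List[float]:
    return [ai * xi for ai, xi in zip(a, x)]


def adam(x0, a, steps, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    x, m, v = list(x0), [0.0] * len(x0), [0.0] * len(x0)
    for t in range(1, steps + 1):
        g = grad(x, a)
        for i in range(len(x)):
            m[i] = b1 * m[i] + (1 - b1) * g[i]
            v[i] = b2 * v[i] + (1 - b2) * g[i] ** 2  # EMA of squared gradients
            m_hat = m[i] / (1 - b1 ** t)             # bias corrections
            v_hat = v[i] / (1 - b2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def sophia_like(x0, a, steps, lr=0.05, b1=0.9, b2=0.99, gamma=0.01, eps=1e-12):
    x, m, h = list(x0), [0.0] * len(x0), [0.0] * len(x0)
    for _ in range(steps):
        g = grad(x, a)
        for i in range(len(x)):
            m[i] = b1 * m[i] + (1 - b1) * g[i]
            # Simplified curvature estimate: squared gradient, refreshed every step
            # (stand-in for the GNB estimator, which uses sampled labels).
            h[i] = b2 * h[i] + (1 - b2) * g[i] ** 2
            ratio = m[i] / max(gamma * h[i], eps)
            x[i] -= lr * max(-1.0, min(1.0, ratio))  # element-wise clip to [-1, 1]
    return x


if __name__ == "__main__":
    a, x0 = [1.0, 10.0, 0.1], [1.0, -2.0, 3.0]
    print("start      ", loss(x0, a))
    print("adam       ", loss(adam(x0, a, 300), a))
    print("sophia_like", loss(sophia_like(x0, a, 300), a))
```

In this toy setting the squared gradient shrinks faster than the gradient near the optimum, so `ratio` is mostly clipped and the Sophia-style rule behaves much like momentum sign descent.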

This same error will occur using `ds = datasets.load_dataset('json', data_files=['test.jsonl'])`

@cccntu I want to make a quick fix for this, but I am struggling to find where the json dataset builder is. Do you know?

> @vvvm23 I think you mean think:

You are correct, thanks!

> Probably just need to check first if `url_or_filename` is [PathLike](https://docs.python.org/3/library/os.html#os.PathLike) and return False early.

Is PathLike sufficient, or...
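A minimal sketch of the suggested early return, assuming a helper named `is_remote_url` that may receive either a string or a path object. The body is illustrative only, not the library's actual code. On the PathLike question: `isinstance(obj, os.PathLike)` is true for `pathlib.Path` and for any class that implements `__fspath__`, and false for plain `str`, so strings still go through URL parsing.

```python
import os
from pathlib import Path
from typing import Union
from urllib.parse import urlparse


def is_remote_url(url_or_filename: Union[str, os.PathLike]) -> bool:
    """Hypothetical helper: True only for strings that look like remote URLs."""
    # pathlib.Path and any object implementing __fspath__ count as os.PathLike.
    # They always refer to local files, so return early instead of handing them to urlparse.
    if isinstance(url_or_filename, os.PathLike):
        return False
    parsed = urlparse(url_or_filename)
    # Require both a scheme and a network location, so "C:\\data\\x.json"
    # (scheme "c", empty netloc) and "file:///tmp/x.json" are treated as local.
    return bool(parsed.scheme) and bool(parsed.netloc)


if __name__ == "__main__":
    print(is_remote_url(Path("test.jsonl")))  # prints False
```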

The PR above should implement your first suggestion. Hope that works for you, as I am going on holiday and won't be able to change much :wink:

@Max-We nice write-up! Do you have any plans to integrate the changes into this library? I am wondering whether LoRA finetuning would be sufficient to adapt the model for...
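For reference, LoRA keeps the pretrained weight `W` frozen and learns a low-rank update, so a linear layer computes `x W^T + (alpha / r) x A^T B^T` with only `A` and `B` trained; `B` starts at zero so finetuning begins from the pretrained model. The pure-Python sketch below uses hypothetical names (`LoRALinear`, `linear`) and is not the API of any LoRA library. It shows the unchanged output at initialization and how few parameters are trained; whether that capacity is enough to adapt the model is an empirical question the sketch does not answer.

```python
import random
from typing import List

Matrix = List[List[float]]


def linear(x: Matrix, W: Matrix) -> Matrix:
    """Compute x @ W^T for row-major lists (x: n x d_in, W: d_out x d_in)."""
    return [[sum(xi * wi for xi, wi in zip(row, w_row)) for w_row in W] for row in x]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W: Matrix, r: int, alpha: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # pretrained weight, kept frozen
        # A (r x d_in) gets small random values; B (d_out x r) starts at zero,
        # so the adapted layer initially reproduces the pretrained one exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def __call__(self, x: Matrix) -> Matrix:
        base = linear(x, self.W)
        delta = linear(linear(x, self.A), self.B)  # (x A^T) B^T, shape n x d_out
        return [[b + self.scale * d for b, d in zip(br, dr)] for br, dr in zip(base, delta)]

    def trainable_params(self) -> int:
        # Only A and B are updated during finetuning: r * (d_in + d_out) values.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

As a worked count, a 4096 x 4096 projection with rank 8 trains 8 * (4096 + 4096) = 65,536 parameters instead of 16,777,216.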