Alex McKinney comments

Results: 71 comments by Alex McKinney

english only for large-v2?

Hi, I meant: is there any advantage to using your pretrained distilled model as an assistant model to the original large model on non-English inputs?

english only for large-v2?

Just tested this and there seems to be no speedup, but that is expected given the difference in training distribution between the base and distilled models. Might try my hand at distilling my own...
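To illustrate why a mismatched assistant gives no speedup, here is a toy sketch of greedy assisted decoding. It is not Whisper or any library's implementation: `main_next` and `draft_next` are hypothetical stand-in functions that return the next token for a sequence, and `k` is an assumed draft length. Each round, the draft proposes up to `k` tokens and the main model verifies them in what would be a single forward pass.

```python
def assisted_greedy_decode(main_next, draft_next, prompt, max_new_tokens, k=4):
    """Toy greedy assisted decoding.

    Returns (tokens, main_passes). main_passes counts verification rounds;
    a real implementation scores all draft positions in one batched pass,
    which this sketch simulates by calling main_next once per position.
    """
    tokens = list(prompt)
    target_len = len(prompt) + max_new_tokens
    main_passes = 0
    while len(tokens) < target_len:
        # The cheap draft model proposes up to k tokens autoregressively.
        draft = []
        for _ in range(min(k, target_len - len(tokens))):
            draft.append(draft_next(tokens + draft))
        # The main model checks the proposals left to right.
        main_passes += 1
        prefix = list(tokens)
        for proposed in draft:
            expected = main_next(prefix)
            tokens.append(expected)  # output always follows the main model
            prefix.append(expected)
            if proposed != expected:
                break  # first disagreement ends this round
    return tokens, main_passes


# Hypothetical toy "models" over digit tokens.
def main_model(seq):
    return (seq[-1] + 1) % 10


def matched_draft(seq):  # agrees with the main model
    return (seq[-1] + 1) % 10


def mismatched_draft(seq):  # always disagrees with the main model
    return (seq[-1] + 2) % 10


good_tokens, good_passes = assisted_greedy_decode(main_model, matched_draft, [0], 8)
bad_tokens, bad_passes = assisted_greedy_decode(main_model, mismatched_draft, [0], 8)
```

In this toy setup both runs produce the same tokens, since the output always follows the main model. The matched draft needs 2 main-model rounds for 8 tokens, while the mismatched draft needs 8, which is no better than plain decoding and also pays for the draft calls.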

english only for large-v2?

@sanchit-gandhi That's a good idea (to both points) actually. Thanks for the suggestions.

Hopefully the large-v3 version will be supported

You will need to train a new distilled model for it to work with v3. The current one won't work out of the box.

Intended usage of the Sophia optimiser

Using the squared gradients each step isn't too dissimilar to Adam, no? In my experiments I get pretty similar convergence to Adam with the GNB estimator. It's nice to include,...
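For comparison, a per-coordinate sketch of the two update rules as I understand them. This is an illustration, not the reference implementation of either optimiser: the function names are hypothetical, the hyperparameter values are placeholders rather than recommendations, and weight decay is omitted. Passing `h_est = g * g` on every step is the "squared gradients each step" case.

```python
import math


def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step for a single scalar parameter (t starts at 1)."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g  # EMA of squared gradients, every step
    m_hat = m / (1 - b1 ** t)  # bias correction
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


def sophia_step(theta, g, m, h, h_est=None, lr=1e-3, b1=0.9, b2=0.99,
                gamma=0.05, eps=1e-12):
    """One Sophia-style step for a single scalar parameter.

    h_est is a fresh diagonal-Hessian estimate (e.g. from a GNB-style
    estimator); pass None on steps where the estimate is not refreshed.
    """
    m = b1 * m + (1 - b1) * g
    if h_est is not None:
        h = b2 * h + (1 - b2) * h_est  # EMA of the curvature estimate
    ratio = m / max(gamma * h, eps)  # no square root, unlike Adam
    update = max(-1.0, min(1.0, ratio))  # clip to [-1, 1]
    return theta - lr * update, m, h


# Single step from theta = 1.0 with gradient 2.0, using g * g as the estimate.
adam_theta, _, _ = adam_step(1.0, 2.0, 0.0, 0.0, t=1)
sophia_theta, _, _ = sophia_step(1.0, 2.0, 0.0, 0.0, h_est=2.0 * 2.0)
```

The main structural differences in this sketch are the missing square root and the clipping, which caps each coordinate's step at `lr`.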

Allow pathlib PosixPath in Dataset.read_json

This same error will occur using `ds = datasets.load_dataset('json', data_files=['test.jsonl'])`

Allow pathlib PosixPath in Dataset.read_json

@cccntu I want to make a quick fix for this, but I am struggling to find where the json dataset builder is. Do you know?

Allow pathlib PosixPath in Dataset.read_json

> @vvvm23 I think you mean think:

You are correct, thanks!

> Probably just need to check first if url_or_filename is [PathLike](https://docs.python.org/3/library/os.html#os.PathLike) and return False early.

Is PathLike sufficient, or...
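A minimal sketch of the early-return check being discussed. The helper name `is_remote_url` and the list of schemes are assumptions for illustration, not necessarily what the library uses; only `os.PathLike` and `urllib.parse.urlparse` are standard-library facts.

```python
import os
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed set of schemes treated as remote; purely illustrative.
REMOTE_SCHEMES = ("http", "https", "s3", "gs", "hdfs", "ftp")


def is_remote_url(url_or_filename):
    """Return True only for string URLs with a remote scheme."""
    # pathlib paths implement os.PathLike and are always local,
    # so return early instead of passing them to urlparse.
    if isinstance(url_or_filename, os.PathLike):
        return False
    return urlparse(url_or_filename).scheme in REMOTE_SCHEMES


path_result = is_remote_url(PurePosixPath("test.jsonl"))
local_str_result = is_remote_url("test.jsonl")
remote_result = is_remote_url("https://example.com/test.jsonl")
```

`os.PathLike` covers any object defining `__fspath__`, which includes all `pathlib` path classes, so a separate `PosixPath` check should not be needed under this approach.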

Allow pathlib PosixPath in Dataset.read_json

Above PR should do your first suggestion. Hope that works for you, as I am going on holiday and won't be able to change much :wink:

"Common Diffusion Noise Schedules and Sample Steps are Flawed" integration

@Max-We nice write-up! Do you have any plans to integrate the changes into this library? I am wondering whether LoRA finetuning would be sufficient to adapt the model for...