Simon Jégou

34 comments by Simon Jégou

Here it is! It's quite quick and dirty, as I reload `Gs` every time I generate a new batch. But time doesn't really matter here, as it converge...

I don't have much time to work on this project, but it's great to know you've made some progress! To answer a previous question, I noticed that faces recovered using gradient...

@Quasimondo spotted a mistake in my code: in the `finetune_18` function, the `w_mix` argument is missing from the `get_batch` call in the training phase, so the function does nothing...
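The bug pattern is easy to reproduce in isolation: when an optional argument defaults to "no mixing", forgetting to pass it makes the call silently return its input unchanged. Below is a minimal sketch, assuming `w_mix` means StyleGAN-style mixing of per-layer dlatents of shape `(batch, 18, 512)`. The `get_batch` body, the `crossover` parameter and the mixing rule are hypothetical illustrations, not the code from the original comment.

```python
import numpy as np

# Assumption: 18 per-layer dlatents of size 512 (StyleGAN at 1024x1024).
N_LAYERS, W_DIM = 18, 512

def get_batch(w, w_mix=None, crossover=8):
    """Hypothetical helper: return dlatents, optionally style-mixed.

    w:      array of shape (batch, N_LAYERS, W_DIM)
    w_mix:  optional array of the same shape; layers >= crossover are taken from it
    """
    if w_mix is None:
        # Without w_mix nothing is mixed: the batch comes back unchanged.
        return w.copy()
    mixed = w.copy()
    mixed[:, crossover:, :] = w_mix[:, crossover:, :]
    return mixed

rng = np.random.default_rng(0)
w = rng.standard_normal((4, N_LAYERS, W_DIM))
w_other = rng.standard_normal((4, N_LAYERS, W_DIM))

buggy = get_batch(w)                 # w_mix forgotten: identical to w
fixed = get_batch(w, w_mix=w_other)  # coarse layers from w, fine layers from w_other
print(np.array_equal(buggy, w))      # True
print(np.array_equal(fixed, w))      # False
```

Because the "no mixing" path is a valid code path rather than an error, nothing fails at runtime; only the training signal changes.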

Great job @rolux!! @pbaylies, since you have studied this encoder question in depth, what is your main feedback? Does EfficientNet bring additional precision for initialization? What is the...

@tridao for more context, I recently published a post on the current Kaggle LLM science exam competition ([here](https://www.kaggle.com/competitions/kaggle-llm-science-exam/discussion/440620)) showing that it's possible to run a 70B model on a single...
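For scale, a back-of-the-envelope estimate of the raw weight footprint shows why running a 70B model on a single GPU is notable. This sketch counts weights only (no activations or KV cache) and uses the standard storage sizes of 2 bytes for fp16, 1 byte for int8 and 0.5 bytes for 4-bit; the comment itself does not say which technique the post relies on, and this sketch does not assume one.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Raw size of the model weights in GB (1 GB = 1e9 bytes); ignores activations and KV cache."""
    return n_params * bytes_per_param / 1e9

N_PARAMS = 70e9  # a "70B" model
for name, nbytes in [("fp16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{name}: {weight_memory_gb(N_PARAMS, nbytes):.0f} GB")
# fp16: 140 GB
# int8: 70 GB
# 4-bit: 35 GB
```

Even the fp16 weights alone exceed the memory of a typical single GPU, so some form of quantization or offloading is needed.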

Hello @BenjaminBossan, thanks for your quick answer. About DoRA, $W_{ft} = m_{dora} * (W_{base} + W_{delta})$ and $W_{wise} = (1-\alpha) * W_{base} + \alpha * W_{ft}$, so $W_{wise}...
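Substituting $W_{ft}$ into $W_{wise}$, in the same notation and treating $*$ as an elementwise product, gives:

$$
\begin{aligned}
W_{wise} &= (1-\alpha) * W_{base} + \alpha * m_{dora} * (W_{base} + W_{delta}) \\
         &= (1 - \alpha + \alpha * m_{dora}) * W_{base} + \alpha * m_{dora} * W_{delta}
\end{aligned}
$$

which reduces to $W_{base}$ at $\alpha=0$ and to $m_{dora} * (W_{base} + W_{delta}) = W_{ft}$ at $\alpha=1$.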

I'm confident it will work reasonably well with DoRA too, as for $\alpha=0$ and $\alpha=1$ it returns the correct results. However, I do not have any experimental data to prove...
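The endpoint argument can be checked numerically. This is a minimal sketch, assuming `m_dora` is a magnitude vector broadcast across the weight matrix (one value per output row here) and that `*` in the formulas above is elementwise; the shapes and values are made up. It only verifies $\alpha \in \{0, 1\}$; behavior at intermediate $\alpha$ would still need experiments.

```python
import numpy as np

def wise_interpolate(w_base, w_delta, m_dora, alpha):
    """WiSE-style interpolation between base weights and DoRA fine-tuned weights.

    Follows the notation above: W_ft = m_dora * (W_base + W_delta),
    W_wise = (1 - alpha) * W_base + alpha * W_ft, with '*' taken as elementwise.
    """
    w_ft = m_dora * (w_base + w_delta)
    return (1 - alpha) * w_base + alpha * w_ft

rng = np.random.default_rng(0)
out_features, in_features = 8, 16
w_base = rng.standard_normal((out_features, in_features))
w_delta = 0.01 * rng.standard_normal((out_features, in_features))
# Assumed per-row magnitude, broadcast over columns (hypothetical values).
m_dora = rng.uniform(0.5, 1.5, size=(out_features, 1))

w_ft = m_dora * (w_base + w_delta)
print(np.allclose(wise_interpolate(w_base, w_delta, m_dora, 0.0), w_base))  # True
print(np.allclose(wise_interpolate(w_base, w_delta, m_dora, 1.0), w_ft))    # True
```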

I might work on it, but I don't have immediate bandwidth.

On Tue, 23 Jul 2024, 11:42, Benjamin Bossan ***@***.***> wrote:
> Okay, too bad :-] Still I...

Hello, many thanks to @ariG23498 for working on this feature and to @BenjaminBossan for reviewing it. I used the code at the end of this message for some sanity checks: -...
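One natural sanity check of this kind compares a model whose adapter weights have been merged into the base weights against the unmerged model at adapter scale 1.0, mirroring the `outputs["merged"]` vs `outputs["scale 1.0"]` comparison that appears in this thread. Below is a minimal pure-PyTorch sketch for a single LoRA-style linear layer, assuming the usual low-rank update $W + B A$. It does not use the PEFT API; `lora_forward`, the shapes and the tolerance are hypothetical choices for illustration.

```python
import torch

torch.manual_seed(0)
in_features, out_features, rank = 16, 8, 4
device = "cpu"  # the thread mentions device="mps"; cpu keeps this sketch runnable anywhere

W = torch.randn(out_features, in_features, device=device)  # base weight
A = torch.randn(rank, in_features, device=device) * 0.1     # low-rank factors (hypothetical values)
B = torch.randn(out_features, rank, device=device) * 0.1
x = torch.randn(32, in_features, device=device)

def lora_forward(x, scale):
    """Unmerged forward: base output plus the low-rank update multiplied by `scale`."""
    return x @ W.T + scale * (x @ A.T @ B.T)

# Merged forward: fold the update into the weight once, then do a single matmul.
W_merged = W + B @ A
outputs = {
    "merged": x @ W_merged.T,
    "scale 1.0": lora_forward(x, 1.0),
}

# Same math, different order of floating-point operations, so compare with a tolerance.
# torch.allclose defaults to rtol=1e-05, atol=1e-08; the looser atol here is an arbitrary choice.
assert torch.allclose(outputs["merged"], outputs["scale 1.0"], atol=1e-5)
print("merged == scale 1.0 within tolerance")
```

Setting `atol` explicitly matters because, with the default of 1e-08, entries close to zero can fail the comparison purely from floating-point rounding.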

@BenjaminBossan I'm running it on a MacBook with `device="mps"`. The issue I get is simply:

```text
assert torch.allclose(outputs["merged"], outputs["scale 1.0"])
AssertionError
```

When I get this error, the plot shows...