Dmitry Nikitko

Results 4 comments of Dmitry Nikitko

Did you use the default learning rate (0.01)? If you use only one GPU for training, try setting lr = 0.00125.
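For reference, 0.00125 is what the linear scaling rule gives if the default of 0.01 was tuned for 8 GPUs. The comment does not state this, so treat it as an assumption. A minimal sketch, with `scale_lr` as a hypothetical helper name:

```python
def scale_lr(base_lr: float, base_gpus: int, gpus: int) -> float:
    """Linear scaling rule: keep lr proportional to the GPU count (total batch size)."""
    if base_gpus <= 0 or gpus <= 0:
        raise ValueError("GPU counts must be positive")
    return base_lr * gpus / base_gpus


# Assumption: the default lr of 0.01 was tuned for an 8-GPU setup.
single_gpu_lr = scale_lr(0.01, base_gpus=8, gpus=1)
print(single_gpu_lr)
```

If the per-GPU batch size also changes, the same proportional reasoning would apply to the total batch size rather than to the GPU count alone.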

I'm using this approach. You can also calculate the mean of the last hidden state, but don't forget to apply L2 normalization after that. It might work better than the EOS embedding...
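A rough sketch of the mean-pooling variant, in plain Python so it runs without any framework. The function name `mean_pool_l2`, the toy values, and the padding mask are assumptions for illustration; the comment itself only mentions taking the mean of the last hidden state and applying L2 norm.

```python
import math
from typing import List, Sequence


def mean_pool_l2(hidden: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the token vectors where mask == 1, then L2-normalize the result."""
    kept = [vec for vec, m in zip(hidden, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    mean = [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    if norm == 0.0:
        return mean  # an all-zero vector cannot be normalized
    return [x / norm for x in mean]


# Toy "last hidden state": 3 tokens x 2 dims; the last token is padding (assumed).
hidden = [[3.0, 0.0], [3.0, 8.0], [100.0, 100.0]]
embedding = mean_pool_l2(hidden, mask=[1, 1, 0])
print(embedding)
```

After normalization the embedding has unit length, so a dot product between two such embeddings equals their cosine similarity.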

> @Puzer Thanks. What's your take on the quality of sentence representations using this method? I'm not sure the model manages to do that very well. I tried to embed...

> The effective thing to do now is
>
> with dspy.settings.context(lm=…): ….
>
> Inside this block, the lm is different (you can also nest this pattern)

Yep, it works. I...