Sebastian Raschka

821 comments by Sebastian Raschka

Thanks so much for the feedback @carmocca and @Andrei-Aksionov, this was super helpful! After more tinkering, I went with a custom OLMoMLP (analogous to LLaMAMLP) because I thought this...

> Yes, they use `weight_tying`. It's configurable and they decided to use it. And yes, it won't work during training. Although it's not difficult to add if more models will...

> I would strongly prefer that we don't add this new MLP class.

Ok! Maybe let's leave it in there until we get it to work, and then we can...

Just to add a note about pinpointing the difference. With Carlos's help, we found that the difference currently is in how the QKV matrix is split into queries, keys, and...
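As a rough illustration of how this kind of mismatch can arise, here is a minimal sketch of two ways a fused QKV matrix might be laid out and split. The function names, the two layouts, and the toy sizes are assumptions made for the example; they are not taken from either implementation discussed here.

```python
# Hypothetical sketch: two ways to split the rows of a fused QKV matrix.
# Rows are represented by integer labels so the difference is easy to see.

def split_contiguous(rows, n_head, head_size):
    """Assumed layout: [Q for all heads | K for all heads | V for all heads]."""
    n = n_head * head_size
    return rows[:n], rows[n:2 * n], rows[2 * n:3 * n]


def split_interleaved(rows, n_head, head_size):
    """Assumed layout: [q_0, k_0, v_0, q_1, k_1, v_1, ...], grouped per head."""
    q, k, v = [], [], []
    for h in range(n_head):
        base = h * 3 * head_size  # start of this head's q/k/v group
        q += rows[base:base + head_size]
        k += rows[base + head_size:base + 2 * head_size]
        v += rows[base + 2 * head_size:base + 3 * head_size]
    return q, k, v


# Toy example: 2 heads, head size 2, so 3 * 2 * 2 = 12 fused rows.
rows = list(range(12))
q_a, k_a, v_a = split_contiguous(rows, n_head=2, head_size=2)
q_b, k_b, v_b = split_interleaved(rows, n_head=2, head_size=2)

print(q_a)  # [0, 1, 2, 3]
print(q_b)  # [0, 1, 6, 7]
```

If weights saved in one layout are split under the other assumption, the queries, keys, and values end up containing different rows, so the outputs diverge even though all the weights are loaded.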

This sounds interesting, but I would say let's not do that as a default because then it would become difficult to compare to other LLM frameworks. I do like the...

I think it might be https://www.tensorflow.org/decision_forests/api_docs/python/tfdf/keras/GradientBoostedTreesModel now

Huh, that's weird. I haven't touched anything recently. It might be due to some recent package updates. Some packages don't have pinned versions and could be the culprit (https://github.com/rasbt/mlxtend/blob/master/.github/workflows/python-package-conda.yml) have...

Or, just to clarify, was it the CI or local testing?

@NimaSarajpoor With "local" testing I meant locally on your computer 😅. It's weird that it worked on the other machine, but I am also glad to hear that there are...

Thanks for the note! I can confirm that I'm seeing this issue in sklearn 1.3.0 as well (but not in 1.2.2). I just submitted a PR (#1060) to fix that.