Sebastian Raschka comments

Results: 821 comments of Sebastian Raschka

Add OLMo: 1B & 7B

Thanks so much for the feedback @carmocca and @Andrei-Aksionov, this was super helpful! After more tinkering, I went with a custom OLMoMLP (analogous to LLaMAMLP) because I thought this...

Add OLMo: 1B & 7B

> Yes, they use `weight_tying`. It's configurable and they decided to use it. And yes, it won't work during training. Although it's not difficult to add if more models will...

Add OLMo: 1B & 7B

> I would strongly prefer that we don't add this new MLP class.

Ok! Maybe let's leave it in there until we get it to work, and then we can...

Add OLMo: 1B & 7B

Just to add a note about pinpointing the difference. With Carlos's help, we found that the difference currently is in how the QKV matrix is split into queries, keys, and...
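
To illustrate why the split matters, here is a minimal sketch in plain Python. The two layouts below (all queries, then all keys, then all values, versus per-head q/k/v blocks) and the toy sizes are assumptions for illustration only; they are not taken from the OLMo checkpoint or from the PR itself.

```python
# Hypothetical toy sizes, not taken from any real checkpoint.
n_head, head_size = 2, 3
embed = n_head * head_size

def labels(name):
    # One label per row of the projection, e.g. "q0_1" = query, head 0, row 1.
    return [f"{name}{h}_{i}" for h in range(n_head) for i in range(head_size)]

# Layout A (assumed): all query rows, then all key rows, then all value rows.
fused_concat = labels("q") + labels("k") + labels("v")

# Layout B (assumed): rows grouped per head as [q block, k block, v block].
fused_interleaved = []
for h in range(n_head):
    for name in "qkv":
        fused_interleaved += [f"{name}{h}_{i}" for i in range(head_size)]

def split_concat(rows):
    # Cut the fused rows into three equal contiguous chunks.
    return rows[:embed], rows[embed:2 * embed], rows[2 * embed:]

def split_interleaved(rows):
    # Walk head by head and pick the q, k, and v blocks out of each group.
    q, k, v = [], [], []
    for h in range(n_head):
        base = h * 3 * head_size
        q += rows[base:base + head_size]
        k += rows[base + head_size:base + 2 * head_size]
        v += rows[base + 2 * head_size:base + 3 * head_size]
    return q, k, v

q_a, k_a, v_a = split_concat(fused_concat)
q_b, k_b, v_b = split_interleaved(fused_interleaved)

# Using the wrong split for a layout silently mixes keys into the queries.
q_wrong, _, _ = split_concat(fused_interleaved)
```

With matching layout and split, both paths recover the same queries, keys, and values. With a mismatch, the shapes still line up, so the error only shows up as different outputs.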

Warming up the optimizer states with learning rate = 0 for a few steps

This sounds interesting, but I would say let's not do that as a default because then it would become difficult to compare to other LLM frameworks. I do like the...
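
For context, a minimal sketch of what such a warm-up could look like, assuming an Adam-style optimizer written out by hand for a single scalar parameter. The function name `adam_step`, the toy loss, and the step count are hypothetical; this is not the proposal's actual implementation.

```python
import math

def adam_step(param, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    # Standard Adam update for one scalar parameter (illustrative only).
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

param = 1.0
state = {"t": 0, "m": 0.0, "v": 0.0}

# Warm-up phase: lr = 0, so the moment estimates fill in but the weight stays put.
for _ in range(5):
    grad = 2 * param  # gradient of the toy loss param**2
    param = adam_step(param, grad, state, lr=0.0)

param_after_warmup = param
```

After these steps the parameter is unchanged while `state["m"]` and `state["v"]` are already populated, which is the effect the title describes.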

section 14: AttributeError: module 'tensorflow_estimator.python.estimator.api._v2.estimator' has no attribute 'BoostedTreesRegressor'

I think it might be https://www.tensorflow.org/decision_forests/api_docs/python/tfdf/keras/GradientBoostedTreesModel now

tests are not passing in the branch `main`

Huh, that's weird. I haven't touched anything recently. It might be due to some recent package updates. Some packages don't have pinned versions and could be the culprit (https://github.com/rasbt/mlxtend/blob/master/.github/workflows/python-package-conda.yml) have...