Jani Monoses
@huybery could it be due to differences in how SiLU/SwiGLU is implemented in OLMoE versus the existing FusedMoE module?
The RMSNorm outputs differ. Fixing that should correct at least some of the discrepancy between the two models' attention outputs. It can be seen by swapping forward_native for forward_cuda in...
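To illustrate the kind of check I mean (a standalone NumPy sketch, not the actual forward_native/forward_cuda kernels), you can compare a reference RMSNorm against a second code path and look at the maximum element-wise difference:

```python
import numpy as np

def rmsnorm_ref(x, weight, eps=1e-6):
    # Reference RMSNorm: x / sqrt(mean(x^2) + eps) * weight,
    # with the variance accumulated in float64 for accuracy.
    variance = np.mean(x.astype(np.float64) ** 2, axis=-1, keepdims=True)
    return ((x / np.sqrt(variance + eps)) * weight).astype(x.dtype)

def rmsnorm_alt(x, weight, eps=1e-6):
    # A second implementation that stays in the input dtype,
    # standing in for an alternative (e.g. fused) code path.
    variance = np.mean(x ** 2, axis=-1, keepdims=True)
    return (x / np.sqrt(variance + eps)) * weight

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8)).astype(np.float32)
w = np.ones(8, dtype=np.float32)

# A large max-abs-diff here would point at the normalization layer
# rather than at the attention code that consumes its output.
diff = float(np.max(np.abs(rmsnorm_ref(x, w) - rmsnorm_alt(x, w))))
print(f"max abs diff: {diff:.2e}")
```

If the two paths disagree beyond float32 rounding noise, the divergence originates in the norm, not downstream.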
The weight_loader should be passed name, not weight_name; otherwise it silently fails to load the weights in the MoE layer and its output is all zeros. This is the diff...
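A minimal sketch of why this fails silently (hypothetical key names, not the actual vLLM loader): if the loader looks weights up under a key that never matches, no exception is raised and the parameter keeps its zero initialization.

```python
import numpy as np

# Hypothetical checkpoint and parameter dicts for a single MoE weight.
checkpoint = {"experts.w1": np.ones((4, 4))}
params = {"experts.w1": np.zeros((4, 4))}

def load(params, checkpoint, key):
    # Silently skips keys absent from the checkpoint, like a loader
    # keyed on the wrong name variable.
    if key in checkpoint:
        params[key][...] = checkpoint[key]

load(params, checkpoint, "experts.w1_scale")  # wrong key: nothing happens
assert params["experts.w1"].sum() == 0        # layer output stays all zeros

load(params, checkpoint, "experts.w1")        # correct key: weights land
assert params["experts.w1"].sum() == 16
```

Passing the correct key makes the copy happen; with the wrong one the model runs fine but the MoE layer contributes nothing.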
@mcharytoniuk @LaurentMazare sorry about that, my bad. Looks like I did not test the model I thought I was testing... I hope I will have time to take a look...