Piotr Wilkin (ilintar)
Update: we have output! My 500M version is producing very nice outputs already:

```console
user
Let's go!

assistant
Javier斫 fond𬸚עמק(cursorStick面對 Cunningham.semgetNumjest茶叶ador Ce serão_BG Delete Regular.LoadScene anchppelin.win้ม indexing een닙)object עצמו markedbaby干部继承所能...
```
@theo77186 Nah, I wouldn't expect the first version that actually produces output to produce *correct* output; that would be a miracle :) Now comes the part of comparing intermediate results...
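For anyone following along, a rough sketch of what "comparing intermediate results" can look like on the reference side: hook the HF model, dump per-layer activation stats, and diff them against tensors logged by the port. The model path and the module-name filter are placeholders, not the actual debug setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/qwen3-next-500m"  # placeholder checkpoint path
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
model.eval()

acts = {}

def make_hook(name):
    def hook(module, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        acts[name] = out.detach().float().cpu()
    return hook

# Module-name suffixes are guesses; adjust to the actual architecture.
for name, module in model.named_modules():
    if name.endswith(("input_layernorm", "self_attn", "linear_attn", "mlp")):
        module.register_forward_hook(make_hook(name))

with torch.no_grad():
    model(**tok("Let's go!", return_tensors="pt"))

# Summary stats are usually enough to spot which layer first diverges.
for name, t in acts.items():
    print(f"{name}: shape={tuple(t.shape)} mean={t.mean():.6f} std={t.std():.6f}")
```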
@theo77186 added the exclusion of MTP layers from conversion
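A minimal sketch of what the MTP exclusion could look like, assuming a convert_hf_to_gguf.py-style flow where every checkpoint tensor passes through a `modify_tensors()` hook. The `"mtp"` name pattern is an assumption about how the multi-token-prediction tensors are named.

```python
from typing import Iterable

import torch


def modify_tensors(name: str, data_torch: torch.Tensor) -> Iterable[tuple[str, torch.Tensor]]:
    """Drop MTP tensors; pass everything else through (placeholder mapping)."""
    if "mtp" in name:
        return []  # the llama.cpp graph has no MTP head, so these are skipped
    return [(name, data_torch)]


# A tensor like "model.mtp.fc.weight" would be filtered out:
assert list(modify_tensors("model.mtp.fc.weight", torch.zeros(1))) == []
```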
Argh, it doesn't use the standard RMS norm either:

```python
class Qwen3NextRMSNormGated(nn.Module):
    def __init__(self, hidden_size, eps=1e-6, **kwargs):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states, gate=None):
        input_dtype =...
```
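For reference, a hedged sketch of how the truncated `forward()` typically continues in FLA-style gated norms: fp32 RMS normalization plus a SiLU gate. Implementations differ on whether the gate is applied before or after normalization, so this is an approximation, not a copy of the upstream code.

```python
import torch
import torch.nn.functional as F


def gated_rms_norm(hidden_states: torch.Tensor, weight: torch.Tensor,
                   eps: float, gate: torch.Tensor | None = None) -> torch.Tensor:
    input_dtype = hidden_states.dtype
    hidden_states = hidden_states.to(torch.float32)
    variance = hidden_states.pow(2).mean(-1, keepdim=True)
    hidden_states = hidden_states * torch.rsqrt(variance + eps)  # RMS normalize
    if gate is not None:
        # Gate order (before vs. after the norm) varies; check the original.
        hidden_states = hidden_states * F.silu(gate.to(torch.float32))
    return (weight * hidden_states).to(input_dtype)
```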
> glad im not an ai engineer

Neither am I :laughing:
Now that's a new one I haven't seen before :) I'll probably resume tomorrow, my brain is a bit fried.
> For some reason, for the 70M model, `conv_states` is 50% larger than expected, will try to see what's going on.

Just for reference, I can't make your 70M model...
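A hypothetical back-of-the-envelope check for the `conv_states` question: compute the expected per-sequence conv cache size from the config and compare it with the allocation. The field names follow the Qwen3-Next config as I understand it, and the `kernel - 1` factor is an assumption; some implementations cache the full kernel width, which alone changes the size by `kernel / (kernel - 1)`.

```python
def expected_conv_state_elems(cfg: dict) -> int:
    key_dim = cfg["linear_key_head_dim"] * cfg["linear_num_key_heads"]
    value_dim = cfg["linear_value_head_dim"] * cfg["linear_num_value_heads"]
    conv_dim = 2 * key_dim + value_dim  # causal conv runs over concat(q, k, v)
    return conv_dim * (cfg["linear_conv_kernel_dim"] - 1)


# Placeholder numbers, not the real 70M config:
cfg = {
    "linear_key_head_dim": 64, "linear_num_key_heads": 4,
    "linear_value_head_dim": 64, "linear_num_value_heads": 8,
    "linear_conv_kernel_dim": 4,
}
print("expected conv_states elements per sequence:", expected_conv_state_elems(cfg))
```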
TTS looks reasonable: whisper.cpp has whisper-server, so we can run the Whisper model from there. Also, llama.cpp has support for some TTS models, though not through the server endpoint....
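A quick sketch of the whisper-server route: POST the audio to its `/inference` endpoint and read back the transcription. The endpoint and form fields follow whisper.cpp's server example; the host, port, and audio path are placeholders.

```python
import requests


def transcribe(wav_path: str, base_url: str = "http://127.0.0.1:8080") -> str:
    with open(wav_path, "rb") as f:
        resp = requests.post(
            f"{base_url}/inference",
            files={"file": f},
            data={"temperature": "0.0", "response_format": "json"},
            timeout=120,
        )
    resp.raise_for_status()
    return resp.json()["text"]


print(transcribe("sample.wav"))  # placeholder audio file
```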
Any reason why you want to stop the server immediately? I'd see it more like another instance: start a whisper server on demand if needed, stop it if any...
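A sketch of the on-demand lifecycle suggested above: lazily launch whisper-server the first time it is needed and keep it alive across requests, stopping it only at shutdown. The binary path, model path, and readiness probe are placeholders, not a settled design.

```python
import atexit
import subprocess
import time
import urllib.request


class OnDemandWhisperServer:
    def __init__(self, binary: str = "./whisper-server",
                 model: str = "models/ggml-base.en.bin", port: int = 8080):
        self.cmd = [binary, "-m", model, "--port", str(port)]
        self.url = f"http://127.0.0.1:{port}/"
        self.proc: subprocess.Popen | None = None

    def ensure_running(self) -> None:
        if self.proc is not None and self.proc.poll() is None:
            return  # already up, reuse the running instance
        self.proc = subprocess.Popen(self.cmd)
        atexit.register(self.stop)
        for _ in range(50):  # poll until the HTTP endpoint answers
            try:
                urllib.request.urlopen(self.url, timeout=1)
                return
            except OSError:
                time.sleep(0.2)
        raise RuntimeError("whisper-server did not come up in time")

    def stop(self) -> None:
        if self.proc is not None and self.proc.poll() is None:
            self.proc.terminate()
            self.proc.wait(timeout=10)
```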
Cool! Doing a refactoring now to fix the thread launching logic, I'll try to merge when I'm done with that.