Results 99 comments of Stephan Walter

I think you would fix the checks while in draft state, then set it to "ready for review" once they pass. But other people may have different habits. I realize...

I would suggest the following:
* disable the `edited, review_requested, ready_for_review` condition for pull requests
* enable checks for drafts, so people get a chance for fixing failures before requesting...

Here on AVX2 / 4 cores this is looking good: master 232ms/token, your PR 223ms/token. Prompt eval seems to improve more, as you said, but I haven't looked at that...

Somewhat related to this is the fact that Q8_0 as it is after #1083, #1109 now has two floats that go to waste for Q4_0 and Q4_2, at least for...

The `static_assert(sizeof(`... seems a bit dubious, it should probably be `static_assert(_Alignof(`... (which currently fails). If there was a clean and portable way to specify struct alignment, we could apply that...
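
To illustrate the difference between the two checks, here is a minimal C11 sketch. The struct names and field layout are hypothetical and are not the actual block definitions from the code under discussion. It assumes a compiler that accepts `_Alignas(32)` on a struct member (GCC and Clang do). A `sizeof` assertion only constrains the size, while `_Alignof` is what reflects the alignment requirement.

```c
#include <assert.h>   /* static_assert macro (C11) */
#include <stdint.h>

/* Hypothetical block layout, for illustration only. */
typedef struct {
    float  d;        /* scale factor */
    int8_t qs[32];   /* quantized values */
} plain_block;

/* Same fields, but the first member requests 32-byte alignment,
 * which raises the alignment of the whole struct. */
typedef struct {
    _Alignas(32) float d;
    int8_t qs[32];
} aligned_block;

/* Without an alignment specifier, the struct is only as aligned as
 * its strictest member (the float), so a 32-byte check would fail. */
static_assert(_Alignof(plain_block) < 32,
              "plain_block is not 32-byte aligned");

/* With _Alignas, the alignment check passes, and sizeof is padded
 * up to a multiple of the alignment as a consequence. */
static_assert(_Alignof(aligned_block) == 32,
              "aligned_block must be 32-byte aligned");
static_assert(sizeof(aligned_block) % 32 == 0,
              "size is a multiple of the alignment");
```

Note that `_Alignas` changes the struct size through padding, which may not be acceptable for a format that is written to disk.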

With an AVX512 machine, you may want to look into using `_mm256_dpbssd_epi32` in `mul_sum_i8_pairs_float`, that could give another speed boost. (Preprocessor condition: `#if __AVXVNNIINT8__`) Rebasing/merging latest master should fix the...
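
For reference, this is a scalar C sketch of what I understand a signed 8-bit dot-product-accumulate such as `_mm256_dpbssd_epi32` to compute: each 32-bit lane adds four signed 8-bit products to its accumulator. The function names `dot4_i8` and `dot4_i8_x8` are hypothetical, and this does not reproduce `mul_sum_i8_pairs_float`, whose body is not shown here. It assumes the sums do not overflow `int32_t`.

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane: the accumulator plus the sum of
 * four signed 8-bit products. Assumes no int32_t overflow. */
static inline int32_t dot4_i8(int32_t acc, const int8_t a[4], const int8_t b[4]) {
    for (int i = 0; i < 4; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* Scalar model of a 256-bit operation: eight independent 32-bit
 * lanes, each consuming four consecutive bytes of a and b. */
static inline void dot4_i8_x8(int32_t acc[8], const int8_t a[32], const int8_t b[32]) {
    for (int lane = 0; lane < 8; lane++) {
        acc[lane] = dot4_i8(acc[lane], a + 4 * lane, b + 4 * lane);
    }
}
```

A vectorized version would replace the lane loop with a single intrinsic call, guarded by the preprocessor condition mentioned above.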

Maybe you could use `--verbose-prompt` to find out how the prompt is handled exactly? There could be some difference in whitespace or newlines.

Maybe somewhat related to `--author-mode` proposed in #1040, at least in terms of intended user group.

Thanks for putting in the work of writing docs, which many devs don't like to do ;-) I think `--interactive-start` could be removed, as it does the same as `--interactive`...

We might also want to shorten the top-level readme a bit, if there's duplicate information now. We can add a link to examples/main/README.md instead.