Results 49 comments of Thomas Fitzsimmons

Hi, I was able to bootstrap CCL on a Raptor Talos II machine, albeit with a few cheats. I wanted to mention the process here in case it provided motivation...

ChatGPT is out of date regarding the Power ISA being proprietary. It is an open standard now, just like RISC-V. See [https://openpowerfoundation.org/](https://openpowerfoundation.org/).

@ggerganov, sure, I'll try to fit the POWER9 optimizations into the main SIMD structure, some time after #324 lands in the master branch. Agreed regarding 5s likely not being optimal....

The remaining slowness seems to be in the short-to-fp32 conversion. Would it make sense to try a GGML_TYPE_F32 version of ggml-base.en.bin, to eliminate the conversion steps? Can someone outline steps...
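For reference, the per-element work that an all-F32 model would eliminate looks roughly like this decode (a sketch of IEEE-754 binary16 to binary32 conversion; `half_to_float` is a hypothetical name, not the exact ggml code):

```cpp
#include <cstdint>
#include <cmath>

// Sketch of the fp16 -> fp32 decode performed per element when the
// weights are stored as 16-bit halves (hypothetical helper).
float half_to_float(uint16_t h) {
    uint32_t sign = (h >> 15) & 1;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t mant =  h        & 0x3FF;
    float v;
    if (exp == 0)       v = std::ldexp((float)mant, -24);          // zero/subnormal
    else if (exp == 31) v = mant ? NAN : INFINITY;                 // inf/NaN
    else                v = std::ldexp((float)(mant | 0x400), (int)exp - 25);
    return sign ? -v : v;
}
```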

Thanks for the model instructions @ggerganov. With the FP32 model and #366 I get:

```
$ time ./main -t 32 -m ../fp32-model/ggml-model-f32.bin samples/jfk.wav
whisper_model_load: loading model from '../fp32-model/ggml-model-f32.bin'
whisper_model_load: n_vocab...
```

@luke-jr Now that #369 is merged can you try `bench` with various arguments, and post an updated table of results to #89? Then #300 can probably be closed.

@ggerganov Yes, 87dd4a30811ee07700ee6fee267508e8935b32fc is about half a second faster on the jfk example, I guess due to the FP16 lookup table.
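A lookup table along these lines would explain the gain: fp16 has only 65536 bit patterns, so the decode can be precomputed once at init, after which each conversion is a single indexed load (a sketch under that assumption; the names here are hypothetical, not ggml's actual identifiers):

```cpp
#include <cstdint>
#include <cmath>

// Hypothetical 65536-entry fp16 -> fp32 table; the full decode runs
// once at startup, then conversion is one array load per element.
static float f16_table[1u << 16];

static float decode_half(uint16_t h) {
    uint32_t sign = (h >> 15) & 1, exp = (h >> 10) & 0x1F, mant = h & 0x3FF;
    float v = (exp == 0)  ? std::ldexp((float)mant, -24)                  // zero/subnormal
            : (exp == 31) ? (mant ? NAN : INFINITY)                       // inf/NaN
            :               std::ldexp((float)(mant | 0x400), (int)exp - 25);
    return sign ? -v : v;
}

void init_f16_table() {
    for (uint32_t i = 0; i < (1u << 16); ++i)
        f16_table[i] = decode_half((uint16_t)i);
}
```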

When I build natively on POWER9, I don't use CMake. I just use the checked-in Makefile. The output is attached. You'll have to get CMake to set the same options....

To get the model loading I modified read_safe with:

```cpp
if constexpr (std::endian::native == std::endian::big) {
    dest = std::byteswap(dest);
}
```

Then, running with `-t 1` for consistency, in the...
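For toolchains that don't have C++23 `std::byteswap` yet, a portable stand-in can be sketched like this (`byteswap_any` is a hypothetical helper, not part of whisper.cpp):

```cpp
#include <cstdint>
#include <cstring>
#include <algorithm>

// Portable stand-in for C++23 std::byteswap: copy the value's bytes
// out, reverse them, and copy them back. Works for any trivially
// copyable integer type.
template <typename T>
T byteswap_any(T v) {
    unsigned char b[sizeof(T)];
    std::memcpy(b, &v, sizeof(T));
    std::reverse(b, b + sizeof(T));
    std::memcpy(&v, b, sizeof(T));
    return v;
}
```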

#398 is where I'm at. Starting with the 32592001st call to `ggml_vec_dot_f16`, src1->data is pointing at different data on the big and little endian targets. I'm not sure what next...
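One way to localize a divergence like this is a scalar reference dot product to compare against the optimized path on both targets (a hypothetical debugging aid; `vec_dot_f16_ref` and the inline decode are sketches, not ggml code):

```cpp
#include <cstdint>
#include <cmath>

// Decode one IEEE-754 binary16 value (hypothetical helper).
static float decode_half(uint16_t h) {
    uint32_t sign = (h >> 15) & 1, exp = (h >> 10) & 0x1F, mant = h & 0x3FF;
    float v = (exp == 0)  ? std::ldexp((float)mant, -24)
            : (exp == 31) ? (mant ? NAN : INFINITY)
            :               std::ldexp((float)(mant | 0x400), (int)exp - 25);
    return sign ? -v : v;
}

// Scalar reference for cross-checking an optimized fp16 dot product;
// run it on the same src0/src1 buffers on both endian targets and
// compare the sums at the call count where the traces diverge.
float vec_dot_f16_ref(int n, const uint16_t *x, const uint16_t *y) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += decode_half(x[i]) * decode_half(y[i]);
    return sum;
}
```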