Michael Wilson
Hi Henrik, I had tried using `rmvn`/`dmvn` from `mvnfast` as well, but it didn't seem to help (at least with my setup). Trying to run it serially with the...
Some further testing on a smaller model showed that (for 4 repetitions on 4 cores) the current implementation took 1.15 minutes; replacing the calls to `mvtnorm` with the corresponding calls...
I'm getting stuck in a different way after following @b0kch01's changes. I'm not running into the MP=0 issue, but I get some warnings about being unable to initialize the client...
Unfortunately, I'm still unable to run anything using @b0kch01's `llama-cpu` repo on Linux.

```
[W socket.cpp:426] [c10d] The server socket cannot be initialized on [::]:29500 (errno: 97 - Address family...
```
Maybe the `load` function [here](https://github.com/tloen/llama-int8/blob/main/example.py) will be useful?
It looks like 1217 and 3582 are sub-word tokens:

```python
>>> tokenizer.encode('no', bos = False, eos = False)
[694]
>>> tokenizer.encode('thno', bos = False, eos = False)
[266, 1217]
>>>...
```
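The behaviour above (the same surface string `no` getting a different id once it sits mid-word) can be sketched with a toy greedy longest-match tokenizer. LLaMA's tokenizer is SentencePiece-based, which marks word-initial pieces with `▁`, so word-initial `▁no` and continuation `no` are distinct vocabulary entries; the tiny vocabulary below simply reuses the ids from the session for illustration and is **not** the real LLaMA vocabulary.

```python
# Toy SentencePiece-style tokenizer: word-initial pieces carry a '▁'
# prefix, so "▁no" (standalone word) and "no" (continuation piece)
# have different ids. Vocabulary is invented for illustration.
VOCAB = {"▁no": 694, "▁th": 266, "no": 1217, "th": 3582}

def encode(text):
    """Greedy longest-match over a '▁'-prefixed word."""
    text = "▁" + text  # mark the word boundary
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            if text[i:j] in VOCAB:
                ids.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no matching piece at position {i}")
    return ids

print(encode("no"))    # word-initial 'no'           -> [694]
print(encode("thno"))  # '▁th' + continuation 'no'   -> [266, 1217]
```

This reproduces the shape of the session above: standalone `no` is a single word-initial token, while inside `thno` the trailing `no` matches the continuation entry instead.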
I ran some tests on a smaller model and it does appear that running more repetitions when `logml` cannot be determined within `maxiter` is not a workaround; the distributions do...
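Why extra repetitions don't help can be seen with a toy simulation (this is not the bridge-sampling code itself; the true value, bias, and noise level below are invented): if each repetition's estimate is biased because the iteration stopped before converging, averaging more repetitions only tightens the spread around the wrong value.

```python
import random

# Toy model of a non-converged estimator: every repetition carries the
# same systematic offset (BIAS) plus independent noise. All numbers
# here are hypothetical, purely to illustrate the point.
random.seed(0)
TRUE_LOGML = -100.0
BIAS = 2.0  # stand-in for the error from stopping before convergence

def one_repetition():
    return TRUE_LOGML + BIAS + random.gauss(0, 0.5)

for n_reps in (4, 400):
    mean = sum(one_repetition() for _ in range(n_reps)) / n_reps
    print(f"{n_reps} reps: mean estimate {mean:.2f} (true {TRUE_LOGML})")
```

Both runs land near `TRUE_LOGML + BIAS`, not `TRUE_LOGML`: more repetitions average away the noise, not the bias.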
I've gotten both 8B and 70B (non-chat) running on a CPU. This will _probably_ work for the chat models, but I haven't checked those. You will need at least ~64GB...
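The ~64GB figure can be sanity-checked with back-of-the-envelope arithmetic: the weights alone take parameter count times bytes per value (the dtype sizes below are standard; anything on top of the raw weights, like activations or the KV cache, is extra and not counted here).

```python
# Rough memory footprint of model weights only (no activations,
# KV cache, or runtime overhead): parameter_count x bytes_per_value.
BYTES_PER_DTYPE = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_gib(n_params, dtype):
    """Approximate weight size in GiB for a given parameter dtype."""
    return n_params * BYTES_PER_DTYPE[dtype] / 2**30

for name, n in [("8B", 8e9), ("70B", 70e9)]:
    for dtype in ("fp32", "fp16", "int8"):
        print(f"{name} @ {dtype}: ~{weight_gib(n, dtype):.0f} GiB")
```

At fp16 the 8B weights come to ~15 GiB and the 70B weights to ~130 GiB, which is why the larger model needs quantization (int8 puts 70B at ~65 GiB) or swap to fit in ~64GB of RAM.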
After some testing, it appears that the tokenizers on HF are probably the same as the one for OPT-175B (at the very least, my output for a short test made...