Michael Wilson

Results: 9 comments by Michael Wilson

Hi Henrik, I had tried using `rmvn`/`dmvn` from `mvnfast` as well, but it didn't seem to help (at least with my setup). Trying to run it serially with the...

Some further testing on a smaller model showed that (for 4 repetitions on 4 cores) the current implementation took 1.15 minutes, replacing the calls to `mvtnorm` with the corresponding calls...

I'm getting stuck in a different way after following @b0kch01's changes. I'm not running into the MP=0 issue, but I get some warnings about being unable to initialize the client...

Unfortunately, I'm still unable to run anything using @b0kch01's `llama-cpu` repo on Linux.
```
[W socket.cpp:426] [c10d] The server socket cannot be initialized on [::]:29500 (errno: 97 - Address family...
```
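For what it's worth, errno 97 usually means the host has no usable IPv6, so the c10d rendezvous can't bind to `[::]:29500`. One thing worth trying (an assumption on my part, not a confirmed fix for this repo) is forcing the rendezvous onto an IPv4 address:

```shell
# Force the torch.distributed rendezvous onto IPv4 loopback instead of [::].
# Assumes the repo is launched via torchrun; the script name and flag values
# here are placeholders -- adjust them to match the actual invocation.
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29500
torchrun --nproc_per_node 1 --master_addr 127.0.0.1 --master_port 29500 example.py
```

Setting `MASTER_ADDR`/`MASTER_PORT` in the environment and passing the matching `torchrun` flags should be redundant, but doing both rules out the launcher ignoring one of them.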

Maybe the `load` function [here](https://github.com/tloen/llama-int8/blob/main/example.py) will be useful?

It looks like 1217 and 3582 are sub-word tokens:
```python
>>> tokenizer.encode('no', bos = False, eos = False)
[694]
>>> tokenizer.encode('thno', bos = False, eos = False)
[266, 1217]
>>> ...
```
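The reason the same letters get different IDs is that SentencePiece marks word-initial pieces with `▁`, so `no` at the start of a word (`▁no`) and `no` inside a word are distinct vocabulary entries. A toy sketch of the greedy longest-match idea (the tiny vocabulary below is invented for illustration, with IDs borrowed from the session above; it is not the real LLaMA tokenizer, which is a trained BPE model):

```python
# Minimal toy vocabulary: "▁" marks a word-initial piece, as in SentencePiece.
# IDs mirror the REPL session above but the vocabulary itself is made up.
VOCAB = {"▁no": 694, "▁th": 266, "no": 1217}

def encode(word):
    """Greedy longest-match over a single word, prefixed with the "▁" marker."""
    text = "▁" + word
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            if text[i:j] in VOCAB:
                ids.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return ids

print(encode("no"))    # [694]   -> word-initial "▁no"
print(encode("thno"))  # [266, 1217] -> "▁th" + continuation "no"
```

So 1217 is the continuation form of `no`, which only appears when the letters are glued to a preceding piece.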

I ran some tests on a smaller model and it does appear that running more repetitions when logml cannot be determined within maxiter is not a workaround; the distributions do...

I've gotten both 8B and 70B (non-chat) running on a CPU. This will _probably_ work for the chat models, but I haven't checked those. You will need at least ~64GB...
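As a back-of-the-envelope check on memory requirements (my own rule of thumb, not something from the repo): the weights alone need roughly parameter count times bytes per parameter, and activations plus the KV cache come on top of that.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Lower bound from the weights alone; activations and KV cache add more."""
    return n_params * bytes_per_param / 1e9

# Illustrative figures (arithmetic only, not measurements):
print(weight_memory_gb(8e9, 2))   # 8B model, fp16  -> 16.0 GB
print(weight_memory_gb(70e9, 2))  # 70B model, fp16 -> 140.0 GB
```

This is why the 70B model is only practical on a CPU box with a lot of RAM, or with lower-precision weights.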

After some testing, it appears that the tokenizers on HF are probably the same as the one for OPT-175B (at the very least, my output for a short test made...