llama.cpp
Parallel Quantize.sh, add &
@prusnak
./quantize "$i" "${i/f16/q4_0}" 2 &
The fix needs to be more elaborate: if you pass --remove-f16, then the rm command is called before ./quantize has finished.
Can you come up with a solution that does not have this issue?
This should work:
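A minimal sketch of one such approach, grouping each ./quantize call with its rm in a backgrounded subshell so that the f16 file is removed only after its own conversion has finished (the model path glob and the flag parsing here are assumptions for illustration, not the actual quantize.sh contents):

#!/usr/bin/env bash
# Sketch only: the glob and flag parsing are placeholders.
remove_f16=0
[ "$2" = "--remove-f16" ] && remove_f16=1

for i in models/"$1"/ggml-model-f16.bin*; do
  (
    # rm runs only after this particular ./quantize has finished
    ./quantize "$i" "${i/f16/q4_0}" 2
    if [ "$remove_f16" = "1" ]; then
      rm "$i"
    fi
  ) &
done
wait  # do not exit until every background quantize job is done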
Yes, this works. But now I realised this completely defeats the purpose of the remove flag: the flag is there to save disk space after each conversion has finished, so it only makes sense when processing the files one after another.
@ggerganov Do you think it makes sense to run the script in parallel by default and switch to serial processing when --remove-f16 is provided, or do we want to have a separate, orthogonal flag for parallel/serial processing?
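As a rough illustration of the first option (parallel by default, falling back to serial whenever --remove-f16 is given so each f16 file can be deleted as soon as its own conversion finishes; the glob and the remove_f16 variable are placeholders, not the actual script):

for i in models/"$1"/ggml-model-f16.bin*; do
  if [ "$remove_f16" = "1" ]; then
    # serial: delete the f16 file right after its conversion is done
    ./quantize "$i" "${i/f16/q4_0}" 2
    rm "$i"
  else
    # parallel: background the job, nothing to delete afterwards
    ./quantize "$i" "${i/f16/q4_0}" 2 &
  fi
done
wait  # only relevant for the parallel case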
ah I see what you mean, swapping disk resources
I think it is better to multi-thread the quantize.cpp program.
Each tensor is divided into n parts, and each of the n threads quantizes the corresponding part.
This way, even when quantizing the 7B model which has only 1 part, we will utilize all available CPU resources and still gain performance.
If you agree, either reformulate this issue and add the "good first issue" tag, or create a new one and close this one.
I think it is better to multi-thread the quantize.cpp program.
I agree. This makes sense especially for this reason:
This way, even when quantizing the 7B model which has only 1 part, we will utilize all available CPU resources
If you agree, ...
ACK
FWIW, I really respect your shell skills @tljstewart 👍
Done another way (rewritten in Python) in https://github.com/ggerganov/llama.cpp/pull/222