AlpinDale
AlpinDale
This PR adds an example to calculate model perplexity, this approach performs it by using `prompt_logprobs`: - Generate text using a dataset and extract 1 prompt logprob per token. -...
Not all warnings can be avoided, so we can suppress them. It looks a bit ugly when compiling verbosely.
This PR adds function calls, similar to OpenAI's original implementation. Launch the API server with `--enable-api-tools` argument to use.
Currently, we auto-scale using the `--max-model-len` argument. It may be more appropriate to have specific options for the scaling factor, etc.
This PR removes Ray as a requirement for multi-gpu, allowing us to use standard python multiprocessing instead. Ray can be optionally installed for multi-node setup. To install Aphrodite with Ray,...
This PR is an attempt at replacing the regular softmax for MoE gating with a [Gumbel-Softmax](https://arxiv.org/abs/1611.01144) function instead. The paper [Approximating Two-Layer Feedforward Networks for Efficient Transformers](https://arxiv.org/abs/2310.10837) suggests that softmax...
Building the project fails due to the missing `` header. This should allow building the project. Closes #1
Hi, I've been trying to compile the RapidLLaMA, but it seems to have issues. Is the repo still incomplete? I also had to manually build `sleef` with the test units...
Hi! I've been trying to use mikupad with my custom OpenAI API server, but I don't have a `/v1/token/encode` endpoint. I was adding it, but unsure what the request body...
This PR adds [Aphrodite Engine](https://github.com/PygmalionAI/aphrodite-engine) to the list of local apps. Aphrodite is a tensor-parallel LLM inference engine based on [vLLM](https://github.com/vllm-project/vllm), with support for almost all transformers models and quantization...