AlpinDale

Results 75 issues of AlpinDale

This PR adds an example that calculates model perplexity using `prompt_logprobs`: - Generate text using a dataset and extract 1 prompt logprob per token. -...
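For context, perplexity can be derived from per-token log-probabilities as the exponential of the negative mean log-probability. The snippet below is a minimal sketch of that formula only, not the code from the PR; the function name `perplexity` and the sample values are hypothetical, and it assumes the logprobs are natural logarithms.

```python
import math


def perplexity(token_logprobs):
    """Return exp(-mean(logprob)) for a list of natural-log token probabilities.

    Hypothetical helper for illustration; the PR's actual example may differ.
    """
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Made-up logprobs for a four-token prompt (assumption, not real model output).
example_logprobs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
print(perplexity(example_logprobs))
```

In practice the first prompt token usually has no logprob (there is nothing to condition on), so it would be skipped before averaging.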

Not all warnings can be avoided, so we can suppress them. The output looks a bit ugly when compiling verbosely.

This PR adds function calling, similar to OpenAI's original implementation. To use it, launch the API server with the `--enable-api-tools` argument.

Currently, we auto-scale using the `--max-model-len` argument. It may be more appropriate to have specific options for the scaling factor, etc.

enhancement

This PR removes Ray as a requirement for multi-GPU, allowing us to use standard Python multiprocessing instead. Ray can be optionally installed for multi-node setup. To install Aphrodite with Ray,...

This PR is an attempt at replacing the regular softmax for MoE gating with a [Gumbel-Softmax](https://arxiv.org/abs/1611.01144) function. The paper [Approximating Two-Layer Feedforward Networks for Efficient Transformers](https://arxiv.org/abs/2310.10837) suggests that softmax...
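As a rough illustration of the technique named above, Gumbel-Softmax adds Gumbel(0, 1) noise to the logits and applies a temperature-scaled softmax. This is a generic sketch in pure Python, not the PR's implementation; the function name, the temperature default, and the example router logits are all assumptions.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Sample soft gating weights: softmax((logits + Gumbel noise) / tau).

    Hypothetical helper for illustration only.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    # Subtract the max before exponentiating for numerical stability.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up router logits for four experts (assumption).
random.seed(0)
weights = gumbel_softmax([2.0, 1.0, 0.1, -1.0], tau=1.0)
print(weights)
```

Lower values of `tau` push the sampled weights toward a one-hot choice of expert, while higher values flatten them toward uniform.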

Building the project fails due to the missing `` header. This should allow building the project. Closes #1

Hi, I've been trying to compile RapidLLaMA, but it seems to have issues. Is the repo still incomplete? I also had to manually build `sleef` with the test units...

Hi! I've been trying to use mikupad with my custom OpenAI API server, but I don't have a `/v1/token/encode` endpoint. I was adding it, but I'm unsure what the request body...

This PR adds [Aphrodite Engine](https://github.com/PygmalionAI/aphrodite-engine) to the list of local apps. Aphrodite is a tensor-parallel LLM inference engine based on [vLLM](https://github.com/vllm-project/vllm), with support for almost all transformers models and quantization...