Tobrun
A user reached out about `queryRenderedFeatures` returning features when clicking on a hole of a FillLayer. I was able to reproduce this issue only at lower zoom levels. In gif...
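The expected semantics can be illustrated with a standard even-odd point-in-polygon test (a generic sketch, not the renderer's actual hit-testing code): a query point inside a hole crosses the outer ring an odd number of times and the hole ring an odd number of times, so the even total should classify it as outside the feature.

```python
# Generic even-odd hit test for a polygon with holes (illustrative only).
def crosses(point, ring):
    """Count edge crossings of a horizontal ray from `point` to +infinity."""
    x, y = point
    count = 0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y level
            x_at_y = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_at_y > x:
                count += 1
    return count

def point_in_polygon(point, outer, holes=()):
    total = crosses(point, outer) + sum(crosses(point, h) for h in holes)
    return total % 2 == 1  # odd crossings -> inside the fill

outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(point_in_polygon((5, 5), outer, [hole]))  # in the hole -> False
print(point_in_polygon((2, 2), outer, [hole]))  # in the fill -> True
```

A click inside the hole should therefore not return the feature; the bug report suggests this classification goes wrong at lower zoom levels.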
## 🐛 Bug

:wave: I've been digging further into the freezing issue noted in https://github.com/mlc-ai/mlc-llm/issues/1379, and am creating a new issue with better details and steps to reproduce. When I load a llama-2-7b...
## Overview

TBD if this is in-scope for this project. This issue tracks the possibility of distributing ready-made libraries for developers looking to integrate a specific model. Atm, the will...
### Description

### Screenshots or Gifs

### Checklist

- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my...
The current inference system launches 8 separate vLLM instances (one per GPU) but underutilizes vLLM's native batching capabilities. Each query is assigned to a single vLLM instance in a round-robin...
This block of code forces a setup with 8 GPUs, where each GPU needs enough VRAM to host a full instance of the model:

```
CUDA_VISIBLE_DEVICES=0 vllm serve $MODEL_PATH...
```
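One way to lift the hardcoded 8-GPU assumption is to derive the launch plan from the devices actually visible. A stdlib-only sketch (the port layout and the exact `vllm serve` invocation are illustrative assumptions):

```python
import os

def visible_gpus() -> list[int]:
    """Derive GPU indices from CUDA_VISIBLE_DEVICES instead of assuming 0-7."""
    raw = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [int(tok) for tok in raw.split(",") if tok.strip().isdigit()]

def serve_commands(model_path: str) -> list[str]:
    # One vllm serve per visible GPU; the port scheme is an assumption.
    return [
        f"CUDA_VISIBLE_DEVICES={gpu} vllm serve {model_path} --port {8000 + i}"
        for i, gpu in enumerate(visible_gpus())
    ]

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
print(serve_commands("$MODEL_PATH"))
# ['CUDA_VISIBLE_DEVICES=0 vllm serve $MODEL_PATH --port 8000',
#  'CUDA_VISIBLE_DEVICES=1 vllm serve $MODEL_PATH --port 8001']
```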
This is a first step towards improving the usability and robustness of this repository. This change ensures we clean up all the vLLM instances when we exit:

- finished executing...
Currently `NCCL_SOCKET_IFNAME` and `GLOO_SOCKET_IFNAME` are hardcoded, even though they could be resolved with a simple lookup. This results in errors such as:

- no socket interface found
- Unable to find address for:...
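The proposed lookup could be as simple as picking the first non-loopback interface, a heuristic sketch (Linux-only, stdlib-only; real multi-NIC hosts may need a smarter choice):

```python
import os
import socket

def default_ifname() -> str:
    """Pick the first non-loopback interface instead of hardcoding one."""
    for _, name in socket.if_nameindex():
        if name != "lo":
            return name
    return "lo"  # fall back on single-interface machines

ifname = default_ifname()
os.environ["NCCL_SOCKET_IFNAME"] = ifname
os.environ["GLOO_SOCKET_IFNAME"] = ifname
print(ifname)
```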
Process management around vLLM should be improved. When something goes wrong, such as `Input file not found at..`, the system doesn't cleanly exit and kill the vLLM instances, which results in zombie...
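One fix is to guarantee teardown on the error path as well, sketched here with hypothetical names: wrapping the run in `try`/`finally` terminates the servers even when the job raises partway through.

```python
import subprocess

def run_with_cleanup(server_cmds: list[list[str]], job) -> list[subprocess.Popen]:
    """Start the vLLM servers, run the job, and always tear the servers down."""
    procs = [subprocess.Popen(cmd) for cmd in server_cmds]
    try:
        job()  # may raise, e.g. FileNotFoundError for a missing input file
    finally:
        for proc in procs:  # runs on success, exception, and KeyboardInterrupt
            proc.terminate()
            proc.wait()
    return procs
```

With this shape, a missing input file still propagates as an exception, but no server process outlives the run.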