Daniel
https://www.digitaltrends.com/computing/microsoft-nvidia-tensorrt-llm-update-ignite-2023/

Tasks
- [ ] Step-by-step docs for Jan Windows TensorRT-LLM - 1 day
- [ ] Updated code in `triton-tensorrt-llm` extension - 1 day

Reference: https://github.com/NVIDIA/trt-llm-rag-windows/blob/release/1.0/app.py#L43
WIP Spec
- Need to figure out whether BigDL and Intel Extensions are separate
- Have an extension for each inference engine
- `model.json` should have an `engine: intel-bigdl` or `engine:...`
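A minimal sketch of what such a `model.json` entry could look like. This is an assumption, not a settled schema: the `engine` value follows the `engine: intel-bigdl` idea above, and the other keys (`id`, `name`, `settings`, `ctx_len`) are illustrative placeholders.

```json
{
  "id": "llama2-7b-q4",
  "name": "Llama 2 7B (Q4)",
  "engine": "intel-bigdl",
  "settings": {
    "ctx_len": 2048
  }
}
```

Each inference-engine extension would then claim the models whose `engine` field matches its identifier.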
**Motivation:**

**WIP Spec:**
- Assistants are a way of packaging things, a framework
- Depends on the Hub and other prerequisites
- We need a separate epic to track RAG
## Objective
- Do we need a simple queue system?

### Motivation
_Null-pointer errors?_
- Currently, inference requests are handled FIFO
- We are adopting an OpenAI API, which means...
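A minimal sketch of what "a simple queue system" could mean here, assuming a single local engine that must handle one request at a time: a worker thread drains a FIFO queue so concurrent API callers never touch the engine simultaneously. The `run_inference` function is a hypothetical stand-in for the actual engine call.

```python
import queue
import threading

def run_inference(prompt: str) -> str:
    # Hypothetical placeholder for the actual inference-engine call.
    return f"response to: {prompt}"

class InferenceQueue:
    """Serializes inference requests: callers block until their turn (FIFO)."""

    def __init__(self) -> None:
        # Each item is (prompt, reply_channel); the worker thread consumes them in order.
        self._requests: queue.Queue = queue.Queue()
        worker = threading.Thread(target=self._loop, daemon=True)
        worker.start()

    def _loop(self) -> None:
        while True:
            prompt, reply = self._requests.get()
            reply.put(run_inference(prompt))
            self._requests.task_done()

    def submit(self, prompt: str) -> str:
        # Blocks until the worker has processed every earlier request.
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._requests.put((prompt, reply))
        return reply.get()
```

In an HTTP server each request handler would call `submit`, so overlapping OpenAI-style API calls are answered in arrival order instead of racing on the engine.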
## Objective
- [ ] Description should be updated, including the project website
- [ ] Code signing for Ubuntu?
## Todos
- [x] Rename "Plugins" to Extensions or Modules
- [ ] Document the Extensions and Modules architecture
- [ ] Document key Extensions (e.g. Models, Inference, Threads, etc.) -...
- Allow multi-modal input to Jan
  - Requires UI support
  - Requires API support
## Objective
- As part of a larger epic, we need to autodetect the user's hardware and show recommended models
- Our long-term goal is to help the user "run best...
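A hedged sketch of what hardware autodetection plus a model recommendation could look like, using only the standard library. The detection method (`os.sysconf` for RAM, checking for `nvidia-smi` on the PATH for a GPU) and the size thresholds are assumptions for illustration, not a spec.

```python
import os
import shutil

def detect_hardware() -> dict:
    """Best-effort probe of CPU threads, total RAM, and NVIDIA GPU presence."""
    info = {"cpu_threads": os.cpu_count() or 1, "ram_gb": None}
    try:
        # POSIX-only; unavailable on Windows, hence the fallback to None.
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
        info["ram_gb"] = round(pages * page_size / (1024 ** 3), 1)
    except (AttributeError, ValueError, OSError):
        pass
    # Crude GPU check: assumes the NVIDIA driver ships `nvidia-smi`.
    info["has_nvidia_gpu"] = shutil.which("nvidia-smi") is not None
    return info

def recommend_max_model_size(ram_gb):
    """Largest parameter count likely to fit, leaving ~50% RAM headroom.

    The file sizes are rough figures for 4-bit quantized GGUF models and
    are illustrative, not measured.
    """
    if ram_gb is None:
        return "unknown"
    usable = ram_gb * 0.5
    for params, approx_file_gb in [("70B", 40), ("13B", 8), ("7B", 4), ("3B", 2)]:
        if usable >= approx_file_gb:
            return params
    return "too little RAM for local inference"
```

The UI could then gray out or badge models above the recommended size instead of letting downloads fail at load time.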
**Problem**
- Feature requested by Sabin_Stargem from r/LocalLLaMA
- I actually think this is a great idea, especially for multi-modal AI
- Niche feature for power users with multi-GPU setups -...
## Objective
- WIP spec

## Resources
- Apple Ferret
- Need to support MLX
- Support quantization
- https://www.reddit.com/r/LocalLLaMA/comments/18oke4y/apples_mlx_framework_adds_quantization_support/