feat: Queue System for Inference?
Objective
- Do we need a simple queue system?
Motivation
Null-Pointer Errors?
- Currently, inference requests are handled FIFO
- We are adopting an OpenAI-compatible API, which means we will receive requests across Chat, Audio, Vision, etc.
- Given that users are on laptops with limited RAM and VRAM, we are likely to have to switch models between requests
Preparing for Cloud Native
- Our long-term future is likely as an enterprise OpenAI-alternative, which will be multi-user and have a queue system
- Should we bake in this abstraction now, using a local file-based queue (which can later be swapped out for a more sophisticated queue)?
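One possible shape for such an abstraction is sketched below: handlers depend only on a queue interface, so an in-memory or file-based backing can later be swapped for a distributed queue without touching callers. This is a hypothetical sketch, not existing Jan code; all names (`InferenceRequest`, `InferenceQueue`, `InMemoryQueue`) are illustrative assumptions.

```typescript
// Hypothetical sketch of a queue abstraction for inference requests.
// None of these names come from the Jan codebase.

interface InferenceRequest {
  id: string;
  kind: "chat" | "audio" | "vision"; // OpenAI-style endpoint families
  payload: unknown;
}

// Callers depend only on this interface; the backing store
// (in-memory, file-based, or a hosted queue) is swappable.
interface InferenceQueue {
  enqueue(req: InferenceRequest): void;
  dequeue(): InferenceRequest | undefined;
  size(): number;
}

// Simplest possible backing: an in-memory FIFO array.
class InMemoryQueue implements InferenceQueue {
  private items: InferenceRequest[] = [];

  enqueue(req: InferenceRequest): void {
    this.items.push(req);
  }

  dequeue(): InferenceRequest | undefined {
    return this.items.shift(); // FIFO: oldest request first
  }

  size(): number {
    return this.items.length;
  }
}
```

A file-based implementation would implement the same `InferenceQueue` interface, persisting requests to disk so pending work survives restarts; a later multi-user deployment could then substitute a networked queue behind the same interface.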