feat: Queue System for Inference?
Objective
- Do we need a simple queue system?
Motivation
Null-Pointer Errors?
- Currently, inference requests are handled FIFO
- We are adopting an OpenAI-compatible API, which means we will receive requests across Chat, Audio, Vision, etc.
- Given that users are on laptops with limited RAM and VRAM, we are likely to have to switch models between requests
Preparing for Cloud Native
- Our long-term future is likely as an enterprise OpenAI-alternative, which will be multi-user and have a queue system
- Should we bake in this abstraction now, using a local file-based queue (which can later be swapped out for a more sophisticated queue)?
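One possible shape for such an abstraction is sketched below: handlers depend only on a queue interface, so an in-memory or file-based backing can later be swapped for a distributed queue without touching callers. This is a hypothetical sketch, not existing Jan code; all names (`InferenceRequest`, `InferenceQueue`, `InMemoryQueue`) are illustrative assumptions.

```typescript
// Hypothetical sketch of a queue abstraction for inference requests.
// None of these names come from the Jan codebase.

interface InferenceRequest {
  id: string;
  kind: "chat" | "audio" | "vision"; // OpenAI-style endpoint families
  payload: unknown;
}

// Callers depend only on this interface; the backing store
// (in-memory, file-based, or a hosted queue) is swappable.
interface InferenceQueue {
  enqueue(req: InferenceRequest): void;
  dequeue(): InferenceRequest | undefined;
  size(): number;
}

// Simplest possible backing: an in-memory FIFO array.
class InMemoryQueue implements InferenceQueue {
  private items: InferenceRequest[] = [];

  enqueue(req: InferenceRequest): void {
    this.items.push(req);
  }

  dequeue(): InferenceRequest | undefined {
    return this.items.shift(); // FIFO: oldest request first
  }

  size(): number {
    return this.items.length;
  }
}
```

A file-based implementation would implement the same `InferenceQueue` interface, persisting requests to disk so pending work survives restarts; a later multi-user deployment could then substitute a networked queue behind the same interface.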