text-generator.io
Switch to vLLM with custom stopping criteria
Summary
- add vLLM based inference helper with min-probability and sentence stopping
- integrate vLLM into the FastAPI server when available
- make `tests/conftest` resilient if `httpx` is missing
- add basic unit test for vLLM inference
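The min-probability and sentence-stopping logic mentioned above could look something like the following sketch. The helper name `should_stop`, the default threshold, and the sentence-boundary regex are illustrative assumptions, not the actual implementation; in vLLM this kind of check would typically be wired into the sampling loop or expressed via `SamplingParams`.

```python
import re

def should_stop(text: str, token_prob: float, min_prob: float = 0.05) -> bool:
    """Decide whether generation should halt (hypothetical helper).

    Stops when the latest token's probability falls below ``min_prob``
    (the model is no longer confident), or when the accumulated text
    ends at a sentence boundary: '.', '!' or '?', optionally followed
    by a closing quote/bracket and trailing whitespace.
    """
    if token_prob < min_prob:
        return True
    return bool(re.search(r'[.!?]["\')\]]?\s*$', text))
```

For example, `should_stop("Hello world.", 0.9)` stops at the sentence boundary, while `should_stop("Hello wor", 0.9)` keeps generating.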
Testing
`pytest -q` (fails: `ModuleNotFoundError: No module named 'cachetools'`)
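Making `tests/conftest` resilient to missing optional dependencies like `httpx` (or `cachetools`, which caused the failure above) can be done with a small guarded-import helper; the function name here is an illustrative assumption, not the repo's actual code.

```python
import importlib
import importlib.util

def optional_import(name: str):
    """Return the named module if it is installed, else None.

    Fixtures in conftest can check the result and call pytest.skip()
    instead of crashing at collection time with ModuleNotFoundError.
    """
    if importlib.util.find_spec(name) is None:
        return None
    return importlib.import_module(name)
```

A fixture would then do `httpx = optional_import("httpx")` and skip its tests when the result is `None`, so the suite still collects cleanly in minimal environments.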
https://chatgpt.com/codex/tasks/task_e_683fe5b06bf083338fb1ba3540b415dc