
Feature Request: Use direct_io for model load and inference

Open · jagusztinl opened this issue on Feb 16, 2025 · 0 comments

Prerequisites

  • [x] I am running the latest code. Mention the version if possible as well.
  • [x] I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

According to this article, using unbuffered (direct I/O) reads could speed up model load and inference of large models on memory-constrained servers: https://forum.level1techs.com/t/deepseek-deep-dive-r1-at-home/225826

Motivation

Model load times can be significant for large models such as DeepSeek R1.

Possible Implementation

Use the O_DIRECT flag when opening the model file for load.
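
A minimal sketch of what a direct-I/O read path might look like on Linux. This is not llama.cpp's actual loader; the 4096-byte alignment and 1 MiB chunk size are assumptions, and a real implementation would parse GGUF tensors instead of just counting bytes:

```cpp
// Sketch only: sequential O_DIRECT read of a model file on Linux.
// O_DIRECT bypasses the page cache, so loading a huge model does not
// evict memory that inference itself needs.
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <model.gguf>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open(O_DIRECT)"); // e.g. EINVAL on filesystems without O_DIRECT support
        return 1;
    }

    // O_DIRECT requires the buffer address, file offset, and read length
    // to be aligned to the device's logical block size; 4096 is a common
    // safe value (assumed here rather than queried).
    const size_t align = 4096;
    const size_t chunk = 1 << 20; // 1 MiB per read, a multiple of align
    void * buf = nullptr;
    if (posix_memalign(&buf, align, chunk) != 0) {
        close(fd);
        return 1;
    }

    ssize_t n;
    size_t total = 0;
    while ((n = read(fd, buf, chunk)) > 0) {
        total += (size_t) n; // a real loader would copy/parse tensor data here
    }
    if (n < 0) perror("read");
    fprintf(stderr, "read %zu bytes with O_DIRECT\n", total);

    free(buf);
    close(fd);
    return 0;
}
```

Note that O_DIRECT is not supported on all filesystems (e.g. tmpfs), so the loader would need a buffered-read fallback, and macOS has no O_DIRECT at all (the closest equivalent is `fcntl(fd, F_NOCACHE, 1)`).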

jagusztinl · Feb 16 '25 17:02