mistral.rs

Sampling on the GPU for as long as possible

Open EricLBuehler opened this issue 1 year ago • 2 comments

Currently, we apply all sampling:

  • Sequentially
  • On the CPU

This is very slow. This PR refactors the sampling system to do as much of the sampling work as possible on the GPU, in parallel, up to the point where the final token & logprobs must be copied to the CPU. Only then is the final GPU <> CPU sync done.
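To make the intended data flow concrete, here is a minimal sketch. It is not the mistral.rs implementation. It uses plain `Vec<f32>` rows as a stand-in for device tensors so it has no dependencies, and greedy argmax stands in for the real stochastic sampling. The names `SampledToken`, `sample_row`, and `sample_batch` are hypothetical. The point is only that the whole batch is processed together and that each sequence yields one token id and one logprob, which is all that would need to cross to the CPU.

```rust
/// Per-sequence result: the only data that would need to be copied to the CPU.
/// (Hypothetical type, for illustration only.)
pub struct SampledToken {
    pub token: usize,
    pub logprob: f32,
}

/// Temperature scaling, log-softmax, and greedy selection for one row of logits.
/// In a device-side version each step would be a batched tensor op instead.
fn sample_row(logits: &[f32], temperature: f32) -> SampledToken {
    // Temperature scaling.
    let scaled: Vec<f32> = logits.iter().map(|&l| l / temperature).collect();

    // Numerically stable log-sum-exp, used to normalize into log-probabilities.
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let lse = max + scaled.iter().map(|&l| (l - max).exp()).sum::<f32>().ln();

    // Greedy pick (assumption: argmax stands in for top-k/top-p sampling).
    let (token, &best) = scaled
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .expect("logits row must not be empty");

    SampledToken { token, logprob: best - lse }
}

/// Processes every sequence in the batch and returns one small result per sequence,
/// rather than handing full vocab-sized rows back to the caller.
pub fn sample_batch(batch: &[Vec<f32>], temperature: f32) -> Vec<SampledToken> {
    batch.iter().map(|row| sample_row(row, temperature)).collect()
}
```

With a vocab-sized row per sequence going in and a single `(token, logprob)` pair coming out, the final copy is tiny compared with moving the full logits to the CPU before sampling.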

EricLBuehler, Jul 25 '24 01:07

Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                   11          102          101            0            1
 Python                 41         1586         1368           46          172
 TOML                   19          564          498           11           55
-------------------------------------------------------------------------------
 Jupyter Notebooks       2            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          196          169            1           26
 (Total)                            273          201           32           40
-------------------------------------------------------------------------------
 Markdown               24         1832            0         1382          450
 |- BASH                 5          101           98            0            3
 |- JSON                 1           12           12            0            0
 |- Python               5           92           82            0           10
 |- Rust                 6          407          364           19           24
 |- TOML                 2           75           63            0           12
 (Total)                           2519          619         1401          499
-------------------------------------------------------------------------------
 Rust                  168        54909        49845          983         4081
 |- Markdown            90          850           13          787           50
 (Total)                          55759        49858         1770         4131
===============================================================================
 Total                 270        59504        52234         2422         4848
===============================================================================
  

github-actions[bot], Jul 25 '24 01:07

This is pending resolution of huggingface/candle#2361. Without it, we still have to do a large GPU <> CPU sync early.

EricLBuehler, Jul 27 '24 16:07