mistral.rs Fix clippy shadowing

Should reduce memory usage and hopefully increase speed...

Jun 18 '24 19:06 EricLBuehler

Code Metrics Report

===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                    9           21           21            0            0
 Python                 31         1217         1038           37          142
 TOML                   16          440          400            1           39
-------------------------------------------------------------------------------
 Jupyter Notebooks       1            0            0            0            0
 |- Markdown             1           60           30           22            8
 |- Python               1           96           87            1            8
 (Total)                            156          117           23           16
-------------------------------------------------------------------------------
 Markdown               16         1149            0          846          303
 |- BASH                 5          100           97            0            3
 |- Python               6          122          110            0           12
 |- Rust                 2           80           72            3            5
 (Total)                           1451          279          849          323
-------------------------------------------------------------------------------
 Rust                  115        34412        31161          585         2666
 |- Markdown            57          641           13          594           34
 (Total)                          35053        31174         1179         2700
===============================================================================
 Total                 191        37715        33014         1469         3232
===============================================================================

Jun 18 '24 19:06 github-actions[bot]

@chenwanqq I cannot measure any T/s speedup. I think that because we are using a GPU and the commands are async, the Rust drop code runs quickly enough that there is no difference.

I implemented this change only for the models/quantized_llama.rs code.

Jun 18 '24 20:06 EricLBuehler

I think the problem might not be about speed, but about peak memory usage. 🧐 For instance, whether a model can run within a limited memory budget, or how many tokens can be processed for a given model.
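
To make the peak-memory point concrete, here is a minimal sketch of why shadowing can matter even when throughput does not change. In Rust, a shadowed binding is not dropped when it is shadowed; it stays alive until the end of its scope, whereas assigning to a `mut` variable drops the old value right away. `FakeTensor`, `Tracker`, and `layer` are hypothetical stand-ins invented for this illustration (not the candle `Tensor` API or the actual `quantized_llama.rs` code), and the sketch only counts live values on the CPU side; it does not model asynchronous GPU execution.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Counts how many stand-in tensors are alive and the highest count seen.
#[derive(Default)]
struct Tracker {
    live: Cell<usize>,
    peak: Cell<usize>,
}

/// Hypothetical stand-in for a tensor (assumption: not the real tensor type).
struct FakeTensor {
    tracker: Rc<Tracker>,
}

impl FakeTensor {
    fn new(tracker: &Rc<Tracker>) -> Self {
        let live = tracker.live.get() + 1;
        tracker.live.set(live);
        tracker.peak.set(tracker.peak.get().max(live));
        FakeTensor { tracker: Rc::clone(tracker) }
    }

    /// Hypothetical layer: allocates a fresh output while the input is borrowed.
    fn layer(&self) -> FakeTensor {
        FakeTensor::new(&self.tracker)
    }
}

impl Drop for FakeTensor {
    fn drop(&mut self) {
        self.tracker.live.set(self.tracker.live.get() - 1);
    }
}

/// Shadowing: every earlier `x` is hidden but stays alive until the function returns.
fn peak_with_shadowing() -> usize {
    let tracker = Rc::new(Tracker::default());
    let x = FakeTensor::new(&tracker);
    let x = x.layer();
    let x = x.layer();
    let x = x.layer();
    drop(x); // only the last value is dropped here; the three shadowed ones are still alive
    tracker.peak.get()
}

/// Reassignment: the old value is dropped as soon as the new one is assigned.
fn peak_with_reassignment() -> usize {
    let tracker = Rc::new(Tracker::default());
    let mut x = FakeTensor::new(&tracker);
    for _ in 0..3 {
        x = x.layer(); // the output briefly coexists with the input, then the input is dropped
    }
    drop(x);
    tracker.peak.get()
}

fn main() {
    println!("shadowing peak: {}", peak_with_shadowing()); // prints "shadowing peak: 4"
    println!("reassignment peak: {}", peak_with_reassignment()); // prints "reassignment peak: 2"
}
```

With shadowing, the number of simultaneously live values grows with the number of steps, while with reassignment it stays at two (input plus output). If the intermediate values were large activations, that difference would show up as peak memory rather than as tokens per second.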

Jun 19 '24 02:06 chenwanqq