Results: 4 issues from K024/llm-sharp

A CPU memory leak is observed when running inference on the GPU, even when `NativeOps` is not used (verified by removing libllm_sharp_ops.so). CPU memory grows continuously during inference. Diagnostics from `torch.Tensor.TotalCount` and...

help wanted

It is possible to convert GPTQ models without act_order (i.e., when g_idx is not used) to the AWQ gemv-compatible format, since AWQ gemv changed its pack order to natural order....

enhancement

Minimal reproduction:

```csharp
var generator = new torch.Generator(42, torch.device("cuda"));
Console.WriteLine(generator.device);
Console.WriteLine(generator.get_state());
var distribution = torch.tensor(new float[] { 0.1f, 0.2f, 0.3f, 0.4f }, device: torch.device("cuda"));
var output = torch.multinomial(distribution, num_samples: 1, generator: generator);
...
```

Hi. I'm currently implementing some large language models (LLMs) with TorchSharp and have a nice demo working ([here](https://github.com/K024/llm-sharp)). But when moving forward to more features, I found some lacking...