candle
Minimalist ML framework for Rust
I'm trying to run inference on a 2080 Ti, but it raises an error. Error: DriverError(CUDA_ERROR_INVALID_PTX, "a PTX JIT compilation failed") when loading is_u32_f32
Hi, I was running the BERT example code and noticed that some of the variables weren't correctly aligning with the current Safetensors obtained via: ``` let repo: ApiRepo = api.model("bert-base-uncased".to_string());...
I'm running the dinov2 example on CPU on a Cortex-A76 computer, except I've quantised it to fp16. Looking at its perf profile, a large subset is due to running scalar...
The following code for the contiguous check in **shape.rs** will trigger problems for the squeezed tensor (n-dim to 1-dim) because of the " if dim > 1" condition (recently added...
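For readers unfamiliar with this kind of check, here is a minimal sketch of a row-major contiguity test over a shape and its strides. It is an illustration only, not the code from **shape.rs** and not a proposed fix: the function name `is_contiguous`, the element-based strides, and the choice to ignore the stride of size-1 dimensions are all assumptions made for the example.

```rust
/// Returns true when `stride` describes a row-major (C-contiguous) layout
/// for `shape`. Strides are counted in elements, not bytes.
/// Hypothetical helper for illustration; not candle's implementation.
fn is_contiguous(shape: &[usize], stride: &[usize]) -> bool {
    if shape.len() != stride.len() {
        return false;
    }
    // Walk from the innermost dimension outwards, tracking the stride a
    // densely packed layout would have at each step.
    let mut expected = 1;
    for (&dim, &st) in shape.iter().zip(stride.iter()).rev() {
        // Assumption: a dimension of size 1 is never stepped over, so its
        // stride value is ignored (this is the role of a `dim > 1` guard).
        if dim > 1 && st != expected {
            return false;
        }
        expected *= dim;
    }
    true
}
```

With this sketch, a `[2, 3]` shape with strides `[3, 1]` counts as contiguous, the transposed strides `[1, 2]` do not, and a `[1, 3]` shape is accepted whatever stride its leading size-1 dimension carries.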
This feature adds support for loading a vision dataset from an image folder, like torchvision.datasets.ImageFolder. In my projects, I need to load a dataset from an image folder to train my model, and I found candle...
All, I saw this morning that Tim Dettmers' bitsandbytes Python lib uses Nvidia's [Unified Memory](https://developer.nvidia.com/blog/unified-memory-cuda-beginners/) by [default](https://x.com/stasbekman/status/1749968490155696612), see (`csrc/pythonInterface.c:377`). It doesn't look like candle, via cudarc, supports this. I'm interested...
Would it be possible to implement 1.58-bit quantization in candle? It was proposed in the following paper: https://arxiv.org/pdf/2402.17764.pdf The main inspiration behind using a 1.58-bit implementation is that you...
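As a rough illustration of what "1.58 bit" means here (each weight takes one of three values, and log2(3) ≈ 1.58), below is a minimal sketch of absmean ternary quantization as I understand it from the linked paper: scale the weights by their mean absolute value, then round and clip to {-1, 0, 1}. The function name `quantize_ternary`, the `EPS` value, and the plain `Vec<i8>` storage are assumptions for the example; this uses only the standard library and is not candle API.

```rust
/// Quantizes `weights` to ternary values {-1, 0, 1} with an absmean scale.
/// Returns the ternary values and the scale, so that `q[i] as f32 * scale`
/// approximates the original weight.
/// Hypothetical helper for illustration; not part of candle.
fn quantize_ternary(weights: &[f32]) -> (Vec<i8>, f32) {
    // Assumed small constant to avoid dividing by zero for all-zero input.
    const EPS: f32 = 1e-6;
    if weights.is_empty() {
        return (Vec::new(), 0.0);
    }
    // Scale: mean of the absolute values of the weights.
    let scale = weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len() as f32;
    // Round to the nearest integer, then clip into [-1, 1].
    let q: Vec<i8> = weights
        .iter()
        .map(|w| (w / (scale + EPS)).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (q, scale)
}
```

This sketch stores one `i8` per weight for clarity; an actual implementation would pack several ternary values per byte to get the memory savings, and would need matching matmul kernels.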
I'm running the Llama example on a machine with an Nvidia T4 16GB to compare the performance with HF Transformers + PyTorch. Here's the Python example I'm running: ```python import...
Hi, I'd like to rig one of the examples into a service, where the service (http) gets a prompt and runs `TextGeneration`. As it stands, `TextGeneration` wants to _own_ model...
For example, to get a particular embedding model's dimensions, *without* doing a test embedding first. At the moment I'm running just a dummy embedding to get to an output and...