candle

Minimalist ML framework for Rust

Results: 407 candle issues, sorted by recently updated

Adds `best_device` and `metal_if_available` to `Device`. `Device::best_device` has the same functionality as `candle_examples::device`, and this PR changes `candle_examples::device` to use `best_device`. `metal_if_available` has been added for parity with `cuda_if_available`.
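A minimal sketch of what the selection could look like, assuming candle's existing `cuda_is_available`/`metal_is_available` utilities (the PR's actual implementation may differ):

```rust
use candle_core::{Device, Result};

// Sketch: prefer CUDA, then Metal, falling back to CPU.
fn best_device(ordinal: usize) -> Result<Device> {
    if candle_core::utils::cuda_is_available() {
        Device::new_cuda(ordinal)
    } else if candle_core::utils::metal_is_available() {
        Device::new_metal(ordinal)
    } else {
        Ok(Device::Cpu)
    }
}
```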

A while ago there was a [release from segmind](https://blog.segmind.com/introducing-sd-small-and-sd-tiny-stable-diffusion-models/) of two new stable diffusion models which are way smaller and faster to run. I think this would be a great...

Reasoning:

1) We use lots of elementwise operations: [masked_fill in every layer](https://github.com/huggingface/candle/blob/2be9bd211e34333b605695242896903231ab26da/candle-transformers/src/models/llama.rs#L328-L341), and [elementwise addition and division](https://github.com/huggingface/candle/blob/main/candle-transformers/src/models/mistral.rs#L275-L283) in our attention implementations.
2) GEMM APIs like cuBLAS's [gemm](https://docs.nvidia.com/cuda/cublas/#cublas-level-3-function-reference) provide alpha and beta...
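For context, cuBLAS's `gemm` computes `C ← α·op(A)·op(B) + β·C`, so a scale (and an accumulation into an existing buffer) can be folded into the matmul itself rather than issued as separate elementwise kernels. A sketch of the unfused pattern as it looks in the attention code today (names here are illustrative, not candle's API):

```rust
use candle_core::{Result, Tensor};

// Unfused: the matmul runs first, then a separate elementwise
// division kernel applies the 1/sqrt(head_dim) scale.
fn attn_scores(q: &Tensor, k: &Tensor, head_dim: usize) -> Result<Tensor> {
    let scores = (q.matmul(&k.t()?)? / (head_dim as f64).sqrt())?;
    // With alpha exposed on the GEMM call, the scale could instead be
    // passed as alpha = 1.0 / sqrt(head_dim), saving one kernel launch.
    Ok(scores)
}
```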

This unifies the `masked_fill` implementations under `Tensor`. Addresses #2370.
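The helper that is currently duplicated across models looks roughly like this (adapted from the llama implementation linked above; the unified version would hang off `Tensor` instead):

```rust
use candle_core::{Result, Tensor};

// Where `mask` is non-zero, take `on_true`; elsewhere keep `on_false`.
fn masked_fill(on_false: &Tensor, mask: &Tensor, on_true: f32) -> Result<Tensor> {
    let shape = mask.shape();
    let on_true = Tensor::new(on_true, on_false.device())?.broadcast_as(shape.dims())?;
    mask.where_cond(&on_true, on_false)
}
```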

Commit fea46cb7 breaks image generation for the Metal pipeline. The last working commit is 8696cf64:

```
git checkout 8696cf64947a7f3b712297426078dcf6ab0d199e
Previous HEAD position was fea46cb7 Metal bgemm min changes (#2364)
HEAD is...
```

These are a few utility functions that are often useful. Neither implementation requires any CPU-side operations. I plan to follow up this PR with one for bitwise...

This is my test code; the version is 0.6.0:

```rust
fn sam() {
    let result: Result = (|| {
        let directory = "/home/foliage/model/candle-sam".to_string();
        let device = Device::new_cuda(0)?;
        let mode = "ST".to_string();
        // ...
```

This PR improves compatibility with older GPUs whose compute capability is below 6.1 (i.e. `__CUDA_ARCH__ < 610`). Refs #2348.

The equivalent of [torch.Tensor.masked_fill_](https://pytorch.org/docs/stable/generated/torch.Tensor.masked_fill_.html#torch.Tensor.masked_fill_).
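In PyTorch this call mutates the tensor in place; since candle tensors are immutable, the natural counterpart returns a new tensor. A hypothetical call shape (the method name and signature are assumptions, not the merged API):

```rust
use candle_core::{Result, Tensor};

// Hypothetical usage of a Tensor::masked_fill method:
// fill masked positions with -inf, e.g. before a softmax.
fn mask_scores(scores: &Tensor, mask: &Tensor) -> Result<Tensor> {
    scores.masked_fill(mask, f32::NEG_INFINITY)
}
```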

I implemented the [tiny NeRF example](https://github.com/bmild/nerf/blob/master/tiny_nerf.ipynb) using `candle` here: https://github.com/laptou/nerfy/blob/fc50dbd61c4012d1f12f556a72474b59a8b3c158/examples/tiny_nerf.rs The original example, written in TensorFlow, runs fine on my laptop, but my `candle` implementation consumes all available memory on...