Abstraction for `resolve_torch_dtype_device(dtype: Dtype, device: Device) -> tuple[quantization_type, torch.device, torch.dtype]`

Open michaelfeil opened this issue 1 year ago • 1 comments

Feature request

There is currently too much boilerplate for this. The proposed helper would resolve loading, quantization, and device in one place.

E.g.:

- device: auto -> torch.cuda.is_available() -> cuda or mps
- dtype: float32 -> float32, no quantization
- dtype: float16 -> float16, no quantization
- dtype: bfloat16 -> float16, no quantization
- dtype: auto -> (bfloat16 if possible else float16) if device is cuda else float32, no quantization
- dtype: int8 -> float32, int8 quantization
- dtype: fp8 -> float32, fp8 quantization
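The mapping above could be sketched roughly as follows. This is only an illustration of the decision table, not the project's implementation: the name `resolve_dtype_device`, the `Resolved` tuple, and the keyword flags are hypothetical, devices and dtypes are plain strings so the sketch runs without torch, and the `cpu` fallback is an assumption the issue does not state.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Resolved(NamedTuple):
    """Hypothetical result type: (quantization, device name, dtype name)."""
    quantization: Optional[str]  # None, "int8", or "fp8"
    device: str                  # e.g. "cuda", "mps", "cpu"
    dtype: str                   # name of the torch dtype used for loading


# Fixed dtype requests, copied from the list in the issue.
_FIXED = {
    "float32": (None, "float32"),
    "float16": (None, "float16"),
    "bfloat16": (None, "float16"),  # as written in the issue
    "int8": ("int8", "float32"),
    "fp8": ("fp8", "float32"),
}


def resolve_dtype_device(
    dtype: str,
    device: str,
    *,
    cuda_available: bool,
    mps_available: bool = False,
    bf16_supported: bool = False,
) -> Resolved:
    """Resolve a requested dtype/device pair into concrete loading settings."""
    if device == "auto":
        if cuda_available:
            device = "cuda"
        elif mps_available:
            device = "mps"
        else:
            device = "cpu"  # assumption: fallback not specified in the issue

    if dtype == "auto":
        if device == "cuda":
            name = "bfloat16" if bf16_supported else "float16"
        else:
            name = "float32"
        return Resolved(None, device, name)

    if dtype not in _FIXED:
        raise ValueError(f"unsupported dtype: {dtype!r}")
    quantization, name = _FIXED[dtype]
    return Resolved(quantization, device, name)
```

In a torch-based version, the flags would presumably come from `torch.cuda.is_available()`, `torch.backends.mps.is_available()`, and `torch.cuda.is_bf16_supported()`, and the returned names could be converted with `torch.device(name)` and `getattr(torch, name)`.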

Motivation

Your contribution

Oct 14 '24 15:10 michaelfeil

@michaelfeil I believe this should be a method of https://github.com/michaelfeil/infinity/blob/62a07c9d91b8bddb999001277563dbbde24844d4/libs/infinity_emb/infinity_emb/env.py#L24, or perhaps a function in the same file?

Oct 22 '24 14:10 EricLiclair