infinity
Abstraction for `resolve_torch_dtype_device(dtype: Dtype, device: Device) -> tuple[quantization_type, torch.device, torch.dtype]`
Feature request
Too much boilerplate is currently needed for this.
The proposed function resolves loading, quantization, and device in one place.
E.g.:
- `device: auto` -> `torch.cuda.is_available()` -> cuda or mps
- `dtype: float32` -> float32, no quantization
- `dtype: float16` -> float16, no quantization
- `dtype: bfloat16` -> float16, no quantization
- `dtype: auto` -> (bfloat16 if possible else float16) if device is cuda else float32, no quantization
- `dtype: int8` -> float32, int8 quantization
- `dtype: fp8` -> float32, fp8 quantization
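
The mapping above could be sketched roughly as follows. This is only an illustration of the rules listed in this issue, not existing infinity code: the function name `resolve_dtype_device`, the string-based return values, and the capability flags passed as keyword arguments are all hypothetical, and the final fallback to `cpu` when neither cuda nor mps is available is an assumption that the issue does not spell out.

```python
from typing import Optional, Tuple

# Hypothetical table: requested dtype -> (quantization type, dtype used for loading).
# The entries follow the list in this issue as written.
_FIXED_DTYPES = {
    "float32": (None, "float32"),
    "float16": (None, "float16"),
    "bfloat16": (None, "float16"),  # as listed in the issue text
    "int8": ("int8", "float32"),
    "fp8": ("fp8", "float32"),
}


def resolve_dtype_device(
    dtype: str,
    device: str,
    *,
    cuda_available: bool,
    mps_available: bool = False,
    bf16_supported: bool = False,
) -> Tuple[Optional[str], str, str]:
    """Return (quantization_type, device_name, load_dtype_name)."""
    if device == "auto":
        if cuda_available:
            device = "cuda"
        elif mps_available:
            device = "mps"
        else:
            device = "cpu"  # assumption: fallback not stated in the issue

    if dtype == "auto":
        # Half precision only on cuda; everything else loads in float32.
        if device.startswith("cuda"):
            load_dtype = "bfloat16" if bf16_supported else "float16"
        else:
            load_dtype = "float32"
        return None, device, load_dtype

    if dtype not in _FIXED_DTYPES:
        raise ValueError(f"unsupported dtype: {dtype!r}")
    quantization, load_dtype = _FIXED_DTYPES[dtype]
    return quantization, device, load_dtype
```

Keeping the resolution free of `torch` imports makes it easy to unit test. In real use, the flags could presumably come from `torch.cuda.is_available()`, `torch.backends.mps.is_available()` and `torch.cuda.is_bf16_supported()`, and the returned names could be converted with `torch.device(name)` and `getattr(torch, name)` to match the signature proposed in the title.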
Motivation
Your contribution
@michaelfeil I believe this should exist either as a method of https://github.com/michaelfeil/infinity/blob/62a07c9d91b8bddb999001277563dbbde24844d4/libs/infinity_emb/infinity_emb/env.py#L24 or as a function in the same file?