
Prompt programming with FMs.

Results: 7 manifest issues

## Description of the bug

Cannot load the model with half precision, and I haven't figured out how to move the model to CPU or GPU.

## To Reproduce

Run model gpt-j-6B...
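For the half-precision and device-transfer questions, a minimal sketch of the usual PyTorch calls, where a plain `torch.nn` module stands in for the model (a loaded `transformers` model object supports the same `.half()`/`.to()` methods):

```python
import torch

# Stand-in module; a transformers model exposes the same methods.
layer = torch.nn.Linear(4, 4)

layer = layer.half()     # convert parameters to float16 (half precision)
layer = layer.to("cpu")  # move to CPU; .to("cuda:0") would move to GPU 0

print(next(layer.parameters()).dtype)  # torch.float16
```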

```python
def get_generation_url(self) -> str:
    """Get generation URL."""
    engine = getattr(self, "engine")
    deployment_name = AZURE_DEPLOYMENT_NAME_MAPPING.get(engine, engine)
    return (
        self.host
        + "/openai/deployments/"
        + deployment_name
        + "/chat/completions?api-version=2023-05-15"
    )
```

The api-version is...
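Since the version string is hard-coded, one way to make it configurable is to hoist it into a constructor argument. A minimal sketch, where `AzureClient`, the `api_version` parameter, and the empty mapping are hypothetical stand-ins, not manifest's actual classes:

```python
AZURE_DEPLOYMENT_NAME_MAPPING: dict = {}  # stand-in; manifest keeps the real mapping


class AzureClient:
    """Hypothetical sketch, not manifest's actual client."""

    def __init__(self, host: str, engine: str, api_version: str = "2023-05-15"):
        self.host = host
        self.engine = engine
        self.api_version = api_version

    def get_generation_url(self) -> str:
        deployment_name = AZURE_DEPLOYMENT_NAME_MAPPING.get(self.engine, self.engine)
        return (
            self.host
            + "/openai/deployments/"
            + deployment_name
            + "/chat/completions?api-version="
            + self.api_version
        )
```

With this shape, a caller could pass a newer version without touching the URL-building code, e.g. `AzureClient("https://example", "gpt-4", api_version="2024-02-01")`.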

I was wondering if there was a good way to specify which GPUs deepspeed, accelerate, etc. should utilize. Right now I've been doing something like:

```
CUDA_VISIBLE_DEVICES=3,4,5 python -m manifest.api.app...
```
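The same restriction can be applied from inside Python, as long as the variable is set before anything initializes CUDA. A minimal sketch:

```python
import os

# Must run before torch/deepspeed/accelerate touch CUDA for the first time;
# afterwards the process only sees the listed devices, renumbered from 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "3,4,5"

print(os.environ["CUDA_VISIBLE_DEVICES"])  # 3,4,5
```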

## Description of the bug

When loading a local model using this command:

```
python3 -m manifest.api.app \
    --model_type huggingface \
    --model_name_or_path /workspace/models/minotaur-15b \
    --device 0 \
    --model_generation_type text-generation...
```

It seems like HuggingFace is only supported for local models. It would be nice to support calling models on the Hub with the InferenceAPI. I think this is relatively straightforward...
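A minimal sketch of what such a client's URL construction might look like, assuming the public Inference API host; `HuggingFaceHubClient` and its fields are hypothetical, not manifest code:

```python
class HuggingFaceHubClient:
    """Hypothetical sketch of a Hub-backed client."""

    HOST = "https://api-inference.huggingface.co"

    def __init__(self, model_name: str, token: str):
        self.model_name = model_name
        self.token = token

    def get_generation_url(self) -> str:
        # The Inference API routes by model id rather than by deployment name.
        return f"{self.HOST}/models/{self.model_name}"

    def get_headers(self) -> dict:
        # The Hub authenticates with a bearer token.
        return {"Authorization": f"Bearer {self.token}"}
```

This mirrors the `get_generation_url` pattern manifest already uses for other backends, so the rest of the request plumbing could stay the same.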

- add support for OpenRouter ([https://openrouter.ai/](https://openrouter.ai/)) - add `gpt-4o` and `gpt-4o-mini` to `OpenAIChatClient`
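Since OpenRouter exposes an OpenAI-compatible chat completions endpoint, support may amount to a different host plus new model names. A hedged sketch; the function and the model set are assumptions for illustration, not manifest code:

```python
OPENROUTER_HOST = "https://openrouter.ai/api/v1"

# Hypothetical additions to the chat model list.
NEW_OPENAI_CHAT_MODELS = {"gpt-4o", "gpt-4o-mini"}


def get_generation_url(host: str = OPENROUTER_HOST) -> str:
    # OpenAI-compatible path, so the existing OpenAIChatClient
    # request/response handling could be reused largely unchanged.
    return host + "/chat/completions"
```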