lorax Server error: This model was initialized with the adapter xxx and therefore does not support dynamic adapter loading. Please initialize a new model instance from the base model in order to use the dynamic adapter loading feature.

Open avoskresensky opened this issue 1 year ago • 4 comments

System Info

- Latest official Docker image (ghcr.io/predibase/lorax:87412e1)
- Platform: www.runpod.io (GPU containers)
- GPU pod with 2x A100 80GB

Information

[X] Docker
[ ] The CLI directly

Tasks

[X] An officially supported command
[ ] My own modifications

Reproduction

Create a runpod.io container with the following parameters:

GPU pod with 2xA100 80G.

image: ghcr.io/predibase/lorax:87412e1
container disk: 10G
volume disk: 200G
volume mount path: /data
expose HTTP ports: 8080
environment variables:
- MODEL_ID: mistralai/Mixtral-8x7B-Instruct-v0.1
- ADAPTER_ID: antonvo/mixtral-select-v1
- DTYPE: float16
- PORT: 8080
- MAX_INPUT_LENGTH: 32767
- MAX_TOTAL_TOKENS: 32768
- MAX_BATCH_PREFILL_TOKENS: 32768
- MAX_BATCH_TOTAL_TOKENS: 32768
- NCCL_P2P_DISABLE: 1

I'm following the documentation to make requests. A request to the base model works fine:

curl https://isjbblljgjb8v5-8080.proxy.runpod.net/generate \
    -X POST \
    -d '{
        "inputs": "[INST] Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? [/INST]",
        "parameters": {
            "max_new_tokens": 64
        }
    }' \
    -H 'Content-Type: application/json'

{"generated_text":"Natalia sold a total of 72 clips.\n\nHere’s the reasoning:\n\n1. In April, Natalia sold 48 clips to 48 of her friends.\n2. In May, she sold 24 clips (half as many) to her friends"}

A request to the fine-tuned adapter, however, fails like so:

curl https://isjbblljgjb8v5-8080.proxy.runpod.net/generate \
    -X POST \
    -d '{
        "inputs": "[INST] Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? [/INST]",
        "parameters": {
            "max_new_tokens": 64,
            "adapter_id": "antonvo/mixtral-select-v1"
        }
    }' \
    -H 'Content-Type: application/json'

{"error":"Request failed during generation: Server error: This model was initialized with the adapter antonvo/mixtral-select-v1 and therefore does not support dynamic adapter loading. Please initialize a new model instance from the base model in order to use the dynamic adapter loading feature.","error_type":"generation"}

Expected behavior

Being able to generate using the adapter.

Feb 14 '24 03:02 avoskresensky

Hey @avoskresensky, it looks like this is happening because you set ADAPTER_ID: antonvo/mixtral-select-v1 during LoRAX initialization, resulting in LoRAX preloading that adapter and merging the weights back into the base model. This was an experimental feature we are considering removing, as it can result in weird situations like this.

My recommendation would be to remove ADAPTER_ID from the environment (only use it as a request parameter) and make sure LoRAX isn't being initialized with --adapter-id.
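For readers running outside RunPod, a plain `docker run` equivalent of this recommendation might look like the sketch below. It mirrors the reporter's RunPod settings but omits ADAPTER_ID and passes no `--adapter-id` after the image name, so the server starts from the base model and adapters are requested per call via `adapter_id`. It assumes Docker with NVIDIA GPU support and a local `./data` directory for the model cache; `--shm-size 1g` is borrowed from typical LoRAX quick-start commands and is an assumption, not something from this thread.

```shell
# Sketch: start LoRAX from the base model only (no ADAPTER_ID, no --adapter-id),
# so that adapters are loaded dynamically per request via "adapter_id".
# Assumes Docker + NVIDIA GPU support and a host directory ./data for the model cache.
# NCCL_P2P_DISABLE is left out, matching the reporter's later follow-up.
docker run --gpus all --shm-size 1g \
  -p 8080:8080 \
  -v "$PWD/data:/data" \
  -e MODEL_ID=mistralai/Mixtral-8x7B-Instruct-v0.1 \
  -e DTYPE=float16 \
  -e PORT=8080 \
  -e MAX_INPUT_LENGTH=32767 \
  -e MAX_TOTAL_TOKENS=32768 \
  -e MAX_BATCH_PREFILL_TOKENS=32768 \
  -e MAX_BATCH_TOTAL_TOKENS=32768 \
  ghcr.io/predibase/lorax:87412e1
```

With the server started this way, the adapter is selected only in the request body (`"adapter_id": "antonvo/mixtral-select-v1"`), as in the curl examples in this thread.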

Let me know how that goes!

Feb 14 '24 06:02 tgaddair

Hi @tgaddair,

Thanks for the quick follow-up.

I removed ADAPTER_ID and NCCL_P2P_DISABLE env variables (the latter was a workaround for an NCCL timeout during the merge).

I'm getting this error now:

{"error":"Request failed during generation: Server error: CHECK_EQ(tmp.size(0), static_cast<int64_t>(sgmv_tmp_size(num_problems))) failed. 8388608 vs 60","error_type":"generation"}

The request is:

curl https://tfd39fnvks3f2v-8080.proxy.runpod.net/generate \
    -X POST \
    -d '{
        "inputs": "[INST] Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? [/INST]",
        "parameters": {
            "max_new_tokens": 64,
            "adapter_id": "antonvo/mixtral-select-v1"
        }
    }' \
    -H 'Content-Type: application/json'

The adapter files in /data/models--antonvo--mixtral-select-v1 seem to be in good shape.
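One way to double-check the cached adapter is to print its `adapter_config.json` (LoRA rank `r`, `target_modules`, and so on) from each downloaded snapshot. The helper below is a hypothetical sketch, not part of LoRAX; it assumes the standard Hugging Face Hub cache layout, `<cache>/models--<org>--<name>/snapshots/<revision>/adapter_config.json`.

```shell
# Hypothetical helper (not part of LoRAX): print adapter_config.json from every
# cached snapshot of an adapter repo.
# Assumes the Hugging Face Hub cache layout:
#   <cache>/models--<org>--<name>/snapshots/<revision>/adapter_config.json
show_adapter_config() {
  local repo_dir="$1"
  local found=0
  local cfg
  for cfg in "$repo_dir"/snapshots/*/adapter_config.json; do
    [ -e "$cfg" ] || continue   # the glob matched nothing
    echo "== $cfg"
    cat "$cfg"
    echo                        # newline in case the file lacks a trailing one
    found=1
  done
  if [ "$found" -eq 0 ]; then
    echo "no adapter_config.json under $repo_dir/snapshots" >&2
    return 1
  fi
}

# Usage with the path from this report:
# show_adapter_config /data/models--antonvo--mixtral-select-v1
```

The function returns a non-zero status when no snapshot contains an `adapter_config.json`, which makes an incomplete download easy to spot.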

Feb 15 '24 00:02 avoskresensky

Just to confirm: is this adapter from Hugging Face or from a local path?

Feb 15 '24 20:02 magdyksaleh

@magdyksaleh,

the adapter is from Hugging Face.

Feb 15 '24 21:02 avoskresensky