
LitGPT chat terminates weirdly

rasbt opened this issue 10 months ago · 8 comments

Our litgpt chat command is doing something weird: it exits the chat session, and the rest of the pasted input then gets executed as shell commands in the terminal.

To reproduce:

litgpt chat --checkpoint_dir out/custom-phi-2/final

and then paste the following into the chat window:

explain this code: # 1) Download a pretrained model
litgpt download --repo_id microsoft/phi-2

# 2) Pretrain the model
litgpt pretrain \
  --initial_checkpoint_dir checkpoints/microsoft/phi-2 \
  --data Alpaca2k \
  --out_dir out/custom-phi-2

# 3) Chat with the model
litgpt chat \
  --checkpoint_dir out/custom-phi-2/final

This works correctly at first, but then the session exits and the remaining pasted lines run in the shell. For example, search for >> Prompt: and >> Reply: in the output below:

⚡ codegemma ~/litgpt litgpt chat --checkpoint_dir checkpoints/google/codegemma-7b-it
Now chatting with CodeGemma-7b-it.
To exit, press 'Enter' on an empty prompt.

Seed set to 1234
>> Prompt: explain this code: # 1) Download a pretrained model
litgpt download --repo_id microsoft/phi-2

# 2) Pretrain the model
litgpt pretrain \
  --initial_checkpoint_dir checkpoints/microsoft/phi-2 \
  --data Alpaca2k \
  --out_dir out/custom-phi-2

# 3) Chat with the model
litgpt chat \
  --checkpoint_dir out/custom-phi-2/final


>> Reply: The code snippet you provided attempts to download a pre-trained model from a given URL. Here's a step-by-step explanation:

1. **Import the necessary library:** The code starts by importing the `requests` library, which is needed to make HTTP requests to the URL.

2. **Define the URL:** The URL where the pre-trained model is located is defined as a string.

3. **Download the model:** The `requests.get()` method is used to send a GET request to the URL. This request retrieves the binary data of the pre-trained model.

4. **Save the model:** The `with open()` statement is used to open a file named `model.ckpt` in write binary mode (`wb`). The retrieved model data is then written to this file using the file object's `write()` method.

5. **Close the file:** The file object is closed using the `close()` method to save the changes made to the file.

In summary, this code downloads a pre-trained model from the specified URL and saves it locally as a file named `model.ckpt`. This can be useful for using the model in further applications or training additional models.
Time for inference: 13.50 sec total, 18.52 tokens/sec, 250 tokens

>> Prompt: >> Reply: I can't download the LITGPT model because it is not publicly available. It is a private repository on the Hugging Face model hub. For security reasons, I can't access private repositories without the owner's permission.
Time for inference: 2.47 sec total, 19.00 tokens/sec, 47 tokens

>> Prompt: %                                                                                                                    
⚡ codegemma ~/litgpt # 2) Pretrain the model
⚡ codegemma ~/litgpt litgpt pretrain \
>   --initial_checkpoint_dir checkpoints/microsoft/phi-2 \
>   --data Alpaca2k \
>   --out_dir out/custom-phi-2
Traceback (most recent call last):
  File "/home/zeus/miniconda3/envs/cloudspace/bin/litgpt", line 8, in <module>
    sys.exit(main())
  File "/teamspace/studios/this_studio/litgpt/litgpt/__main__.py", line 131, in main
    fn(**kwargs)
  File "/teamspace/studios/this_studio/litgpt/litgpt/pretrain.py", line 96, in setup
    raise ValueError(f"Please specify --model_name <model_name>. Available values:\n{available_models}")
ValueError: Please specify --model_name <model_name>. Available values:
Camel-Platypus2-13B
Camel-Platypus2-70B
CodeGemma-7b-it
CodeLlama-13b-Instruct-hf
CodeLlama-13b-Python-hf
CodeLlama-13b-hf
CodeLlama-34b-Instruct-hf
CodeLlama-34b-Python-hf
CodeLlama-34b-hf
CodeLlama-70b-Instruct-hf
CodeLlama-70b-Python-hf
CodeLlama-70b-hf
CodeLlama-7b-Instruct-hf
CodeLlama-7b-Python-hf
CodeLlama-7b-hf
FreeWilly2
Gemma-2b
Gemma-2b-it
Gemma-7b
Gemma-7b-it
LLaMA-2-7B-32K
Llama-2-13b-chat-hf
Llama-2-13b-hf
Llama-2-70b-chat-hf
Llama-2-70b-hf
Llama-2-7b-chat-hf
Llama-2-7b-chat-hf-function-calling-v2
Llama-2-7b-hf
Mistral-7B-Instruct-v0.1
Mistral-7B-Instruct-v0.2
Mistral-7B-v0.1
Mistral-7B-v0.2
Mixtral-8x7B-Instruct-v0.1
Mixtral-8x7B-v0.1
Nous-Hermes-13b
Nous-Hermes-Llama2-13b
Nous-Hermes-llama-2-7b
Platypus-30B
Platypus2-13B
Platypus2-70B
Platypus2-70B-instruct
Platypus2-7B
RedPajama-INCITE-7B-Base
RedPajama-INCITE-7B-Chat
RedPajama-INCITE-7B-Instruct
RedPajama-INCITE-Base-3B-v1
RedPajama-INCITE-Base-7B-v0.1
RedPajama-INCITE-Chat-3B-v1
RedPajama-INCITE-Chat-7B-v0.1
RedPajama-INCITE-Instruct-3B-v1
RedPajama-INCITE-Instruct-7B-v0.1
Stable-Platypus2-13B
dolly-v2-12b
dolly-v2-3b
dolly-v2-7b
falcon-180B
falcon-180B-chat
falcon-40b
falcon-40b-instruct
falcon-7b
falcon-7b-instruct
longchat-13b-16k
longchat-7b-16k
open_llama_13b
open_llama_3b
open_llama_7b
phi-1_5
phi-2
pythia-1.4b
pythia-1.4b-deduped
pythia-12b
pythia-12b-deduped
pythia-14m
pythia-160m
pythia-160m-deduped
pythia-1b
pythia-1b-deduped
pythia-2.8b
pythia-2.8b-deduped
pythia-31m
pythia-410m
pythia-410m-deduped
pythia-6.9b
pythia-6.9b-deduped
pythia-70m
pythia-70m-deduped
stable-code-3b
stablecode-completion-alpha-3b
stablecode-completion-alpha-3b-4k
stablecode-instruct-alpha-3b
stablelm-3b-4e1t
stablelm-base-alpha-3b
stablelm-base-alpha-7b
stablelm-tuned-alpha-3b
stablelm-tuned-alpha-7b
stablelm-zephyr-3b
tiny-llama-1.1b
tiny-llama-1.1b-chat
vicuna-13b-v1.3
vicuna-13b-v1.5
vicuna-13b-v1.5-16k
vicuna-33b-v1.3
vicuna-7b-v1.3
vicuna-7b-v1.5
vicuna-7b-v1.5-16k
⚡ codegemma ~/litgpt
⚡ codegemma ~/litgpt # 3) Chat with the model
⚡ codegemma ~/litgpt litgpt chat \
>   --checkpoint_dir out/custom-phi-2/final
--checkpoint_dir '/teamspace/studios/this_studio/litgpt/out/custom-phi-2/final' is not a checkpoint directory.
Find download instructions at https://github.com/Lightning-AI/litgpt/blob/main/tutorials

You have downloaded locally:
 --checkpoint_dir '/teamspace/studios/this_studio/litgpt/checkpoints/google/codegemma-7b-it'

See all download options by running:

rasbt · Apr 11 '24

I recommend validating it on a local machine. The terminal in a Studio might behave oddly.

Andrei-Aksionov · Apr 11 '24

Good point, that's probably it. I'll test it on a GPU machine later.

rasbt · Apr 11 '24

Unfortunately, I have that issue in a local terminal too.

rasbt · Apr 11 '24

The script is designed to stop when you pass an empty line: https://github.com/Lightning-AI/litgpt/blob/main/litgpt/chat/base.py#L173-L174.

I suggest debugging what input() receives when you copy-paste a piece of text like that with empty lines in it. It might get split into pieces.
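For reference, a minimal sketch of that behavior (the loop structure is an assumption based on the linked lines, not the actual litgpt code): input() returns exactly one line per call, so a multi-line paste is consumed across successive iterations, and the first blank line in the paste trips the exit condition.

# Sketch of the chat loop's exit behavior (hypothetical loop, not litgpt's code).
while True:
    prompt = input(">> Prompt: ")  # input() returns ONE line per call
    if not prompt:                 # an empty line ends the session...
        break                      # ...including a blank line inside a paste
    print(f">> Reply: (model would answer {prompt!r})")
# After the process exits, any pasted lines still sitting in the terminal's
# input buffer are read by the shell, which is why they get executed.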

carmocca · Apr 12 '24

Hm, yeah, this is indeed what's happening. There's currently no way around it if we want to use input(). Even if we fixed that by, e.g., not quitting on empty lines, the problem would remain that the model would process each line independently, I think.

We probably need to switch to sys.stdin.read() or something similar.
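To illustrate (a hedged sketch, not a committed design): sys.stdin.read() blocks until EOF (Ctrl-D on Unix, Ctrl-Z then Enter on Windows), so a pasted snippet arrives as one string, newlines included, at the cost of an explicit end-of-input keystroke per prompt.

import sys

# Sketch: read a whole multi-line prompt until EOF (assumption: one
# Ctrl-D per prompt; a multi-turn loop would need to handle repeated EOFs).
print(">> Prompt (finish with Ctrl-D):")
prompt = sys.stdin.read()  # everything pasted, newlines included
print(f"Received {len(prompt.splitlines())} line(s) as a single prompt.")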

rasbt · Apr 12 '24

Maybe the easiest fix is simply to remove the functionality to terminate on an empty line. This would also prevent accidents where copy-pasting a litgpt chat command plus a trailing newline leads to immediate termination. Asking the user to type exit, just like in the Python shell, is IMO very acceptable.
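A minimal sketch of that variant (hypothetical loop, not a patch): blank lines are simply skipped, and only a literal exit ends the session, so a pasted command plus a trailing newline no longer kills the chat.

# Sketch: quit on an explicit keyword instead of an empty line.
while True:
    prompt = input(">> Prompt: ")
    if prompt.strip().lower() == "exit":  # explicit quit keyword
        break
    if not prompt.strip():                # blank lines no longer terminate
        continue
    print(f">> Reply: (model would answer {prompt!r})")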

awaelchli · Apr 12 '24

I agree. That's what I had at first, but then I thought you guys wouldn't be happy with removing it entirely... but I agree, it's the simplest solution.

rasbt · Apr 13 '24

Is that enough? I thought this would also need a way to let the script know that you've finished typing, in order to support newlines.
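For instance (a hedged sketch; the !submit sentinel is made up, not an existing litgpt convention), one way to signal end-of-input while keeping newlines:

# Sketch: collect lines until an explicit sentinel so multi-line
# pastes stay a single prompt. The "!submit" marker is hypothetical.
def read_prompt() -> str:
    lines = []
    while True:
        line = input()
        if line.strip() == "!submit":  # user signals end of the prompt
            break
        lines.append(line)             # blank lines are kept verbatim
    return "\n".join(lines)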

carmocca · Apr 13 '24