
Demo snippet pulls all checkpoints

Open · dhingratul opened this issue 2 years ago · 2 comments

tokenizer = AutoTokenizer.from_pretrained(checkpoint), as defined here: https://github.com/bigcode-project/starcoder#code-generation, pulls 7 checkpoint files of ~9 GB each. Is this the intended behavior?

dhingratul · Aug 07 '23 23:08

Hi. Could you specify which checkpoint you are talking about? If you are referring to starcoder, loading the tokenizer should not download any checkpoint file. In fact, this code snippet

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder")

takes only a couple of seconds to run. Are you talking about the line model = AutoModelForCausalLM.from_pretrained(checkpoint) instead? If that is the case, then yes, it should download about 7 "shards"; you can have a look at this.

ArmelRandy · Aug 09 '23 08:08
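
For reference, here is a minimal sketch (an editorial illustration, not part of the original reply) that lists the files hosted in the model repository, so you can see the sharded weight files that from_pretrained downloads. It assumes the huggingface_hub package is installed and that you have access to the bigcode/starcoder repo:

from huggingface_hub import list_repo_files

# List every file hosted in the model repository.
files = list_repo_files("bigcode/starcoder")

# The tokenizer only needs the small tokenizer/config files;
# the large sharded weight files are what AutoModelForCausalLM fetches.
shards = [f for f in files if "pytorch_model" in f and f.endswith(".bin")]
print(shards)  # e.g. pytorch_model-00001-of-00007.bin, ...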

This is the example I am running:

#### Code Generation example
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
device = "cuda" # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# to save memory, consider fp16 or bf16 by specifying e.g. torch_dtype=torch.float16 (see the sketch at the end of the thread)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)
# clean_up_tokenization_spaces=False prevents a tokenizer edge case which can result in spaces being removed around punctuation
print(tokenizer.decode(outputs[0], clean_up_tokenization_spaces=False))

dhingratul · Aug 09 '23 16:08
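
As a side note on the memory comment in the snippet above, here is a minimal sketch of the reduced-precision variant it suggests (an editorial illustration, not part of the original comment; assumes torch is installed and a CUDA GPU is available):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# torch_dtype=torch.float16 loads the weights in half precision,
# roughly halving the memory footprint compared to the default fp32.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.float16
).to(device)

Note that this only reduces memory at load time; the same shard files are still downloaded in full.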