MotionGPT
Demo error: invalid literal for int() with base 10:
Hi all,
Thanks for your work. I'm encountering an issue when I attempt to run the demo:
At line 114:
tokens = torch.tensor([int(token) for token in output.split(',')]).cuda()
I encounter the error message:
ValueError: invalid literal for int() with base 10: 'однакоMediaalling proceeduestopower dig impl courFincija heraus са XIIoshiicznênternoon Jimmy soap Weit rueadel KomSERT моло official comingyy SloLayoutInflaterstate domains waronds анти alcune(\'. a'
To filter out non-integer values, I tried modifying the line to
tokens = torch.tensor([int(token) for token in output.split(',') if token.isdigit()], dtype=torch.long).cuda()
but then encountered a RuntimeError:
Calculated padded input size per channel: (2). Kernel size: (3).
Kernel size can't be greater than actual input size
I followed the installation instructions as listed. Any suggestions? Thanks!
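(My guess at the second error: after the isdigit filter only a couple of numeric tokens survive, so the decoded sequence is shorter than the VQ-VAE's temporal convolution kernel, hence input size (2) vs. kernel size (3). A minimal guard I sketched, assuming output and torch are in scope as in generate_motion.py and that MIN_TOKENS = 3 matches the kernel size, is:

# Hypothetical guard, not part of the repo: refuse to decode when the LLM
# produced too few numeric motion tokens (the temporal conv needs >= 3 steps).
MIN_TOKENS = 3
parsed = [int(token) for token in output.split(',') if token.strip().isdigit()]
if len(parsed) < MIN_TOKENS:
    raise ValueError(
        f"only {len(parsed)} numeric motion tokens parsed; "
        "the LLM output probably contains no motion codes"
    )
tokens = torch.tensor(parsed, dtype=torch.long).cuda()

This at least fails with a clearer message instead of the kernel-size error, but it does not explain why the model outputs gibberish in the first place.)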
@qiqiApink
I'm getting the same error: "ValueError: invalid literal for int() with base 10" when executing the demo for text-to-motion. There seem to be no valid integers in my output. I'm using pretrained-7B.pth.
@felipe-parodi could you solve it?
Still haven't solved it. I tried fine-tuning LLaMA on the KIT dataset, and that didn't solve it either.
Thanks for the quick response
try:
    output = re.findall(r'\d+', output)
    for j, num in enumerate(output):
        if int(num) > 511:
            output = output[:j]
            break
    if len(output) == 0:
        tokens = torch.ones(1, max_new_tokens).cuda().long()
    else:
        tokens = torch.tensor([[int(num) for num in output]]).cuda().long()
except:
    tokens = torch.ones(1, max_new_tokens).cuda().long()

Put this in generate_motion.py (it needs import re at the top of the file).
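(The 511 cutoff presumably corresponds to the VQ-VAE codebook size of 512, i.e. valid motion token ids are 0-511.)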
That doesn't solve the issue though; it simply creates a tensor of ones when it doesn't get the desired output.
This might be because your model training is broken. You can try the pretrained 7B model that the author updated; it can generate the desired output.
Hi @SHUWEI-HO, thanks for your quick response. The same happens for me with the pretrained 7B model as well as the fine-tuned model. Could you share your generate_motion script? It's unclear why it isn't working on my end.
import os
import sys
import time
import re
import warnings
from pathlib import Path
from typing import Optional
import lightning as L
import torch
import numpy as np
import models.vqvae as vqvae
from generate import generate
from lit_llama import Tokenizer, LLaMA, LLaMAConfig
from lit_llama.lora import lora
from lit_llama.utils import EmptyInitOnDevice, lazy_load
from scripts.prepare_motion import generate_prompt
from options import option
import imageio
from utils.evaluate import plot
from visualization.render import render
warnings.filterwarnings('ignore')
args = option.get_args_parser()
def main(
    quantize: Optional[str] = None,
    dtype: str = "float32",
    max_new_tokens: int = 200,
    top_k: int = 200,
    temperature: float = 0.8,
    accelerator: str = "auto",
) -> None:
    lora_path = Path(args.lora_path)
    pretrained_path = Path(f"./checkpoints/lit-llama/{args.pretrained_llama}/lit-llama.pth")
    tokenizer_path = Path("./checkpoints/lit-llama/tokenizer.model")
    assert lora_path.is_file()
    assert pretrained_path.is_file()
    assert tokenizer_path.is_file()

    if quantize is not None:
        raise NotImplementedError("Quantization in LoRA is not supported yet")

    fabric = L.Fabric(accelerator=accelerator, devices=1)

    dt = getattr(torch, dtype, None)
    if not isinstance(dt, torch.dtype):
        raise ValueError(f"{dtype} is not a valid dtype.")
    dtype = dt

    net = vqvae.HumanVQVAE(args,  ## use args to define different parameters in different quantizers
                           args.nb_code,
                           args.code_dim,
                           args.output_emb_width,
                           args.down_t,
                           args.stride_t,
                           args.width,
                           args.depth,
                           args.dilation_growth_rate)
    print('loading checkpoint from {}'.format(args.vqvae_pth))
    ckpt = torch.load(args.vqvae_pth, map_location='cpu')
    net.load_state_dict(ckpt['net'], strict=True)
    net.eval()
    net.cuda()

    print("Loading model ...", file=sys.stderr)
    t0 = time.time()
    # with EmptyInitOnDevice(
    #     device=fabric.device, dtype=dtype, quantization_mode=quantize
    # ), lora(r=args.lora_r, alpha=args.lora_alpha, dropout=args.lora_dropout, enabled=True):
    with fabric.device, lora(r=args.lora_r, alpha=args.lora_alpha, dropout=args.lora_dropout, enabled=True):
        config = LLaMAConfig.from_name(args.pretrained_llama)
        torch.set_default_tensor_type(torch.HalfTensor)
        model = LLaMA(config).bfloat16()
        torch.set_default_tensor_type(torch.FloatTensor)
        # model = LLaMA(LLaMAConfig())  # TODO: Support different model sizes
    print(f"Time to load model: {time.time() - t0:.02f} seconds.", file=sys.stderr)

    model.eval()
    model = fabric.setup_module(model)
    tokenizer = Tokenizer(tokenizer_path)

    sample = {"instruction": args.prompt, "input": args.input}
    prompt = generate_prompt(sample)
    encoded = tokenizer.encode(prompt, bos=True, eos=False, device=model.device)

    t0 = time.perf_counter()
    output = generate(
        model,
        idx=encoded,
        max_seq_length=max_new_tokens,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        top_k=top_k,
        eos_id=tokenizer.eos_id
    )
    output = tokenizer.decode(output)
    output = output.split("### Response:")[1].strip()

    t = time.perf_counter() - t0
    print(f"\n\nTime for inference: {t:.02f} sec total, {max_new_tokens / t:.02f} tokens/sec", file=sys.stderr)
    print(f"Memory used: {torch.cuda.max_memory_reserved() / 1e9:.02f} GB", file=sys.stderr)

    try:
        output = re.findall(r'\d+', output)
        for j, num in enumerate(output):
            if int(num) > 511:
                output = output[:j]
                break
        if len(output) == 0:
            tokens = torch.ones(1, max_new_tokens).cuda().long()
        else:
            tokens = torch.tensor([[int(num) for num in output]]).cuda().long()
    except:
        tokens = torch.ones(1, max_new_tokens).cuda().long()

    generated_pose, img = plot(tokens, net, args.dataname)

    n = str(input("Enter the name :"))
    gif_name = f"round{n}.gif"
    os.makedirs(args.out_dir, exist_ok=True)
    if gif_name in os.listdir():
        print("Exist! Enter Again!")
        n = str(input("Enter the name :"))
        gif_name = f"{n}.gif"
        np.save(os.path.join(args.out_dir, f'{n}.npy'), generated_pose)
    else:
        gif_name = f"{n}.gif"
        np.save(os.path.join(args.out_dir, f'{n}.npy'), generated_pose)
    imageio.mimsave(os.path.join(args.out_dir, gif_name), np.array(img), fps=20)

    if args.render:
        print("Rendering...")
        render(generated_pose, n, outdir=args.out_dir)
Thanks. Why did you remove the original decoding line 106
output = tokenizer.decode(output)
just before output = output.split("### Response:")[1].strip()?
Sorry, I missed the line output = tokenizer.decode(output) above. I have edited my previous reply.
I always get a tensor of ones with the pretrained 7B model.
@SHUWEI-HO in your code you have args = option.get_argsdi_parser(), while the current generate_motion.py has args = option.get_args_parser(). Maybe you have different arguments configured.
Hi @felipe-parodi, I fine-tuned the 7B LLaMA with KIT and encountered the same error. I tried different regex parsing, but was still unable to solve the issue. Did you find a workaround by any chance? Thanks!
--
Update: @felipe-parodi @qiqiApink
I found this issue is solved by training the model for longer, i.e. 35000+ iterations. Then it manages to output the required numerical tokens, and the line
tokens = torch.tensor([int(token) for token in output.split(',')]).cuda()
receives the format it needs to convert the tokens into ints.
The tokens can also be loaded like this:
tokens = torch.tensor([int(token) for token in output.strip(',').split(',')]).cuda()
which strips leading/trailing commas so the split does not produce empty entries from the output.
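If you still need to parse noisy output, here is a rough helper that combines the suggestions in this thread (regex extraction, stopping at the first out-of-range id, and stripping stray commas). The 512-code assumption and the helper name are mine, not from the repo:

import re
import torch

def parse_motion_tokens(output: str, codebook_size: int = 512) -> torch.Tensor:
    # Hypothetical helper, not part of MotionGPT: keep only numeric ids in
    # [0, codebook_size) and stop at the first out-of-range number.
    ids = []
    for num in re.findall(r'\d+', output.strip(',')):
        value = int(num)
        if value >= codebook_size:
            break
        ids.append(value)
    if not ids:
        raise ValueError("no valid motion tokens found in the model output")
    return torch.tensor([ids], dtype=torch.long)

# usage in generate_motion.py (assumed variable names):
# tokens = parse_motion_tokens(output).cuda()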
Hope this helps
The LLaMA weights downloaded from pyllama may have changed, so the provided fine-tuned weights will not match the LLaMA you downloaded. You can fine-tune the model yourself.