Can you provide a exampling using only navie agent?
Can you provide a agent example using naive python code without any frameworks like autogen / langchain?
I think it is very important for customized need.
It's already in the roadmap.
Hello, is there any progress about providing a naive python code (loading local model) agentic rl example? Looking forward to this example so much!
@flatLying I think you are talking about a different thing. Do you mean an agent without chat.completion and calling HuggingFace transformer generate function directly?
I mean an complete simple example without any framework support,just like how TRL document show: https://huggingface.co/docs/trl/main/grpo_trainer
# train_grpo.py
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer
dataset = load_dataset("trl-lib/ultrafeedback-prompt", split="train")
# Dummy reward function for demonstration purposes
def reward_num_unique_letters(completions, **kwargs):
"""Reward function that rewards completions with more unique letters."""
completion_contents = [completion[0]["content"] for completion in completions]
return [float(len(set(content))) for content in completion_contents]
training_args = GRPOConfig(output_dir="Qwen2-0.5B-GRPO")
trainer = GRPOTrainer(
model="Qwen/Qwen2-0.5B-Instruct",
reward_funcs=reward_num_unique_letters,
args=training_args,
train_dataset=dataset,
)
trainer.train()
With a just py file, it's easier for startup. Thanks a lot ! The TRL document is quite easy to understand, but our document is a bit more complicated ~~ An example will help a lot~~