agent-lightning icon indicating copy to clipboard operation
agent-lightning copied to clipboard

Can you provide a exampling using only navie agent?

Open hzy312 opened this issue 4 months ago • 4 comments

Can you provide a agent example using naive python code without any frameworks like autogen / langchain?

I think it is very important for customized need.

hzy312 avatar Aug 19 '25 07:08 hzy312

It's already in the roadmap.

ultmaster avatar Aug 19 '25 13:08 ultmaster

Hello, is there any progress about providing a naive python code (loading local model) agentic rl example? Looking forward to this example so much!

flatLying avatar Nov 07 '25 15:11 flatLying

@flatLying I think you are talking about a different thing. Do you mean an agent without chat.completion and calling HuggingFace transformer generate function directly?

ultmaster avatar Nov 10 '25 00:11 ultmaster

I mean an complete simple example without any framework support,just like how TRL document show: https://huggingface.co/docs/trl/main/grpo_trainer

# train_grpo.py
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/ultrafeedback-prompt", split="train")

# Dummy reward function for demonstration purposes
def reward_num_unique_letters(completions, **kwargs):
    """Reward function that rewards completions with more unique letters."""
    completion_contents = [completion[0]["content"] for completion in completions]
    return [float(len(set(content))) for content in completion_contents]

training_args = GRPOConfig(output_dir="Qwen2-0.5B-GRPO")
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_num_unique_letters,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()

With a just py file, it's easier for startup. Thanks a lot ! The TRL document is quite easy to understand, but our document is a bit more complicated ~~ An example will help a lot~~

flatLying avatar Nov 10 '25 06:11 flatLying