
Add Gemma2B_Instruction_Agent to Cookbooks

Dhivya-Bharathy opened this pull request 7 months ago • 4 comments

User description

This notebook sets up a fine-tuning pipeline for the google/gemma-2-2b-it model using a small in-memory dataset. It includes tokenization, training with transformers.Trainer, inference, and model saving. The setup is lightweight and avoids external dataset loading issues for quick testing and experimentation.
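For reference, the core of such a pipeline typically looks like the sketch below. This is a condensed, illustrative version assuming the standard transformers and datasets APIs; the notebook's actual prompts, hyperparameters, and dataset contents may differ.

```python
# Condensed sketch of the Gemma 2B fine-tuning flow (illustrative only).
# Note: google/gemma-2-2b-it is gated on Hugging Face, so prior login may be required.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_id = "google/gemma-2-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Small in-memory dataset instead of an external download (example rows are made up).
dataset = Dataset.from_dict({"text": [
    "Instruction: Greet the user.\nResponse: Hello! How can I help you today?",
    "Instruction: Define AI.\nResponse: AI is the simulation of human intelligence by machines.",
]})

def tokenize_function(example):
    return tokenizer(example["text"], padding="max_length", truncation=True, max_length=64)

tokenized = dataset.map(tokenize_function)

# Lightweight training run with transformers.Trainer.
args = TrainingArguments(output_dir="gemma2b-finetuned", num_train_epochs=1,
                         per_device_train_batch_size=1, logging_steps=1)
trainer = Trainer(model=model, args=args, train_dataset=tokenized,
                  data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()

# Quick inference check, then save the model and tokenizer.
inputs = tokenizer("Instruction: Greet the user.\nResponse:", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=40)[0], skip_special_tokens=True))
model.save_pretrained("gemma2b-finetuned")
tokenizer.save_pretrained("gemma2b-finetuned")
```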


PR Type

Documentation, Enhancement


Description

  • Add new example notebooks for AI agent workflows and LLM chat.

    • Predictive maintenance workflow with multi-agent orchestration.
    • Code analysis agent for automated codebase evaluation.
    • Qwen2.5-0.5B-Instruct chat demo with Hugging Face Transformers.
    • (Also adds Gemma2B Instruction Agent notebook, not shown in diff.)
  • Each notebook includes step-by-step code, markdown explanations, and sample outputs.

  • Demonstrates practical usage of PraisonAIAgents and LLMs for real-world tasks.


Changes walkthrough 📝

Relevant files
Documentation
Code_Analysis_Agent.ipynb
Add code analysis agent example notebook                                 

examples/cookbooks/Code_Analysis_Agent.ipynb

  • Introduces a notebook for building a code analysis agent.
  • Shows setup of agent/task with PraisonAIAgents and Pydantic schemas.
  • Demonstrates code ingestion, analysis, and structured reporting.
  • Provides example output and markdown explanations.
  • +459/-0 
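A rough sketch of the agent/task pattern Code_Analysis_Agent.ipynb follows is shown below. It assumes the praisonaiagents and gitingest interfaces; the CodeAnalysisReport fields, repository URL, and prompt text are hypothetical placeholders, not the notebook's actual schema.

```python
from pydantic import BaseModel
from gitingest import ingest                      # code ingestion, as described above
from praisonaiagents import Agent, Task, PraisonAIAgents

# Hypothetical report schema; the notebook defines its own fields.
class CodeAnalysisReport(BaseModel):
    overall_quality: int
    key_strengths: list[str]
    improvement_areas: list[str]

# Pull repository summary, tree, and source into plain text.
summary, tree, content = ingest("https://github.com/MervinPraison/PraisonAI")

analyst = Agent(
    name="CodeAnalyst",
    role="Senior code reviewer",
    goal="Evaluate code quality and produce a structured report",
    instructions="Review the provided repository context and score it objectively.",
)

analysis_task = Task(
    description=f"Analyze this codebase:\n{summary}\n{tree}\n{content[:5000]}",
    expected_output="A structured code analysis report",
    agent=analyst,
    output_pydantic=CodeAnalysisReport,            # structured output via Pydantic
)

result = PraisonAIAgents(agents=[analyst], tasks=[analysis_task]).start()
print(result)
```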
Predictive_Maintenance_Multi_Agent_Workflow.ipynb
Add predictive maintenance multi-agent workflow notebook

examples/cookbooks/Predictive_Maintenance_Multi_Agent_Workflow.ipynb

  • Adds a notebook for predictive maintenance using multiple agents.
  • Defines helper functions, agents, and tasks for workflow automation.
  • Demonstrates async workflow execution and output interpretation.
  • Includes markdown explanations and sample results.
  • +401/-0 
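A compressed sketch of the multi-agent pattern behind this notebook follows. It assumes the praisonaiagents Agent/Task/PraisonAIAgents interfaces; the sensor helper and agent roles are simplified placeholders, and the synchronous start() here stands in for the notebook's asynchronous execution.

```python
import random
from praisonaiagents import Agent, Task, PraisonAIAgents

# Placeholder sensor tool; the notebook defines richer helper functions.
def collect_sensor_data() -> dict:
    """Simulate a reading from a monitored machine."""
    return {"temperature": round(random.uniform(40, 95), 1),
            "vibration": round(random.uniform(0.1, 2.0), 2)}

monitor = Agent(name="SensorMonitor", role="Sensor monitoring",
                goal="Collect equipment sensor data", tools=[collect_sensor_data])
analyzer = Agent(name="PerformanceAnalyzer", role="Performance analysis",
                 goal="Flag anomalies and predict failures from sensor data")
scheduler = Agent(name="MaintenanceScheduler", role="Maintenance planning",
                  goal="Schedule maintenance before predicted failures")

tasks = [
    Task(description="Collect the latest sensor readings", agent=monitor,
         expected_output="Raw sensor readings"),
    Task(description="Analyze readings, detect anomalies, and predict failures",
         agent=analyzer, expected_output="Anomalies and failure predictions"),
    Task(description="Produce a maintenance schedule from the predictions",
         agent=scheduler, expected_output="Maintenance schedule"),
]

# The notebook chains more agents and runs the workflow asynchronously;
# a plain sequential start() keeps this sketch short.
workflow = PraisonAIAgents(agents=[monitor, analyzer, scheduler],
                           tasks=tasks, process="sequential")
print(workflow.start())
```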
Qwen2_5_InstructionAgent.ipynb
Add Qwen2.5 Instruction Agent chat demo notebook

examples/cookbooks/Qwen2_5_InstructionAgent.ipynb

  • Provides a beginner-friendly notebook for Qwen2.5-0.5B-Instruct chat.
  • Covers dependency installation, authentication, and model inference.
  • Walks through prompt creation, response generation, and output
    display.
  • Includes markdown cells for context and Colab integration.
  • +420/-0 
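The core of such a chat demo typically looks like the following sketch, based on the standard Transformers chat-template flow; the notebook's prompt and generation settings may differ.

```python
# Minimal Qwen2.5-0.5B-Instruct chat example (mirrors the notebook's flow).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what an instruction-tuned model is in one sentence."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(**inputs, max_new_tokens=128)
response = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```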
Additional files
Gemma2B_Instruction_Agent.ipynb +4713/-0

  • Summary by CodeRabbit

    • New Features
      • Added a "Code Analysis Agent" notebook demonstrating AI-driven code quality assessment and structured reporting.
      • Introduced a "Predictive Maintenance Multi-Agent Workflow" notebook showcasing a multi-agent AI workflow for sensor data analysis and maintenance scheduling.
      • Added a "Qwen2.5 InstructionAgent" notebook illustrating chat interactions with the Qwen2.5 language model using Hugging Face Transformers.
      • Added a "Gemma 2B Instruction Agent" notebook demonstrating model loading, training preparation, inference, and saving using the Gemma 2B causal language model.

Dhivya-Bharathy · Jun 05 '25 10:06

    [!WARNING]

    Rate limit exceeded

    @DhivyaBharathy-web has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 3 minutes and 40 seconds before requesting another review.


    📥 Commits

    Reviewing files that changed from the base of the PR and between 078732cc96f1ee7fff1235b585b5d80f19fa82af and 39e03a485facae9aa970dcb13fad352aee281168.

    📒 Files selected for processing (1)
    • examples/cookbooks/Code_Analysis_Agent.ipynb (2 hunks)

    Walkthrough

    Four new Jupyter notebook examples are introduced: one for AI-driven code analysis with structured reporting, one demonstrating a multi-agent predictive maintenance workflow, one showcasing chat interaction with the Qwen2.5 instruction model, and one illustrating training and inference with the Gemma 2B instruction agent. Each notebook includes environment setup, agent/task or model definitions, execution, and output display.

    Changes

    File(s) and change summary:
    • examples/cookbooks/Code_Analysis_Agent.ipynb: Added notebook demonstrating AI-based code analysis with Pydantic models, agent/task setup, code ingestion via GitIngest, and structured output.
    • examples/cookbooks/Predictive_Maintenance_Multi_Agent_Workflow.ipynb: Added notebook showing a predictive maintenance workflow using multiple agents, helper functions, tasks, asynchronous execution, and output.
    • examples/cookbooks/Qwen2_5_InstructionAgent.ipynb: Added notebook for simple chat interaction with the Qwen2.5-0.5B-Instruct model using Hugging Face Transformers and token authentication.
    • examples/cookbooks/Gemma2B_Instruction_Agent.ipynb: Added notebook demonstrating data preparation, training, inference, and saving of the Gemma 2B causal LM with Hugging Face Transformers and datasets.

    Sequence Diagram(s)

    sequenceDiagram
        participant User
        participant Notebook
        participant PraisonAIAgents
        participant Agent
        participant Task
        participant GitIngest
    
        User->>Notebook: Provide code source (path or GitHub URL)
        Notebook->>GitIngest: Ingest repository content
        GitIngest-->>Notebook: Return repo summary, structure, code
        Notebook->>PraisonAIAgents: Run analysis with Agent and Task
        PraisonAIAgents->>Agent: Analyze code context
        Agent-->>Task: Generate analysis report
        Task-->>PraisonAIAgents: Return structured report
        PraisonAIAgents-->>Notebook: Return CodeAnalysisReport
        Notebook-->>User: Display analysis results
    
    sequenceDiagram
        participant User
        participant Notebook
        participant PraisonAIAgents
        participant SensorMonitor
        participant PerformanceAnalyzer
        participant AnomalyDetector
        participant FailurePredictor
        participant MaintenanceScheduler
    
        User->>Notebook: Start predictive maintenance workflow
        Notebook->>PraisonAIAgents: Initiate workflow
        PraisonAIAgents->>SensorMonitor: Collect sensor data
        SensorMonitor-->>PraisonAIAgents: Return sensor data
        PraisonAIAgents->>PerformanceAnalyzer: Analyze performance
        PerformanceAnalyzer-->>PraisonAIAgents: Return analysis
        PraisonAIAgents->>AnomalyDetector: Detect anomalies
        AnomalyDetector-->>PraisonAIAgents: Return anomalies
        PraisonAIAgents->>FailurePredictor: Predict failures
        FailurePredictor-->>PraisonAIAgents: Return predictions
        PraisonAIAgents->>MaintenanceScheduler: Schedule maintenance
        MaintenanceScheduler-->>PraisonAIAgents: Return schedule
        PraisonAIAgents-->>Notebook: Return workflow results
        Notebook-->>User: Display workflow output
    

    Possibly related PRs

    • MervinPraison/PraisonAI#600: Adds the same Code_Analysis_Agent.ipynb notebook, including the analyze_code function, Pydantic data models, and agent/task definitions for code analysis.



coderabbitai[bot] · Jun 05 '25 10:06

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 Security concerns

    API key exposure:
    The notebooks contain code for entering API keys directly as plaintext (examples/cookbooks/Predictive_Maintenance_Multi_Agent_Workflow.ipynb line 66 and examples/cookbooks/Code_Analysis_Agent.ipynb line 67). This is a security risk as API keys could be accidentally committed to version control or shared. These should be loaded from environment variables, .env files (with proper gitignore), or using a secrets management system.

    ⚡ Recommended focus areas for review

    Hardcoded Token

    The notebook contains a hardcoded placeholder for Hugging Face token authentication that could be improved with better security practices or clearer instructions for users.

    "from huggingface_hub import login\n",
    "login(token=\"Enter your huggingface token\")\n"
    
    API Key Exposure

    The notebook includes code for directly entering an API key as plaintext, which is a security concern. Should use environment variables or secrets management.

    "import os\n",
    "os.environ['OPENAI_API_KEY'] = 'enter your api key'"
    

qodo-code-review[bot] · Jun 05 '25 10:06

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category | Suggestion | Impact
    Security
    Secure token handling

    The code currently requires users to manually replace the placeholder text with
    their Hugging Face token. This creates a security risk as users might
    accidentally commit their token to version control. Use environment variables or
    a secure token management approach instead.

    examples/cookbooks/Gemma2B_Instruction_Agent.ipynb [355]

    -login("Enter your token here")
    +import os
    +# Get token from environment variable or prompt user securely
    +token = os.environ.get("HF_TOKEN") or getpass.getpass("Enter your Hugging Face token: ")
    +login(token)
    

    [To ensure code accuracy, apply this suggestion manually]

    Suggestion importance[1-10]: 6


    Why: Valid security concern about hardcoded token placeholders, but for a tutorial notebook, having clear placeholders that users replace is a common and acceptable pattern.

    Low
    Improve credential security

    The current code requires users to manually replace the placeholder text with
    their actual token. Instead, use a more secure approach that prompts for the
    token or uses environment variables. This avoids hardcoding sensitive
    credentials in the notebook.

    examples/cookbooks/Qwen2_5_InstructionAgent.ipynb [123-124]

     from huggingface_hub import login
    -login(token="Enter your huggingface token")
    +import os
     
    +# Get token from environment variable or prompt user
    +token = os.getenv("HF_TOKEN") or input("Enter your Hugging Face token: ")
    +login(token=token)
    +
    

    [To ensure code accuracy, apply this suggestion manually]

    Suggestion importance[1-10]: 6


    Why: The suggestion correctly identifies a security best practice improvement by replacing hardcoded placeholder credentials with environment variables or user prompts. However, since this is an educational notebook with placeholder text that users must replace anyway, the security impact is moderate rather than critical.

    Low
    General
    Remove unused code
    Suggestion Impact: The commit completely removed the tokenized dataset code (lines 477-478) that was creating confusion for users. The entire notebook was significantly restructured, and the unused code was removed as part of that restructuring.

    code diff:

    -        "tokenized_dataset = dataset.map(tokenize_function)\n",
    -        "tokenized_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask'])"
    

    The tokenized dataset is prepared but never used in the notebook. This creates
    confusion for users following the tutorial and wastes computational resources.
    Either use the tokenized dataset for training or remove this code block.

    examples/cookbooks/Gemma2B_Instruction_Agent.ipynb [473-477]

     def tokenize_function(example):
         return tokenizer(example['text'], padding='max_length', truncation=True, max_length=64)
     
    -tokenized_dataset = dataset.map(tokenize_function)
    -tokenized_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask'])
    +# Only tokenize if we're going to use it for training
    +if PERFORM_TRAINING:
    +    tokenized_dataset = dataset.map(tokenize_function)
    +    tokenized_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask'])
    

    [To ensure code accuracy, apply this suggestion manually]

    Suggestion importance[1-10]: 4


    Why: Correctly identifies that tokenized_dataset is created but never used, but the improved code introduces the undefined variable PERFORM_TRAINING, making the solution incomplete.

    Low

qodo-code-review[bot] · Jun 05 '25 10:06

    Codecov Report

    All modified and coverable lines are covered by tests ✅

    Project coverage is 16.43%. Comparing base (60fd485) to head (39e03a4). Report is 82 commits behind head on main.

    Additional details and impacted files
    @@           Coverage Diff           @@
    ##             main     #607   +/-   ##
    =======================================
      Coverage   16.43%   16.43%           
    =======================================
      Files          24       24           
      Lines        2160     2160           
      Branches      302      302           
    =======================================
      Hits          355      355           
      Misses       1789     1789           
      Partials       16       16           
    
    Flag Coverage Δ
    quick-validation 0.00% <ø> (ø)
    unit-tests 16.43% <ø> (ø)

    Flags with carried forward coverage won't be shown. Click here to find out more.

    ☂️ View full report in Codecov by Sentry.
    📢 Have feedback on the report? Share it here.


codecov[bot] · Jun 05 '25 11:06