PraisonAI icon indicating copy to clipboard operation
PraisonAI copied to clipboard

Question on multimodel agent.

Open csv610 opened this issue 1 year ago • 4 comments

Hello,

How do we feed camera in multi-model Agent?

Thanks

csv610 avatar Dec 30 '24 10:12 csv610

@claude review this request

MervinPraison avatar May 28 '25 14:05 MervinPraison

Claude finished @MervinPraison's task —— View job


Camera Input for Multi-Model Agents

@csv610 Great question! Here's how to feed camera input to multimodal agents in PraisonAI:

🎯 Current Approach

PraisonAI handles visual input through the images parameter in Tasks. While there's no built-in camera capture, you can integrate camera feeds by:

1. Direct File Path Method

from praisonaiagents import Agent, Task, PraisonAIAgents

# Create Vision Agent
vision_agent = Agent(
    name="CameraAnalyst",
    role="Camera Feed Analyzer",
    goal="Analyze camera captures in real-time",
    backstory="Expert in real-time visual analysis",
    llm="gpt-4o-mini"
)

# Task with captured image
task = Task(
    name="analyze_camera_feed",
    description="Analyze what you see in this camera capture",
    expected_output="Detailed analysis of objects, people, and activities",
    agent=vision_agent,
    images=["camera_capture.jpg"]  # Your saved camera frame
)

agents = PraisonAIAgents(
    agents=[vision_agent],
    tasks=[task],
    process="sequential"
)

result = agents.start()

2. Camera Integration Pattern

import cv2
from praisonaiagents import Agent, Task, PraisonAIAgents

def capture_and_analyze():
    # Capture from camera
    cap = cv2.VideoCapture(0)  # 0 for default camera
    ret, frame = cap.read()
    
    if ret:
        # Save frame
        cv2.imwrite("temp_capture.jpg", frame)
        cap.release()
        
        # Create agent task
        vision_agent = Agent(
            name="LiveAnalyst",
            role="Live Camera Analyst", 
            goal="Analyze live camera feeds",
            backstory="Expert in real-time visual processing",
            llm="gpt-4o-mini"
        )
        
        task = Task(
            description="What do you see in this camera feed? Identify objects, people, and activities.",
            expected_output="Real-time analysis of camera content",
            agent=vision_agent,
            images=["temp_capture.jpg"]
        )
        
        agents = PraisonAIAgents(
            agents=[vision_agent],
            tasks=[task]
        )
        
        return agents.start()

# Run analysis
result = capture_and_analyze()

3. Continuous Monitoring Setup

import cv2
import time
from praisonaiagents import Agent, Task, PraisonAIAgents

def continuous_camera_analysis(interval=5):
    """Analyze camera feed every 'interval' seconds"""
    
    vision_agent = Agent(
        name="SecurityMonitor",
        role="Security Camera Analyst",
        goal="Monitor camera feeds for activities",
        backstory="Expert security analyst with computer vision skills",
        llm="gpt-4o-mini"
    )
    
    cap = cv2.VideoCapture(0)
    
    while True:
        ret, frame = cap.read()
        if ret:
            timestamp = int(time.time())
            filename = f"capture_{timestamp}.jpg"
            cv2.imwrite(filename, frame)
            
            task = Task(
                description="Monitor this camera feed for any unusual activities, people, or objects. Report anything noteworthy.",
                expected_output="Security analysis report",
                agent=vision_agent,
                images=[filename]
            )
            
            agents = PraisonAIAgents(
                agents=[vision_agent],
                tasks=[task]
            )
            
            result = agents.start()
            print(f"Analysis at {timestamp}: {result}")
            
        time.sleep(interval)
    
    cap.release()

# Run continuous monitoring
# continuous_camera_analysis(interval=10)

📋 Supported Input Types

  • Local images: "camera_shot.jpg", "webcam_capture.png"
  • Local videos: "security_feed.mp4", "recording.avi"
  • Image URLs: "https://example.com/live_feed.jpg"
  • Multiple sources: ["cam1.jpg", "cam2.jpg", "video.mp4"]

🛠️ Setup Requirements

pip install praisonaiagents opencv-python
export OPENAI_API_KEY=your_key_here

💡 Pro Tips

  1. Real-time Processing: Capture frames periodically and process them as separate tasks
  2. Batch Analysis: Process multiple camera angles simultaneously
  3. Video Analysis: Save video segments and analyze them via the images parameter
  4. Memory Integration: Use agent memory to track changes over time

The key insight is that PraisonAI expects file paths in the images parameter, so you'll need to capture camera frames to disk first, then reference those files in your tasks.


claude[bot] avatar May 28 '25 14:05 claude[bot]

@claude create this as an example in example folder also include this in the docs in the appropriate location to be included in mint.json

MervinPraison avatar May 28 '25 16:05 MervinPraison

Claude finished @MervinPraison's task —— View jobclaude/issue-238-20250528_161801Create PR ➔


✅ Camera Integration Examples and Documentation Created!

I've successfully created comprehensive camera integration examples and documentation as requested:

📁 Created Examples:

  • examples/python/camera/camera-basic.py - Single frame capture and analysis
  • examples/python/camera/camera-continuous.py - Continuous monitoring system
  • examples/python/camera/camera-multi-agent.py - Multi-agent parallel analysis
  • examples/python/camera/camera-video-analysis.py - Video recording and temporal analysis
  • examples/python/camera/README.md - Complete usage documentation

📚 Documentation:

  • docs/features/camera-integration.mdx - Comprehensive integration guide with:
    • Quick start and setup instructions
    • Multiple integration patterns (single frame, continuous, multi-agent, video)
    • Configuration options and best practices
    • Use cases (security, retail, industrial, smart home)
    • Troubleshooting guide and security considerations
  • Updated docs/mint.json - Added camera integration to navigation menu

🎯 Key Features:

  • 4 different integration patterns covering various use cases
  • Real-time camera capture using OpenCV
  • Multi-agent analysis for comprehensive insights
  • Video recording and analysis for temporal events
  • Security monitoring capabilities
  • Automatic cleanup of temporary files
  • Error handling and troubleshooting guidance

🛠️ Usage:

# Install requirements
pip install praisonaiagents opencv-python
export OPENAI_API_KEY=your_key

# Run examples
python examples/python/camera/camera-basic.py
python examples/python/camera/camera-continuous.py

The documentation is now available in the Features section of the docs and provides complete guidance for integrating camera feeds with PraisonAI multimodal agents!


claude[bot] avatar May 28 '25 16:05 claude[bot]