mgeneratejs

Add Async $text Operator for LLM-Integrated Data Generation with Ollama

Open omkarkhair opened this issue 6 months ago • 0 comments

This PR adds a new $text operator to Mongo's mgenerate tool. The operator calls a Large Language Model (LLM) through the Ollama API and generates contextually relevant text from a user-defined prompt. This lets the tool produce more specific and meaningful dummy data for use cases such as:

  • Application-Specific Data: Generate tailored data for specific domains (e.g., healthcare job titles).
  • Long Text Generation: Produce coherent, context-appropriate long text (e.g., product reviews).
  • Regional Contextualization: Generate data with regional relevance (e.g., Indian names).

Key Changes:

  • Added a new $text operator to mgenerate.
  • Integrated an LLM through Ollama to back the operator.
  • Converted mgenerate into an asynchronous library to support LLM integration.
  • Updated documentation to include usage examples and details for the new $text operator.
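
To illustrate why an LLM-backed operator pushes the library toward an asynchronous design, here is a minimal sketch of a template resolver in which any operator may return a Promise. It is not mgenerate's actual code: resolveTemplate, the operators table, $upper, and $slowText are all hypothetical names made up for this example.

```javascript
// Hypothetical sketch, not mgenerate's real internals.
// An operator is a function that returns either a value or a Promise.
const operators = {
  // Synchronous stand-in operator.
  $upper: (opts) => String(opts.value).toUpperCase(),
  // Asynchronous stand-in for a slow, network-backed operator such as $text.
  $slowText: async (opts) => {
    await new Promise((resolve) => setTimeout(resolve, 10));
    return `text for: ${opts.prompt}`;
  },
};

// Walk the template and resolve every operator. Because the function is
// async, sync and async operators are handled the same way by the caller.
async function resolveTemplate(node) {
  if (Array.isArray(node)) {
    return Promise.all(node.map(resolveTemplate));
  }
  if (node !== null && typeof node === 'object') {
    const keys = Object.keys(node);
    const isOperator =
      keys.length === 1 &&
      Object.prototype.hasOwnProperty.call(operators, keys[0]);
    if (isOperator) {
      return operators[keys[0]](node[keys[0]]);
    }
    // Resolve sibling fields concurrently, then rebuild the object.
    const entries = await Promise.all(
      keys.map(async (key) => [key, await resolveTemplate(node[key])])
    );
    return Object.fromEntries(entries);
  }
  // Primitives pass through unchanged.
  return node;
}
```

Once a single operator needs to await a network call, every caller up the chain has to await the result as well, which is why the change affects the whole library and not only the new operator.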

Example Usage:

{
    "name": "$name",
    "Role": {
        "$text": {
            "prompt": "Rare Designation or job title found in Healthcare",
            "maxWordCount": "4"
        }
    },
    "lastLogin": "$now"
}
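
For context, a $text operator along these lines could be backed by Ollama's /api/generate endpoint. The sketch below is an assumption about how that might look, not the PR's implementation: generateText, its deps parameter, the default model, and the way maxWordCount is enforced (a hint in the prompt plus truncation) are all hypothetical. The endpoint path, the request fields (model, prompt, stream), and the response field are the ones Ollama documents.

```javascript
// Hypothetical helper; the PR's real implementation may differ.
// `deps.fetchImpl` is injectable so the function can be exercised without
// a running Ollama server.
async function generateText(options, deps = {}) {
  const {
    fetchImpl = globalThis.fetch, // built-in fetch, available in Node 18+
    baseUrl = 'http://localhost:11434', // Ollama's default local address
    model = 'mistral-nemo', // assumed default, taken from the example output
  } = deps;

  // maxWordCount is given as a string in the template, so coerce it.
  const maxWords = Number(options.maxWordCount);
  const hasLimit = Number.isInteger(maxWords) && maxWords > 0;
  const prompt = hasLimit
    ? `${options.prompt}. Reply with at most ${maxWords} words and nothing else.`
    : options.prompt;

  const res = await fetchImpl(`${baseUrl}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // stream: false asks Ollama for a single JSON object instead of chunks.
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) {
    throw new Error(`Ollama request failed with status ${res.status}`);
  }

  const data = await res.json();
  // Models do not always respect length hints, so truncate as a safeguard.
  const words = String(data.response).trim().split(/\s+/);
  return (hasLimit ? words.slice(0, maxWords) : words).join(' ');
}
```

With a local Ollama server running and the model pulled, calling generateText with the "Role" options from the template above would return a short job title. The exact text varies from run to run.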

Example Output (model: mistral-nemo):

{
    "name": "Virginia Blair",
    "Role": "Medical Assistant",
    "lastLogin": {
        "$date": "2024-07-28T12:53:00.267Z"
    }
}

omkarkhair, Jul 28 '24 13:07