mgeneratejs Add Async $text Operator for LLM-Integrated Data Generation with Ollama

Open omkarkhair opened this issue 6 months ago • 0 comments

This PR introduces a new $text operator to Mongo's mgenerate tool. The operator integrates with Large Language Models (LLMs) through the Ollama API to generate contextually relevant text from user-defined prompts. This lets the tool produce more specific and meaningful dummy data for use cases such as:

Application-Specific Data: Generate tailored data for specific domains (e.g., healthcare job titles).
Long Text Generation: Produce coherent, context-appropriate long text (e.g., product reviews).
Regional Contextualization: Generate data with regional relevance (e.g., Indian names).

Key Changes:

Added a new $text operator in mgenerate with Ollama integration.
Integrated an LLM model via Ollama.
Converted mgenerate into an asynchronous library to support LLM integration.
Updated documentation to include usage examples and details for the new $text operator.
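
To illustrate why the operator has to be asynchronous, here is a minimal sketch of what a $text handler could look like. It is not the PR's actual code: the names textOperator, buildPrompt, truncateWords and fetchImpl are hypothetical, and it assumes a local Ollama server on its default port exposing the /api/generate endpoint, with mistral-nemo (the model named in the example output) as the default model.

```javascript
// Hypothetical sketch of a $text handler; not the code from this PR.
// Assumes a local Ollama server on its default port.
const OLLAMA_URL = 'http://localhost:11434/api/generate';

// Append a word limit to the prompt when maxWordCount is a positive number.
function buildPrompt(prompt, maxWordCount) {
  const limit = Number(maxWordCount);
  return Number.isFinite(limit) && limit > 0
    ? `${prompt}. Answer in at most ${limit} words, with no explanation.`
    : prompt;
}

// Models do not always respect the limit, so cut the reply as a safeguard.
function truncateWords(text, maxWordCount) {
  const words = text.trim().split(/\s+/).filter(Boolean);
  const limit = Number(maxWordCount);
  if (!Number.isFinite(limit) || limit <= 0) return words.join(' ');
  return words.slice(0, limit).join(' ');
}

// The network round trip is what forces the operator to return a Promise.
// fetchImpl is injectable so the function can be exercised without a server.
async function textOperator(options, { model = 'mistral-nemo', fetchImpl = globalThis.fetch } = {}) {
  const res = await fetchImpl(OLLAMA_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model,
      prompt: buildPrompt(options.prompt, options.maxWordCount),
      stream: false, // ask for one JSON object instead of a token stream
    }),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const data = await res.json();
  return truncateWords(data.response, options.maxWordCount);
}

// Usage (requires a running Ollama instance with the model pulled):
// textOperator({ prompt: 'Rare job title found in Healthcare', maxWordCount: '4' })
//   .then(console.log);
```

Truncating the reply on the client side is a design choice of this sketch, since a prompt instruction alone does not guarantee the word limit.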

Example Usage:

{
    "name": "$name",
    "Role": {
        "$text": {
            "prompt": "Rare Designation or job title found in Healthcare",
            "maxWordCount": "4"
        }
    },
    "lastLogin": "$now"
}
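
A template like the one above mixes synchronous operators ($name, $now) with the asynchronous $text. One way an async generator could resolve such a template is sketched below. This is an assumption about the general approach, not mgeneratejs internals: resolveTemplate and the operators map are hypothetical names, and operator handlers are assumed to be plain functions that may return a Promise.

```javascript
// Hypothetical async template resolver; not mgeneratejs internals.
// `operators` is a Map from operator name (e.g. '$text') to a handler
// function that receives the operator's options and may return a Promise.
async function resolveTemplate(node, operators) {
  // Shorthand form: the string "$name" invokes the operator with no options.
  if (typeof node === 'string') {
    const op = operators.get(node);
    return op ? op({}) : node;
  }
  // Arrays are resolved element by element, concurrently.
  if (Array.isArray(node)) {
    return Promise.all(node.map((item) => resolveTemplate(item, operators)));
  }
  if (node !== null && typeof node === 'object') {
    const keys = Object.keys(node);
    // Long form: { "$text": { ...options } } invokes the operator with options.
    if (keys.length === 1 && operators.has(keys[0])) {
      return operators.get(keys[0])(node[keys[0]]);
    }
    // Plain object: resolve every field, preserving key order.
    const entries = await Promise.all(
      keys.map(async (key) => [key, await resolveTemplate(node[key], operators)])
    );
    return Object.fromEntries(entries);
  }
  // Numbers, booleans and null pass through unchanged.
  return node;
}
```

Because every field is awaited through Promise.all, several $text fields in one document would be requested concurrently rather than one after another. Whether that is desirable depends on how many parallel requests the local Ollama instance can handle.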

Example Output (model: mistral-nemo):

{
    "name": "Virginia Blair",
    "Role": "Medical Assistant",
    "lastLogin": {
        "$date": "2024-07-28T12:53:00.267Z"
    }
}

Jul 28 '24 13:07 omkarkhair
