lyzr: DataAnalyzr improvements - big PR

Updates and changes

Refactored the code in the module
Analysis and plotting now run Python code generated by GPT
Improved logging: logs are written to CSV files
Improved handling of input params: data validation with pydantic
Deterministic UUIDs
Introduced inheritance between classes to simplify the flow
RAG is used when generating the Python analysis and plotting code
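The list above mentions deterministic UUIDs without saying how they are derived. As a minimal sketch, assuming name-based IDs are used, `uuid.uuid5` returns the same UUID whenever it gets the same namespace and name, so rerunning an identical query maps to the same identifier. `QUERY_NAMESPACE` and `make_query_id` are hypothetical names, not part of the lyzr codebase.

```python
import uuid

# Hypothetical namespace for this sketch; not taken from the lyzr codebase.
QUERY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "data-analyzr.example")


def make_query_id(user_input: str, analysis_type: str) -> str:
    """Return a deterministic version-5 UUID for a query.

    Assumption: the ID is derived from the analysis type and the query text,
    so identical inputs always produce the same ID across runs.
    """
    key = f"{analysis_type}:{user_input.strip()}"
    return str(uuid.uuid5(QUERY_NAMESPACE, key))


if __name__ == "__main__":
    first = make_query_id("Which region has the highest sales?", "ml")
    second = make_query_id("Which region has the highest sales?", "ml")
    print(first == second)  # prints True: same inputs, same ID
```

Unlike `uuid.uuid4`, nothing here is random, which is what makes IDs stable across reruns and processes.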

How to test the code in the PR?

Check out this branch in your local clone of the lyzr repo:

cd lyzr
git checkout -b imp/data-analyzr
git pull --set-upstream origin imp/data-analyzr

Use the following Python code to test any query:

import sys
sys.path.insert(0, "/path/to/lyzr/repo") # import lyzr from the local checkout of this branch

from lyzr.base import LyzrLLMFactory
from lyzr.data_analyzr import DataAnalyzr

da = DataAnalyzr(
    # required params
    analysis_type="ml", # or "sql" or "skip"
    # optional params
    api_key="sk-", # pass here or set in .env
    class_params={
        "max_retries": 5, # number of tries for each query
        "time_limit": 45, # time limit in seconds for each query
        "auto_train": True, # whether to update the RAG database automatically
    },
    log_params={
        "log_filename": "path/to/logfile.csv", # path to the log file
        "print_log": True, # whether to print logs to console
        "log_level": "INFO", # log level
    },
    generator_llm=LyzrLLMFactory.from_defaults(model="gpt-4-1106-preview"),
    analysis_llm=LyzrLLMFactory.from_defaults(model="gpt-3.5-turbo"),
    context="",
)
da.get_data(
    # required params
    db_type="files", # or "redshift", "postgres", or "sqlite"
    data_config={
        "datasets": {
            "dataset_1": "path/to/dataset_1", # dataset name -> path; names must be unique
            "dataset_2": "path/to/dataset_2",
            "dataset_3": "path/to/dataset_3",
        },
    },
    # optional params
    vector_store_config={
        "path": "path/to/vector_store",
        "remake_store": True,
    },
)
result = da.ask(
    # required params
    user_input="Your question here",
    # optional params
    outputs=["visualisation", "insights", "recommendations", "tasks"],
    plot_path="path/to/save/plots",
    recommendations_params={
        "from_insights": True, # whether to use insights in recommendations' generation
        "output_type": "text", # or "json"
        "json_format": dict, # format for json output
    },
    counts={
        "insights": 3, # number of insights to generate
        "recommendations": 3, # number of recommendations to generate
        "tasks": 5, # number of tasks to generate
    },
    context={
        "analysis": "context for analysis",
        "visualisation": "context for visualisation",
        "insights": "context for insights",
        "recommendations": "context for recommendations",
        "tasks": "context for tasks",
    },
    # kwargs
    rerun_analysis=True, # whether to rerun analysis for the same query
    time_limit=45, # time limit in seconds for the query
    max_retries=3, # number of tries for the query
    auto_train=True, # whether to update the RAG database automatically
)
# show generated plot (result["visualisation"] is opened as an image file)
from PIL import Image
Image.open(result["visualisation"]).show()
# print insights
print(result["insights"])
# print recommendations
print(result["recommendations"])
# print tasks
print(result["tasks"])
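The example passes `class_params` and `log_params` as plain dicts, and the PR says input params are now validated with pydantic. The models below are a hypothetical sketch of such validation that reuses the keys from the example; the class names, bounds, and allowed log levels are assumptions, not the module's actual schema. The sketch sticks to features available in both pydantic v1 and v2.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class ClassParams(BaseModel):
    """Hypothetical model for the `class_params` dict passed to DataAnalyzr.

    Keys and defaults mirror the example call; the bounds are assumptions.
    """

    max_retries: int = Field(default=5, ge=1)  # number of tries for each query
    time_limit: int = Field(default=45, gt=0)  # time limit in seconds for each query
    auto_train: bool = True                    # whether to update the RAG database automatically


class LogParams(BaseModel):
    """Hypothetical model for the `log_params` dict."""

    log_filename: Optional[str] = None
    print_log: bool = False
    log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"] = "INFO"


if __name__ == "__main__":
    params = ClassParams(**{"max_retries": 3, "time_limit": 30})
    print(params.max_retries, params.time_limit, params.auto_train)  # prints: 3 30 True

    try:
        ClassParams(**{"max_retries": 0})  # violates ge=1
    except ValidationError as err:
        print("rejected:", len(err.errors()), "error(s)")  # prints: rejected: 1 error(s)
```

With models like these, a bad value such as `max_retries=0` fails at construction time with a `ValidationError` instead of surfacing later, in the middle of a query.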
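The `log_params` in the example point `log_filename` at a CSV file. One way to get CSV output from the standard `logging` module, shown here as an assumption rather than the PR's implementation, is a custom `Formatter` that serializes each record with `csv.writer`. That way, messages containing commas or quotes stay in a single, correctly quoted field. `CsvFormatter`, `make_csv_logger`, the logger name, and the column order are all hypothetical.

```python
import csv
import io
import logging


class CsvFormatter(logging.Formatter):
    """Render each log record as one CSV row: timestamp, level, logger name, message."""

    def format(self, record: logging.LogRecord) -> str:
        buffer = io.StringIO()
        csv.writer(buffer).writerow(
            [self.formatTime(record), record.levelname, record.name, record.getMessage()]
        )
        # csv.writer ends rows with "\r\n"; drop it because the handler appends its own newline.
        return buffer.getvalue().rstrip("\r\n")


def make_csv_logger(log_filename: str, print_log: bool = True, log_level: str = "INFO") -> logging.Logger:
    """Hypothetical helper mapping the example's log_params onto stdlib logging."""
    logger = logging.getLogger("data_analyzr_sketch")
    logger.setLevel(log_level)  # accepts level names such as "INFO"
    logger.propagate = False    # keep records out of the root logger

    # Close and remove handlers from any previous call so rows are not written twice.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    file_handler = logging.FileHandler(log_filename, mode="a", encoding="utf-8")
    file_handler.setFormatter(CsvFormatter())
    logger.addHandler(file_handler)

    if print_log:
        console = logging.StreamHandler()
        console.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
        logger.addHandler(console)
    return logger


if __name__ == "__main__":
    log = make_csv_logger("analysis_log.csv", print_log=True, log_level="INFO")
    log.info("Query received: sales, by region")  # the comma is quoted in the CSV row
```

Because each row is produced by `csv.writer`, the log file can be read back with `csv.reader` or loaded into a DataFrame without custom parsing.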

What type of PR is this?

[x] :gift: New feature (non-breaking change which adds functionality)
[x] :bug: Bug fix (non-breaking change which fixes an issue)
[x] :bomb: Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] :memo: Documentation update
[x] :art: Refactor or style update
[x] :fire: Performance improvements
[ ] :white_check_mark: Test
[ ] :robot: Build
[ ] :repeat: CI, review, release, devops, chore, etc.

Checklist:

[x] :sunglasses: My code follows the style guidelines of this project.
[x] :ballot_box_with_check: I have performed a self-review of my code.
[ ] :bookmark_tabs: I have commented my code, particularly in hard-to-understand areas.
[ ] :bookmark: I have made corresponding changes to the documentation.
[x] :warning: My changes generate no new warnings.
[ ] :monocle_face: I have added tests that prove my fix is effective or that my feature works.
[ ] :white_check_mark: New and existing unit tests pass locally with my changes.
[x] :link: Any dependent changes have been merged and published in downstream modules.

May 06 '24 08:05 gargimaheshwari

Todo:

Testing
Build

May 07 '24 07:05 gargimaheshwari

Testing done. Build TBD.

May 14 '24 11:05 gargimaheshwari

Build done.

May 23 '24 14:05 gargimaheshwari