
DataAnalyzr improvements - big PR

Open gargimaheshwari opened this issue 9 months ago • 3 comments

Updates and changes

  1. Refactoring code in the module
  2. Get code generated by GPT for pythonic analysis and plotting
  3. Improved logging - logging to CSV files
  4. Improved handling of input params - using data validation with Pydantic
  5. Using deterministic UUIDs
  6. Introduced some inheritance in the classes to make the flow easier
  7. Use RAG for pythonic analysis and plotting
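
The list above does not say how the deterministic UUIDs are derived. As a rough sketch of the general idea only, and not the PR's actual implementation, name-based UUIDs from the standard library (`uuid.uuid5`) give the same ID for the same inputs on every run. The namespace string, the `deterministic_id` helper, and the choice of inputs (dataset name plus query) are all assumptions made for illustration.

```python
import uuid

# Hypothetical namespace: the PR does not state which namespace, if any, it uses.
SKETCH_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "data-analyzr.example")


def deterministic_id(*parts: str) -> str:
    """Return the same UUID string every time for the same sequence of parts."""
    # Join with a separator that is unlikely to occur in the inputs,
    # so ("ab", "c") and ("a", "bc") do not collapse to the same key.
    key = "\x1f".join(parts)
    return str(uuid.uuid5(SKETCH_NAMESPACE, key))


if __name__ == "__main__":
    first = deterministic_id("sales.csv", "What were the top products?")
    second = deterministic_id("sales.csv", "What were the top products?")
    print(first == second)  # True: identical inputs map to an identical ID
```

Unlike `uuid.uuid4()`, an ID built this way can serve as a stable cache or vector-store key, which is presumably why repeated queries can be recognised without rerunning them.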

How to test the code in the PR?

  1. Clone this branch into your local lyzr repo.
cd lyzr
git checkout -b imp/data-analyzr
git pull --set-upstream origin imp/data-analyzr
  2. Use the following Python code to test any query:
import os
os.chdir("/path/to/lyzr/repo")

from lyzr.base import LyzrLLMFactory
from lyzr.data_analyzr import DataAnalyzr

da = DataAnalyzr(
    # required params
    analysis_type="ml", # or "sql", or "skip"
    # optional params
    api_key="sk-", # pass here or set in .env
    class_params={
        "max_retries": 5, # number of tries for each query
        "time_limit": 45, # time limit in seconds for each query
        "auto_train": True, # whether to update the RAG database automatically
    },
    log_params={
        "log_filename": "path/to/logfile.csv", # path to the log file
        "print_log": True, # whether to print logs to console
        "log_level": "INFO", # log level
    },
    generator_llm=LyzrLLMFactory.from_defaults(model="gpt-4-1106-preview"),
    analysis_llm=LyzrLLMFactory.from_defaults(model="gpt-3.5-turbo"),
    context="",
)
da.get_data(
    # required params
    db_type="files", # or "redshift", "postgres", "sqlite"
    data_config={
        "datasets": {
            # each dataset needs a unique name: duplicate dict keys silently overwrite each other
            "name_of_dataset_1": "path/to/dataset_1",
            "name_of_dataset_2": "path/to/dataset_2",
            "name_of_dataset_3": "path/to/dataset_3",
        },
    },
    # optional params
    vector_store_config={
        "path": "path/to/vector_store",
        "remake_store": True,
    },
)
result = da.ask(
    # required params
    user_input="Your question here",
    # optional params
    outputs=["visualisation", "insights", "recommendations", "tasks"],
    plot_path="path/to/save/plots",
    recommendations_params={
        "from_insights": True, # whether to use insights in recommendations' generation
        "output_type": "text", # or "json"
        "json_format": dict, # format for json output
    },
    counts={
        "insights": 3, # number of insights to generate
        "recommendations": 3, # number of recommendations to generate
        "tasks": 5, # number of tasks to generate
    },
    context={
        "analysis": "context for analysis",
        "visualisation": "context for visualisation",
        "insights": "context for insights",
        "recommendations": "context for recommendations",
        "tasks": "context for tasks",
    },
    # kwargs
    rerun_analysis=True, # whether to rerun analysis for the same query
    time_limit=45, # time limit in seconds for the query
    max_retries=3, # number of tries for the query
    auto_train=True, # whether to update the RAG database automatically
)
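# show generated plot (.show() opens it in the default image viewer;
# a bare Image.open(...) only renders as the last expression of a notebook cell)
from PIL import Image
Image.open(result["visualisation"]).show()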
# print insights
print(result["insights"])
# print recommendations
print(result["recommendations"])
# print tasks
print(result["tasks"])
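
The `log_params` passed to `DataAnalyzr` above point the logger at a CSV file. How the PR implements this is not shown here. The following is only a sketch of the idea using the standard `logging` and `csv` modules. `CsvLogHandler`, `build_logger`, the logger name, and the column layout are hypothetical and not part of lyzr; only the three parameter names mirror `log_params`.

```python
import csv
import logging
import sys
from pathlib import Path


class CsvLogHandler(logging.Handler):
    """Hypothetical handler that appends one CSV row per log record."""

    FIELDS = ["timestamp", "level", "logger", "message"]

    def __init__(self, filename: str) -> None:
        super().__init__()
        self.path = Path(filename)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Write the header row only when the file is new or empty.
        if not self.path.exists() or self.path.stat().st_size == 0:
            with self.path.open("w", newline="", encoding="utf-8") as f:
                csv.writer(f).writerow(self.FIELDS)
        self._time_formatter = logging.Formatter()

    def emit(self, record: logging.LogRecord) -> None:
        try:
            row = [
                self._time_formatter.formatTime(record),
                record.levelname,
                record.name,
                record.getMessage(),
            ]
            # csv.writer handles quoting, so commas in messages stay in one column.
            with self.path.open("a", newline="", encoding="utf-8") as f:
                csv.writer(f).writerow(row)
        except Exception:
            self.handleError(record)


def build_logger(
    log_filename: str, print_log: bool = False, log_level: str = "INFO"
) -> logging.Logger:
    """Build a logger from values shaped like the log_params dict (assumed mapping)."""
    logger = logging.getLogger("data_analyzr_sketch")
    logger.setLevel(log_level)
    logger.handlers.clear()  # avoid duplicate rows if called more than once
    logger.propagate = False
    logger.addHandler(CsvLogHandler(log_filename))
    if print_log:
        logger.addHandler(logging.StreamHandler(sys.stdout))
    return logger
```

Calling `build_logger("path/to/logfile.csv", print_log=True, log_level="INFO")` and then `logger.info(...)` would append one row per message. The resulting file can be loaded with any CSV reader to inspect a run after the fact.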

What type of PR is this?

  • [x] :gift: New feature (non-breaking change which adds functionality)
  • [x] :bug: Bug fix (non-breaking change which fixes an issue)
  • [x] :bomb: Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] :memo: Documentation update
  • [x] :art: Refactor or style update
  • [x] :fire: Performance improvements
  • [ ] :white_check_mark: Test
  • [ ] :robot: Build
  • [ ] :repeat: CI, review, release, devops, chore, etc.

Checklist:

  • [x] :sunglasses: My code follows the style guidelines of this project.
  • [x] :ballot_box_with_check: I have performed a self-review of my code.
  • [ ] :bookmark_tabs: I have commented my code, particularly in hard-to-understand areas.
  • [ ] :bookmark: I have made corresponding changes to the documentation.
  • [x] :warning: My changes generate no new warnings.
  • [ ] :monocle_face: I have added tests that prove my fix is effective or that my feature works.
  • [ ] :white_check_mark: New and existing unit tests pass locally with my changes.
  • [x] :link: Any dependent changes have been merged and published in downstream modules.

gargimaheshwari, May 06 '24 08:05

Todo:

  1. Testing
  2. Build

gargimaheshwari, May 07 '24 07:05

Testing done. Build TBD.

gargimaheshwari, May 14 '24 11:05

Build done.

gargimaheshwari, May 23 '24 14:05