lyzr
lyzr copied to clipboard
DataAnalyzr improvements - big PR
Updates and changes
- Refactoring code in the module
- Get code generated by GPT for pythonic analysis and plotting
- Improved logging - logging to csv files
- Improved handling of input params - using data validation with pydantic
- Using deterministic uuids
- Introduced some inheritance in the classes to make the flow easier
- Use RAG for pythonic analysis and plotting
How to test the code in the PR?
- Clone this branch into your local lyzr repo.
cd lyzr
git checkout -b imp/data-analyzr
git pull --set-upstream origin imp/data-analyzr
- Use the following python code to test any query:
import os
os.chdir("/path/to/lyzr/repo")
from lyzr.base import LyzrLLMFactory
from lyzr.data_analyzr import DataAnalyzr
da = DataAnalyzr(
# required params
analysis_type="ml", # or "sql", or "skip"
# optional params
api_key="sk-", # pass here or set in .env
class_params={
"max_retries": 5, # number of tries for each query
"time_limit": 45, # time limit in seconds for each query
"auto_train": True, # whether to update the RAG database automatically
},
log_params={
"log_filename": "path/to/logfile.csv", # path to the log file
"print_log": True, # whether to print logs to console
"log_level": "INFO", # log level
},
generator_llm=LyzrLLMFactory.from_defaults(model="gpt-4-1106-preview"),
analysis_llm=LyzrLLMFactory.from_defaults(model="gpt-3.5-turbo"),
context="",
)
da.get_data(
# required params
db_type="files", # or "redshift", "postgres", "sqlite",
data_config={
"datasets": {
"name_of_dataset": "path/to/dataset",
"name_of_dataset": "path/to/dataset",
"name_of_dataset": "path/to/dataset",
},
},
# optional params
vector_store_config={
"path": "path/to/vector_store",
"remake_store": True,
},
)
result = da.ask(
# required params
user_input="Your question here",
# optional params
outputs=["visualisation", "insights", "recommendations", "tasks"],
plot_path="path/to/save/plots",
recommendations_params={
"from_insights": True, # whether to use insights in recommendations' generation
"output_type": "text", # or "json"
"json_format": dict, # format for json output
},
counts={
"insights": 3, # number of insights to generate
"recommendations": 3, # number of recommendations to generate
"tasks": 5, # number of tasks to generate
},
context={
"analysis": "context for analysis",
"visualisation": "context for visualisation",
"insights": "context for insights",
"recommendations": "context for recommendations",
"tasks": "context for tasks",
},
# kwargs
rerun_analysis=True, # whether to rerun analysis for the same query
time_limit=45, # time limit in seconds for the query
max_retries=3, # number of tries for the query
auto_train=True, # whether to update the RAG database automatically
)
# show generated plot
from PIL import Image
Image.open(result["visualisation"])
# print insights
print(result["insights"])
# print recommendations
print(result["recommendations"])
# print tasks
print(result["tasks"])
What type of PR is this?
- [x] :gift: New feature (non-breaking change which adds functionality)
- [x] :bug: Bug fix (non-breaking change which fixes an issue)
- [x] :bomb: Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] :memo: Documentation update
- [x] :art: Refactor or style update
- [x] :fire: Performance improvements
- [ ] :white_check_mark: Test
- [ ] :robot: Build
- [ ] :repeat: CI, review, release, devops, chore, etc.
Checklist:
- [x] :sunglasses: My code follows the style guidelines of this project.
- [x] :ballot_box_with_check: I have performed a self-review of my code.
- [ ] :bookmark_tabs: I have commented my code, particularly in hard-to-understand areas.
- [ ] :bookmark: I have made corresponding changes to the documentation.
- [x] :warning: My changes generate no new warnings.
- [ ] :monocle_face: I have added tests that prove my fix is effective or that my feature works.
- [ ] :white_check_mark: New and existing unit tests pass locally with my changes.
- [x] :link: Any dependent changes have been merged and published in downstream modules.
Todo:
- Testing
- Build
Testing done. Build TBD.
Build done.