evadb
Unstructured data to structured data conversion via `EXTRACT_COLUMN`
Added a custom function for extracting columns from unstructured data (new file: `../evadb/functions/extract_columns.py`).
@xzdandy I created a Python notebook as well, but it gets gitignored, while the rest of the tutorial notebooks don't. Any idea why?
Solves #1235
@xzdandy @pchunduri6 moved to a "one-column-at-a-time" implementation as you recommended.
The notebook has the implementation.
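A minimal sketch of what "one column at a time" can look like: each call asks the model for a single named field across a batch of texts, so every prompt stays small and focused. The helper names `build_column_prompt` and `extract_column`, the prompt wording, and the injected `llm` callable are all hypothetical and are not taken from `extract_columns.py`.

```python
from typing import Callable, List


def build_column_prompt(text: str, column: str, description: str) -> str:
    # One prompt per (row, column) pair: the model only has to find one field.
    return (
        f"Extract the value of the column '{column}' ({description}) "
        f"from the text below. Reply with the value only.\n\n"
        f"Text: {text}"
    )


def extract_column(texts: List[str], column: str, description: str,
                   llm: Callable[[str], str]) -> List[str]:
    # `llm` is any prompt -> completion callable (for example a ChatGPT wrapper).
    # Injecting it keeps the extraction logic testable without an API key.
    return [llm(build_column_prompt(t, column, description)).strip() for t in texts]


# Usage with a fake model standing in for ChatGPT.
reviews = ["Great phone, battery lasts two days.", "Screen cracked after a week."]
fake_llm = lambda prompt: " positive \n" if "Great" in prompt else "negative"
print(extract_column(reviews, "sentiment", "positive or negative", fake_llm))
# prints ['positive', 'negative']
```

Extracting several columns then means calling the helper once per column, which costs more requests but keeps each one simple.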
With the one-column-at-a-time approach, I think this PR is ready for review, @xzdandy @pchunduri6.
For the other changes I discussed with each of you, I think it makes sense to take them up in a separate PR; otherwise this one will get bloated. Let me know what you think.
Can we also add a long integration test for the function under https://github.com/georgia-tech-db/evadb/tree/staging/test/integration_tests/long/functions? We can skip the test in CircleCI because it needs an OpenAI key, but I think it is good to have one.
It can either be end-to-end (i.e., SQL queries) or directly test the function class.
Yes @xzdandy, on it.
Also, this is failing the linter check for a Colab notebook. Can you point me to information on how to add that?
Remove the last empty cell.
12-01-2023 17:31:12 [check_notebook_format:295] ERROR: ERROR: Notebook /Users/hershdhillon23/projects/evadb/script/formatting/../../tutorials/20-structured-data.ipynb does not contain correct Colab link -- update the link.
I do not have a Colab link right now.
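A sketch for producing the link the format check asks for. The thread does not show the exact string `check_notebook_format` compares against, so this assumes the standard Colab-from-GitHub URL scheme on the `staging` branch and the usual "Open In Colab" badge; `colab_link` and `colab_badge` are hypothetical helper names. The URL only resolves once the notebook exists on that branch.

```python
def colab_link(notebook_path: str, org: str = "georgia-tech-db",
               repo: str = "evadb", branch: str = "staging") -> str:
    # Colab opens a notebook straight from GitHub via
    # /github/<org>/<repo>/blob/<branch>/<path-in-repo>.
    path = notebook_path.lstrip("/")
    return f"https://colab.research.google.com/github/{org}/{repo}/blob/{branch}/{path}"


def colab_badge(notebook_path: str) -> str:
    # Markdown for the common "Open In Colab" badge, placed in the first cell.
    badge = "https://colab.research.google.com/assets/colab-badge.svg"
    return f"[![Open In Colab]({badge})]({colab_link(notebook_path)})"


print(colab_link("tutorials/20-structured-data.ipynb"))
# prints https://colab.research.google.com/github/georgia-tech-db/evadb/blob/staging/tutorials/20-structured-data.ipynb
```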
The current notebook actually does not work on Colab. I was trying to make it work yesterday, and I think it needs several modifications. One fix that can help: add `EXTRACT_COLUMN` to the bootstrap functions in https://github.com/georgia-tech-db/evadb/blob/staging/evadb/functions/function_bootstrap_queries.py
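A sketch of the kind of entry this could add, assuming the bootstrap file registers each function with an EvaDB `CREATE FUNCTION IF NOT EXISTS <name> IMPL '<path>';` statement pointing at the implementation file. The helper `extract_column_bootstrap_query` and its `install_dir` argument are hypothetical; the real file may build its strings differently.

```python
def extract_column_bootstrap_query(install_dir: str) -> str:
    # Hypothetical helper: registers EXTRACT_COLUMN from the extract_columns.py
    # file added in this PR, using a CREATE FUNCTION ... IMPL statement.
    impl_path = f"{install_dir}/functions/extract_columns.py"
    return f"CREATE FUNCTION IF NOT EXISTS EXTRACT_COLUMN IMPL '{impl_path}';"


print(extract_column_bootstrap_query("/opt/evadb"))
# prints CREATE FUNCTION IF NOT EXISTS EXTRACT_COLUMN IMPL '/opt/evadb/functions/extract_columns.py';
```

`IF NOT EXISTS` keeps the statement safe to run on every startup, which is what lets a fresh Colab session find the function without a manual `CREATE FUNCTION` cell.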
Should we perform this operation using ChatGPT directly, or use something like PandasAI to have an LLM write a function that then extracts the column we need? Writing a function is much cheaper in token cost, but less robust. @hershd23 @xzdandy Any thoughts?
Hi @pchunduri6, I think it depends on the task. If the column to extract follows a pattern, we can generate a regex to save cost and improve efficiency. On the other hand, if the task is semantic, we need to rely on the LLM to extract the information.
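One possible way to combine the two options, sketched under assumptions: apply a regex first (hand-written, or generated once by an LLM, which is where the token savings come from) and call the LLM only for rows the pattern cannot match. `extract_with_fallback` and the injected `llm_extract` callable are hypothetical names, not part of this PR.

```python
import re
from typing import Callable, List


def extract_with_fallback(texts: List[str], pattern: str,
                          llm_extract: Callable[[str], str]) -> List[str]:
    """Regex first for pattern-like columns; LLM only for rows it cannot match."""
    compiled = re.compile(pattern)  # compiled once, reused for every row
    values = []
    for text in texts:
        match = compiled.search(text)
        if match:
            # Pattern hit: no tokens spent. Use the first group if the pattern has one.
            values.append(match.group(1) if compiled.groups else match.group(0))
        else:
            # Irregular or semantic case: defer to the (injected) LLM extractor.
            values.append(llm_extract(text))
    return values


rows = ["Contact: alice@example.com", "reach bob at bob (at) example dot com"]
email = r"([\w.+-]+@[\w-]+\.[\w.]+)"
print(extract_with_fallback(rows, email, lambda text: "<LLM>"))
# prints ['alice@example.com', '<LLM>']
```

With this split, LLM cost scales with the number of rows the pattern misses rather than with the table size. Purely semantic columns would skip the regex step entirely.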