Cache does not work for functions when there are third-party data sources
Search before asking
- [X] I have searched the EvaDB issues and found no similar bug report.
Bug
Query:
SELECT name, followers, LLMBatch("S", lang) FROM github_data.stargazers
JOIN LATERAL WebPageTextExtractor(name) AS web(text)
JOIN LATERAL LLMExtractor("P", text) AS golden(lang)
WHERE followers > 100
LIMIT 10;
Cache enabled:
- WebPageTextExtractor
Error message:
Traceback (most recent call last):
  File "/home/zxu330/eva/playground3.py", line 160, in <module>
    res = cursor.query(query).df()
  File "/home/zxu330/eva/evadb/interfaces/relational/relation.py", line 110, in df
    batch = self.execute()
  File "/home/zxu330/eva/evadb/interfaces/relational/relation.py", line 120, in execute
    result = execute_statement(self._evadb, self._query_node.copy())
  File "/home/zxu330/eva/evadb/server/command_handler.py", line 46, in execute_statement
    physical_plan = plan_generator.build(logical_plan)
  File "/home/zxu330/eva/evadb/optimizer/plan_generator.py", line 110, in build
    plan = self.optimize(logical_plan)
  File "/home/zxu330/eva/evadb/optimizer/plan_generator.py", line 101, in optimize
    self.execute_task_stack(optimizer_context.task_stack)
  File "/home/zxu330/eva/evadb/optimizer/plan_generator.py", line 48, in execute_task_stack
    task.execute()
  File "/home/zxu330/eva/evadb/optimizer/optimizer_tasks.py", line 240, in execute
    for plan in after:
  File "/home/zxu330/eva/evadb/optimizer/rules/rules.py", line 279, in apply
    new_func_expr = enable_cache(context, before.func_expr)
  File "/home/zxu330/eva/evadb/optimizer/optimizer_utils.py", line 293, in enable_cache
    cache = enable_cache_init(context, func_expr)
  File "/home/zxu330/eva/evadb/optimizer/optimizer_utils.py", line 262, in enable_cache_init
    optimized_key = optimize_cache_key(context, func_expr)
  File "/home/zxu330/eva/evadb/optimizer/optimizer_utils.py", line 254, in optimize_cache_key
    optimized_keys += optimize_key_mapping_f[type(key)](context, key)
  File "/home/zxu330/eva/evadb/optimizer/optimizer_utils.py", line 205, in optimize_cache_key_for_tuple_value_expression
    for col in get_table_primary_columns(table_obj):
  File "/home/zxu330/eva/evadb/catalog/catalog_utils.py", line 172, in get_table_primary_columns
    if table_catalog_obj.table_type == TableType.VIDEO_DATA:
AttributeError: 'NoneType' object has no attribute 'table_type'
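The traceback shows the cache-key optimization passing `None` to `get_table_primary_columns`, presumably because a column coming from a third-party data source has no table catalog entry. Below is a minimal, self-contained sketch of one possible guard. It is not EvaDB's code: `TableCatalogEntry` and `cache_key_columns` are hypothetical stand-ins, and falling back to the column itself as the cache key is an assumption about what a fix could look like.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TableCatalogEntry:
    """Hypothetical stand-in for a table catalog entry (not EvaDB's class)."""
    table_type: str
    primary_columns: List[str]


def cache_key_columns(
    column_name: str, table_obj: Optional[TableCatalogEntry]
) -> List[str]:
    """Pick the columns used as the cache key for `column_name`.

    Assumption: when the table has no catalog entry (table_obj is None),
    use the column itself as the key instead of dereferencing None.
    """
    if table_obj is None:
        # No catalog entry: we cannot look up primary columns, so key on the
        # column's own value rather than raising AttributeError.
        return [column_name]
    # Catalog entry available: key on the table's primary columns.
    return list(table_obj.primary_columns)


native_table = TableCatalogEntry(table_type="VIDEO_DATA", primary_columns=["id"])
native_keys = cache_key_columns("name", native_table)
external_keys = cache_key_columns("name", None)
```

With this guard, a column without a catalog entry is keyed by its own value, which is still correct for exact caching but may produce larger keys than a primary-key lookup would.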
Environment
- EvaDB: staging
- OS: ada-01
- Python: 3.10
- GPU: NO
- Ray: NO
Are you willing to submit a PR?
- [ ] Yes I'd like to help by submitting a PR!
@xzdandy Can you provide the full code snippet for this, including your DB setup?
Hi @bygo7, please check https://github.com/georgia-tech-db/evadb/blob/xzdandy/playground3.py. You need to input your own GitHub token.
You need to enable the cache for WebPageTextExtractor by adding it to https://github.com/georgia-tech-db/evadb/blob/staging/evadb/constants.py#L20
Hi @xzdandy, can you please provide some detail on what this cache is supposed to do?
Hi @bygo7, this is exact-match caching: when a function is called again with the same input, EvaDB skips the function evaluation and returns the stored result directly. In this case, when we run the query a second time, it should skip the evaluation of WebPageTextExtractor.
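To make the expected behavior concrete, here is a minimal sketch of exact caching in plain Python. It only illustrates the idea and says nothing about how EvaDB implements its cache: `ExactCache` and `fake_extract` are hypothetical names, and an in-memory dict stands in for whatever storage the real cache uses.

```python
from typing import Any, Callable, Dict, Tuple


class ExactCache:
    """Wrap a function and reuse results for inputs seen before (illustrative only)."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self._func = func
        self._store: Dict[Tuple[Any, ...], Any] = {}
        self.evaluations = 0  # counts how often the wrapped function really runs

    def __call__(self, *args: Any) -> Any:
        if args not in self._store:
            # Cache miss: evaluate the function once and remember the result.
            self.evaluations += 1
            self._store[args] = self._func(*args)
        # Cache hit (or freshly stored value): return without re-evaluating.
        return self._store[args]


def fake_extract(name: str) -> str:
    """Hypothetical stand-in for an expensive function such as a web page fetch."""
    return f"text for {name}"


cached_extract = ExactCache(fake_extract)
names = ["alice", "bob"]
first_run = [cached_extract(n) for n in names]   # evaluates fake_extract twice
second_run = [cached_extract(n) for n in names]  # served entirely from the cache
```

The second pass over the same inputs returns identical results while `cached_extract.evaluations` stays at 2, which is the behavior expected from rerunning the query with the cache enabled.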