Cache does not work for functions when there are third party data sources

Open xzdandy opened this issue 1 year ago • 4 comments

Search before asking

  • [X] I have searched the EvaDB issues and found no similar bug report.

Bug

Query:

SELECT name, followers, LLMBatch("S", lang) FROM github_data.stargazers
JOIN LATERAL WebPageTextExtractor(name) AS web(text)
JOIN LATERAL LLMExtractor("P", text) AS golden(lang)
WHERE followers > 100
LIMIT 10;

Cache enabled:

  • WebPageTextExtractor

Error message:

Traceback (most recent call last):
  File "/home/zxu330/eva/playground3.py", line 160, in <module>
    res = cursor.query(query).df()
  File "/home/zxu330/eva/evadb/interfaces/relational/relation.py", line 110, in df
    batch = self.execute()
  File "/home/zxu330/eva/evadb/interfaces/relational/relation.py", line 120, in execute
    result = execute_statement(self._evadb, self._query_node.copy())
  File "/home/zxu330/eva/evadb/server/command_handler.py", line 46, in execute_statement
    physical_plan = plan_generator.build(logical_plan)
  File "/home/zxu330/eva/evadb/optimizer/plan_generator.py", line 110, in build
    plan = self.optimize(logical_plan)
  File "/home/zxu330/eva/evadb/optimizer/plan_generator.py", line 101, in optimize
    self.execute_task_stack(optimizer_context.task_stack)
  File "/home/zxu330/eva/evadb/optimizer/plan_generator.py", line 48, in execute_task_stack
    task.execute()
  File "/home/zxu330/eva/evadb/optimizer/optimizer_tasks.py", line 240, in execute
    for plan in after:
  File "/home/zxu330/eva/evadb/optimizer/rules/rules.py", line 279, in apply
    new_func_expr = enable_cache(context, before.func_expr)
  File "/home/zxu330/eva/evadb/optimizer/optimizer_utils.py", line 293, in enable_cache
    cache = enable_cache_init(context, func_expr)
  File "/home/zxu330/eva/evadb/optimizer/optimizer_utils.py", line 262, in enable_cache_init
    optimized_key = optimize_cache_key(context, func_expr)
  File "/home/zxu330/eva/evadb/optimizer/optimizer_utils.py", line 254, in optimize_cache_key
    optimized_keys += optimize_key_mapping_f[type(key)](context, key)
  File "/home/zxu330/eva/evadb/optimizer/optimizer_utils.py", line 205, in optimize_cache_key_for_tuple_value_expression
    for col in get_table_primary_columns(table_obj):
  File "/home/zxu330/eva/evadb/catalog/catalog_utils.py", line 172, in get_table_primary_columns
    if table_catalog_obj.table_type == TableType.VIDEO_DATA:
AttributeError: 'NoneType' object has no attribute 'table_type'
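
The last two frames of the traceback can be reproduced in isolation. The sketch below is a toy model, not EvaDB code: `TableCatalogEntry`, `primary_columns_unguarded`, and `cache_key_columns` are hypothetical stand-ins, and the idea that a column from a third-party source such as `github_data.stargazers` has no table catalog entry is an assumption inferred from the `NoneType` error. The fallback shown (keying the cache on the referenced column itself) is only one possible way to handle the case, not the project's fix.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class TableType(Enum):
    # Hypothetical stand-in for the enum named in the traceback.
    VIDEO_DATA = auto()
    STRUCTURED_DATA = auto()


@dataclass
class TableCatalogEntry:
    # Hypothetical stand-in for a table catalog entry.
    table_type: TableType
    primary_columns: List[str] = field(default_factory=list)


def primary_columns_unguarded(table_obj: Optional[TableCatalogEntry]) -> List[str]:
    # Mirrors the failing access: raises AttributeError when table_obj is None.
    if table_obj.table_type == TableType.VIDEO_DATA:
        return ["name", "id"]
    return table_obj.primary_columns


def cache_key_columns(
    table_obj: Optional[TableCatalogEntry], referenced_col: str
) -> List[str]:
    # Assumed fallback: with no catalog entry, key the cache on the
    # referenced column instead of the table's primary columns.
    if table_obj is None:
        return [referenced_col]
    return primary_columns_unguarded(table_obj)


try:
    primary_columns_unguarded(None)
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)

fallback_key = cache_key_columns(None, "name")
```

Here `error_message` holds the same text as the final line of the traceback, while `cache_key_columns` returns a usable key for the column that has no catalog entry.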

Environment

  • EvaDB: staging
  • OS: ada-01
  • Python: 3.10
  • GPU: NO
  • Ray: NO

Are you willing to submit a PR?

  • [ ] Yes I'd like to help by submitting a PR!

xzdandy avatar Oct 05 '23 06:10 xzdandy

@xzdandy Can you provide the full code snippet for this including your db setup?

bygo7 avatar Oct 26 '23 03:10 bygo7

Hi @bygo7, please check https://github.com/georgia-tech-db/evadb/blob/xzdandy/playground3.py. You need to input your own GitHub token.

You also need to enable the cache for WebPageTextExtractor by adding it to https://github.com/georgia-tech-db/evadb/blob/staging/evadb/constants.py#L20

xzdandy avatar Oct 27 '23 07:10 xzdandy

Hi @xzdandy, can you please provide some detail on what this cache is supposed to do?

bygo7 avatar Nov 07 '23 18:11 bygo7

Hi @bygo7, this is exact-match caching: when a function receives the same input again, it skips the evaluation and returns the stored result directly. In this case, running the query a second time should skip the evaluation of WebPageTextExtractor.

xzdandy avatar Nov 08 '23 23:11 xzdandy
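
The exact-match caching described above can be illustrated with a minimal sketch. This is not EvaDB's cache implementation: `ExactCache` and `extract_text` are hypothetical names, and `extract_text` is a placeholder for an expensive function such as WebPageTextExtractor.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


class ExactCache:
    """Wrap a function and reuse its result when the same arguments recur."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self._func = func
        self._store: Dict[Tuple[Hashable, ...], Any] = {}
        self.evaluations = 0  # counts real calls to the wrapped function

    def __call__(self, *args: Hashable) -> Any:
        # The cache key is the exact argument tuple; any difference is a miss.
        if args not in self._store:
            self.evaluations += 1
            self._store[args] = self._func(*args)
        return self._store[args]


def extract_text(name: str) -> str:
    # Hypothetical placeholder for an expensive extraction step.
    return f"page text for {name}"


cached_extract = ExactCache(extract_text)
first = cached_extract("alice")   # miss: evaluates extract_text
second = cached_extract("alice")  # hit: returns the stored result
```

After the two calls, `cached_extract.evaluations` is 1, which corresponds to the expected behavior on the second run of the query: the function is not evaluated again for inputs it has already seen.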