
BUG: TPC-H SF10 fails with chunk shape error

Open • luweizheng opened this issue 1 year ago • 2 comments

Describe the bug

I am running the TPC-H SF10 benchmark and get the following error. SF1 runs fine in my environment.

Traceback (most recent call last):
  File "~/hpc/test_xorbits/df-bench/tpch/xorbits/xorbits_query.py", line 1250, in run_queries
    result = result.execute()
  File "~/hpc/xorbits/python/xorbits/core/adapter.py", line 310, in _wrapped
    return member_func(mars_entity, *args, **kwargs)
  File "~/hpc/xorbits/python/xorbits/core/adapter.py", line 444, in wrapped
    return from_mars(c(*to_mars(args), **to_mars(kwargs)))
  File "~/hpc/xorbits/python/xorbits/_mars/core/entity/executable.py", line 152, in execute
    return execute(self, session=session, **kw)
  File "~/hpc/xorbits/python/xorbits/_mars/deploy/oscar/session.py", line 1709, in execute
    return session.execute(
  File "~/hpc/xorbits/python/xorbits/_mars/deploy/oscar/session.py", line 1526, in execute
    execution_info: ExecutionInfo = fut.result(
  File "/fs/fast/u20200002/envs/ucx/lib/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/fs/fast/u20200002/envs/ucx/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "~/hpc/xorbits/python/xorbits/_mars/deploy/oscar/session.py", line 1689, in _execute
    await execution_info
  File "~/hpc/xorbits/python/xorbits/_mars/deploy/oscar/session.py", line 102, in wait
    return await self._aio_task
  File "~/hpc/xorbits/python/xorbits/_mars/deploy/oscar/session.py", line 848, in _run_in_background
    raise task_result.error.with_traceback(task_result.traceback)
  File "~/hpc/xorbits/python/xorbits/_mars/services/task/supervisor/processor.py", line 387, in run
    async for stage_args in self._iter_stage_chunk_graph():
  File "~/hpc/xorbits/python/xorbits/_mars/services/task/supervisor/processor.py", line 171, in _iter_stage_chunk_graph
    chunk_graph = await self._get_next_chunk_graph(chunk_graph_iter)
  File "~/hpc/xorbits/python/xorbits/_mars/services/task/supervisor/processor.py", line 162, in _get_next_chunk_graph
    chunk_graph = await fut
  File "/fs/fast/u20200002/envs/ucx/lib/python3.9/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/fs/fast/u20200002/envs/ucx/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "~/hpc/xorbits/python/xorbits/_mars/services/task/supervisor/processor.py", line 157, in next_chunk_graph
    return next(chunk_graph_iter)
  File "~/hpc/xorbits/python/xorbits/_mars/services/task/supervisor/preprocessor.py", line 201, in tile
    for chunk_graph in chunk_graph_builder.build():
  File "~/hpc/xorbits/python/xorbits/_mars/core/graph/builder/chunk.py", line 431, in build
    yield from self._build()
  File "~/hpc/xorbits/python/xorbits/_mars/core/graph/builder/chunk.py", line 425, in _build
    graph = next(tile_iterator)
  File "~/hpc/xorbits/python/xorbits/_mars/services/task/supervisor/preprocessor.py", line 75, in _iter_without_check
    to_update_tileables = self._iter()
  File "~/hpc/xorbits/python/xorbits/_mars/core/graph/builder/chunk.py", line 308, in _iter
    self._tile(
  File "~/hpc/xorbits/python/xorbits/_mars/core/graph/builder/chunk.py", line 201, in _tile
    need_process = next(tile_handler)
  File "~/hpc/xorbits/python/xorbits/_mars/core/graph/builder/chunk.py", line 173, in _tile_handler
    tiled_tileables = yield from handler.tile(tiled_tileables)
  File "~/hpc/xorbits/python/xorbits/_mars/core/entity/tileables.py", line 80, in tile
    tiled_result = yield from tile_handler(op)
  File "~/hpc/xorbits/python/xorbits/_mars/dataframe/indexing/getitem.py", line 358, in tile
    return (yield from cls.tile_with_mask(op))
  File "~/hpc/xorbits/python/xorbits/_mars/dataframe/indexing/getitem.py", line 377, in tile_with_mask
    mask.rechunk(in_df.nsplits[: mask.ndim])
  File "~/hpc/xorbits/python/xorbits/_mars/dataframe/base/rechunk.py", line 199, in rechunk
    chunk_size = _get_chunk_size(a, chunk_size)
  File "~/hpc/xorbits/python/xorbits/_mars/dataframe/base/rechunk.py", line 191, in _get_chunk_size
    return get_nsplits(a, chunk_size, itemsize)
  File "~/hpc/xorbits/python/xorbits/_mars/tensor/rechunk/core.py", line 38, in get_nsplits
    return decide_chunk_sizes(tileable.shape, chunk_size, itemsize)
  File "~/hpc/xorbits/python/xorbits/_mars/tensor/utils.py", line 618, in decide_chunk_sizes
    return normalize_chunk_sizes(
  File "~/hpc/xorbits/python/xorbits/_mars/tensor/utils.py", line 73, in normalize_chunk_sizes
    raise ValueError(
ValueError: chunks shape should be of the same length, got shape: 2805000, chunks: (195000, 195000, 195000, 195000, 180000, 180000, 180000, 180000)
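In the traceback, `tile_with_mask` calls `mask.rechunk(in_df.nsplits[: mask.ndim])`, i.e. it tries to cut the boolean mask along the DataFrame's row chunk boundaries. The final ValueError appears to come from a sum check: the row chunks add up to 4 × 195000 + 4 × 180000 = 1500000 rows, while the mask has 2805000 entries, so the mask and the DataFrame it indexes disagree on length by 1305000 rows. The sketch below reproduces only that arithmetic in plain Python; `check_row_splits` is a hypothetical helper, not the actual `normalize_chunk_sizes` implementation in Xorbits.

```python
# Hypothetical helper mirroring the length check that fails in the traceback.
# It is NOT part of the Xorbits/Mars API; it only illustrates the arithmetic.

def check_row_splits(length: int, splits: tuple) -> None:
    """Raise ValueError if the chunk sizes do not add up to the axis length."""
    total = sum(splits)
    if total != length:
        raise ValueError(
            f"chunk sizes sum to {total}, but the axis has length {length} "
            f"(difference: {length - total})"
        )


# Values copied from the error message in this issue.
mask_length = 2805000
df_row_splits = (195000, 195000, 195000, 195000, 180000, 180000, 180000, 180000)

print(sum(df_row_splits))  # 1500000

# A mask whose length matches the DataFrame rows passes the check ...
check_row_splits(1500000, df_row_splits)

# ... but the mask from the failing query does not.
try:
    check_row_splits(mask_length, df_row_splits)
except ValueError as exc:
    print(exc)  # chunk sizes sum to 1500000, but the axis has length 2805000 (difference: 1305000)
```

Because the check only compares totals, any mask whose length differs from the number of rows in `in_df` would trigger the same error, regardless of how the rows are chunked.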

To Reproduce

To help us reproduce this bug, please provide the information below:

  1. Python version: 3.9
  2. Xorbits version: main branch on GitHub

luweizheng, Jun 09 '23 08:06

Could you tell us which query failed?

aresnow1, Jun 09 '23 08:06


It is Query 22 in https://github.com/dbiir/df-bench/blob/main/tpch/xorbits/xorbits_query.py

Most of the code is similar to the code in the benchmark folder of the Xorbits repository.

You can run this query with:

python -u xorbits_query.py --path ../SF1 \
    --log_timing \
    --queries 22 \
    --endpoint "$address" \
    --print_result

luweizheng, Jun 09 '23 09:06