Map modin category type to arrow dictionary type in omnisci backend

Now that OmniSci supports Arrow 2.0, we could explicitly map modin's category type to Arrow's dictionary type. This would let us distinguish these types in FSI.

Feb 17 '21 08:02 fexolm

https://github.com/intel-ai/omniscidb/tree/modin_test could be used as a source for an omnisci backend with Arrow 2.0 support.

Feb 26 '21 12:02 Garra1980

I've implemented none-encoded string support in the ArrowResultSetConverter, so it shouldn't be a blocker anymore.

Mar 22 '21 10:03 fexolm

The current intel-ai/omniscidb/modin_cats branch still has problems with none-encoded strings. When such a string column occurs in the result set, the following exception may be thrown:

Exception: Columnar conversion not supported for variable length types

It's thrown from here and can be reproduced with this code:

import os
import sys

import pyarrow as pa

# Load the dbe extension with RTLD_LAZY | RTLD_GLOBAL (1 | 256 on Linux),
# then restore the previous dlopen flags.
prev = sys.getdlopenflags()
sys.setdlopenflags(os.RTLD_LAZY | os.RTLD_GLOBAL)
from dbe import PyDbEngine
sys.setdlopenflags(prev)

at = pa.Table.from_pydict(
    {"col1": ["str12", "str2", "str3"], "col2": [1, 2, 3]},
    schema=pa.schema({"col1": pa.string(), "col2": pa.int32()}),
)

# Import the table, then run two queries that differ only in the projection.
server = PyDbEngine()
server.importArrowTable("test_name", at)

print(server.select_df("SELECT * FROM test_name ORDER BY col2"))    # OK
print(server.select_df("SELECT col1 FROM test_name ORDER BY col2")) # RuntimeError

The same queries in relational algebra notation:

Calcite RA queries

query 1

query_src: SELECT * FROM test_name ORDER BY col2
query_ra: {
  "rels": [
    {
      "id": "0",
      "relOp": "LogicalTableScan",
      "fieldNames": [
        "col1",
        "col2",
        "rowid"
      ],
      "table": [
        "omnisci",
        "test_name"
      ],
      "inputs": []
    },
    {
      "id": "1",
      "relOp": "LogicalProject",
      "fields": [
        "col1",
        "col2"
      ],
      "exprs": [
        {
          "input": 0
        },
        {
          "input": 1
        }
      ]
    },
    {
      "id": "2",
      "relOp": "LogicalSort",
      "collation": [
        {
          "field": 1,
          "direction": "ASCENDING",
          "nulls": "LAST"
        }
      ]
    }
  ]
}

query 2

query_src: SELECT col1 FROM test_name ORDER BY col2
query_ra: {
  "rels": [
    {
      "id": "0",
      "relOp": "LogicalTableScan",
      "fieldNames": [
        "col1",
        "col2",
        "rowid"
      ],
      "table": [
        "omnisci",
        "test_name"
      ],
      "inputs": []
    },
    {
      "id": "1",
      "relOp": "LogicalProject",
      "fields": [
        "col1",
        "col2"
      ],
      "exprs": [
        {
          "input": 0
        },
        {
          "input": 1
        }
      ]
    },
    {
      "id": "2",
      "relOp": "LogicalSort",
      "collation": [
        {
          "field": 1,
          "direction": "ASCENDING",
          "nulls": "LAST"
        }
      ]
    },
    {
      "id": "3",
      "relOp": "LogicalProject",
      "fields": [
        "col1"
      ],
      "exprs": [
        {
          "input": 0
        }
      ]
    }
  ]
}

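As the Calcite JSON shows, the only difference between the two plans is the final LogicalProject (id 3) applied to the LogicalSort result in query 2, so the projection of the sort result is what triggers the exception. Other cases in which this exception may be thrown have not been investigated yet.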
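To check other plans for the same shape, one could scan the RA JSON for a LogicalProject whose input is a LogicalSort. This is a minimal sketch: `find_project_over_sort` is a hypothetical helper, and it assumes Calcite's JSON convention that a rel without an explicit "inputs" list takes the immediately preceding rel as its input. The JSON below is query 2 from above with the whitespace compacted; query 1 is its first three rels.

```python
import json


def find_project_over_sort(ra_json: str) -> list:
    """Return ids of LogicalProject rels whose input is a LogicalSort.

    Assumes a rel without "inputs" reads from the preceding rel.
    """
    rels = json.loads(ra_json)["rels"]
    by_id = {rel["id"]: rel for rel in rels}
    hits = []
    for i, rel in enumerate(rels):
        if rel["relOp"] != "LogicalProject":
            continue
        if "inputs" in rel:
            input_ids = rel["inputs"]
        elif i > 0:
            input_ids = [rels[i - 1]["id"]]
        else:
            input_ids = []
        if any(by_id[x]["relOp"] == "LogicalSort" for x in input_ids):
            hits.append(rel["id"])
    return hits


QUERY_2 = """{"rels": [
  {"id": "0", "relOp": "LogicalTableScan", "fieldNames": ["col1", "col2", "rowid"],
   "table": ["omnisci", "test_name"], "inputs": []},
  {"id": "1", "relOp": "LogicalProject", "fields": ["col1", "col2"],
   "exprs": [{"input": 0}, {"input": 1}]},
  {"id": "2", "relOp": "LogicalSort",
   "collation": [{"field": 1, "direction": "ASCENDING", "nulls": "LAST"}]},
  {"id": "3", "relOp": "LogicalProject", "fields": ["col1"], "exprs": [{"input": 0}]}
]}"""

# Query 1 is the same plan without the final projection.
QUERY_1 = json.dumps({"rels": json.loads(QUERY_2)["rels"][:3]})

print(find_project_over_sort(QUERY_1))  # []
print(find_project_over_sort(QUERY_2))  # ['3']
```

Only query 2 is flagged, which matches the observation that the failing case is the projection on top of the sort.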

Mar 25 '21 13:03 dchigarev

So far it looks like the problem is on the OmniSci side; they are looking into it.

Apr 01 '21 11:04 Garra1980

> So far it looks like problem is in Omnisci side, they are looking into it

Any updates here?

Oct 29 '23 17:10 anmyachev

HDK engine is deprecated and will be removed in a future version.

May 16 '24 11:05 YarShev
