Implement partition_on for non numeric columns

Open troyyyang opened this issue 3 years ago • 6 comments

Describe your feature request

I'm not even sure if this is something that can be implemented, but it would be amazing if partition_on could be used for non-numeric columns!

troyyyang • Jun 22 '22 16:06

Thanks for the suggestion, @troyyyang! Currently, connectorx has no built-in way to partition on non-numeric columns. Please feel free to share how you think it should be implemented.

In the meantime, if you know how to partition your query, you can do the partitioning outside of connectorx and pass in a list of partitioned queries, as in this example, so you are not restricted by the type of the column:

import connectorx as cx

postgres_url = "postgresql://username:password@server:port/database"

# One query per partition; together they cover the whole table.
queries = [
    "SELECT * FROM lineitem WHERE l_orderkey <= 30000000",
    "SELECT * FROM lineitem WHERE l_orderkey > 30000000",
]

cx.read_sql(postgres_url, queries)
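
For a non-numeric column, the same list-of-queries approach could be driven by the column's distinct values. The following is only a rough sketch: `queries_by_values` is a hypothetical helper (not part of connectorx), and the `l_shipmode` values are illustrative. It assumes the distinct values are known up front; rows with NULL or unlisted values would not be read.

```python
from typing import List, Sequence


def queries_by_values(
    table: str, column: str, values: Sequence[str], num_partitions: int
) -> List[str]:
    """Build one query per group of column values (round-robin grouping)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    queries = []
    for i in range(num_partitions):
        group = list(values[i::num_partitions])
        if not group:
            continue  # fewer values than partitions
        # Double single quotes so each value is a valid SQL string literal.
        literals = ", ".join("'" + v.replace("'", "''") + "'" for v in group)
        queries.append(f"SELECT * FROM {table} WHERE {column} IN ({literals})")
    return queries


# Hypothetical value list, for illustration only.
queries = queries_by_values(
    "lineitem", "l_shipmode", ["AIR", "MAIL", "RAIL", "SHIP"], 2
)
# The resulting list can then be passed to cx.read_sql(postgres_url, queries).
```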

wangxiaoying • Jun 24 '22 00:06

@wangxiaoying Theoretically, partitioning by timestamps could also be added, since timestamps support min/max. This would be useful for time series data.
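
Until that exists, timestamp partitioning could be approximated with the list-of-queries workaround. This is a minimal sketch, not connectorx functionality: `timestamp_range_queries`, the `events` table, and the `created_at` column are hypothetical names, and the start and end bounds are assumed to come from a prior `SELECT MIN(...), MAX(...)` on the column.

```python
from datetime import datetime
from typing import List


def timestamp_range_queries(
    table: str, column: str, start: datetime, end: datetime, num_partitions: int
) -> List[str]:
    """Split [start, end] into equal-width time ranges, one query per range."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    step = (end - start) / num_partitions
    bounds = [start + step * i for i in range(num_partitions)] + [end]
    queries = []
    for i in range(num_partitions):
        lo = bounds[i].isoformat(sep=" ")
        hi = bounds[i + 1].isoformat(sep=" ")
        # Half-open ranges avoid double-reading rows on a boundary;
        # the last range is closed so the maximum value is included.
        upper_op = "<=" if i == num_partitions - 1 else "<"
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {column} >= '{lo}' AND {column} {upper_op} '{hi}'"
        )
    return queries


qs = timestamp_range_queries(
    "events", "created_at", datetime(2022, 1, 1), datetime(2022, 1, 3), 2
)
# qs can then be passed to cx.read_sql(postgres_url, qs).
```

Equal-width ranges can produce unbalanced partitions when the data is bursty; that trade-off is the same one range partitioning on a numeric column has.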

valxv • Jul 13 '22 16:07

Hi @valxv, thank you for the great suggestion. We will add this feature to our future plans :) https://github.com/sfu-db/connector-x/issues/313

wangxiaoying • Jul 13 '22 23:07

In Spark, I have gotten around partitioned reads of non-numeric columns by doing something like the following: I hash the non-numeric column and use the hash modulo the number of partitions as the partition number.

SELECT
    ABS(hashtext(non_numeric_column)) % 10 AS partition,
    non_numeric_column,
    column1,
    column2
FROM
    table

Would doing something similar work for connectorx as well?

theelderbeever • Nov 04 '22 18:11

Hi @theelderbeever, I think it should work. In this example, you can pass the partition column as partition_on and set partition_num to 10.
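
Put together, the suggestion might look roughly like the sketch below. The helper names, table name, and column names are hypothetical; `hashtext` is a PostgreSQL function, so other databases would need a different hash. Only `cx.read_sql` with `partition_on` and `partition_num` is connectorx's own API.

```python
from typing import List


def hashed_partition_query(
    table: str, text_column: str, other_columns: List[str], num_buckets: int
) -> str:
    """Build a query exposing a numeric bucket derived from a text column."""
    cols = ", ".join([text_column, *other_columns])
    return (
        f"SELECT ABS(hashtext({text_column})) % {num_buckets} AS partition, "
        f"{cols} FROM {table}"
    )


def read_hashed(conn: str, table: str, text_column: str,
                other_columns: List[str], num_buckets: int = 10):
    """Read the table in parallel, partitioning on the hashed bucket column."""
    import connectorx as cx  # imported lazily; the query builder works without it

    query = hashed_partition_query(table, text_column, other_columns, num_buckets)
    # connectorx partitions on the numeric "partition" column produced above.
    return cx.read_sql(
        conn, query, partition_on="partition", partition_num=num_buckets
    )


query = hashed_partition_query("table1", "name", ["column1", "column2"], 10)
```

The returned frame would also contain the helper `partition` column, which can be dropped after loading if it is not needed.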

wangxiaoying • Nov 10 '22 20:11

When will this feature be launched?

zjh1234562 • May 22 '23 09:05