delta Z order null to 0

This is more of a discussion as opposed to an issue.

If we interleave bits, 0 will be always be at the top (so to speak). In your code, you convert nulls to 0.

Why push nulls to the top?
Why not val nullValue: Int = math.pow(2, 31) - 1?

Take for example telco data. We will have server hostname and domain name. These columns are used a lot in query where statements. However, the server hostname (where I used to work), would contain a lot of nulls. In a day, we have 10 billion plus records. Stacking all the nulls in the front for interleave wouldn't be ideal (or help with query performance since we are usually looking for non-null entries for some sort of analysis). Instead we pushed them to back for Z ordering (on premise cluster and this wasn't open sourced until a few days ago).

Is there any reason to convert nulls to 0 during Z ordering or was this just a quick idea to overcome null values?

Jul 03 '22 09:07 dwsmith1983

Hi @dwsmith1983, thanks for the question. There is no particular reason why we map null to 0. Currently the Z-Order used Spark's RangePartitioner to map the Z-Order column to a range id (code is here). The mapping of null to 0 is inherited from the Spark's range partitioner.

For the performance, AFAIK it doesn't matter if we put nulls first or last. Z-Order rewrites the data by ordering the data using Z-order value and splitting the data into multiple files. The column stats from the rewritten file are used in skipping reading the file. So it doesn't matter if the nulls are at the end of Z-order curve or beginning. Let me know if I am missing anything here.

Jul 13 '22 19:07 vkorukanti

@vkorukanti sorry for the slow response. According to DB blog Processing Petabytes of data, data skipping isn't 100% efficient and declines over each column in the z ordering. Of course, the fall isn't as bad as linear ordering. Given this, there would still be inefficients in stacking nulls in the beginning when encountered. If they were all at the back, when searching, we would experience the efficiency loss still but wouldn't encounter an excess nulls to parse through if they existed in that block of data.

Jul 21 '22 14:07 dwsmith1983

Hi @dwsmith1983 - following up on this.

It would be helpful to make this discussion explicit. Can you share a small example of today's behavior (data, z-order command, resulting file layout/distribution) and how an alternate NULL treatment would change the behavior/perf? There's a target size conf for OPTIMIZE ZORDER that can help us work through the problem on a much smaller set of data:

import org.apache.spark.sql.delta.sources.DeltaSQLConf
spark.conf.set(DeltaSQLConf.DELTA_OPTIMIZE_MAX_FILE_SIZE.key, 1024*10)

This setting above, for example, would make the max file size ~1kb.

Aug 23 '22 23:08 nkarpov

@nkarpov do you have a link to so some sample data to use?

Aug 24 '22 09:08 dwsmith1983

Is it possible to generate a candidate set that has the NULL distribution you want to address?

The example below will be 50% NULLs in 1 file before the zorder command, and 5 active files (with null distribution visible in the addfile stats) afterward.

val data = (0 to 9999 by 2).toSeq.map(x=>(x, x))
val nulls = (1 to 9999 by 2).toSeq.map((_, null))
val df = data.toDF("id","data").union(nulls.toDF("id","data"))
df.repartition(1).write.format("delta").saveAsTable("test")

spark.conf.set(DeltaSQLConf.DELTA_OPTIMIZE_MAX_FILE_SIZE.key, 1024*10)

DeltaTable.forName(spark, "test").optimize().executeZOrderBy("id","data")

{"add":{"path":"part-00000-4767d446-3090-403e-97c1-47e0de33bc53-c000.snappy.parquet","partitionValues":{},"size":11337,"modificationTime":1661364725114,"dataChange":false,"stats":"{\"numRecords\":1969,\"minValues\":{\"id\":0,\"data\":0},\"maxValues\":{\"id\":2559,\"data\":1378},\"nullCount\":{\"id\":0,\"data\":1280}}"}}
{"add":{"path":"part-00001-56016bdb-f27b-4990-b949-a6fccbe5630a-c000.snappy.parquet","partitionValues":{},"size":12388,"modificationTime":1661364725161,"dataChange":false,"stats":"{\"numRecords\":2099,\"minValues\":{\"id\":1370,\"data\":1370},\"maxValues\":{\"id\":5119,\"data\":3018},\"nullCount\":{\"id\":0,\"data\":1280}}"}}
{"add":{"path":"part-00002-81040826-957f-462e-82f6-5075a94f4ebc-c000.snappy.parquet","partitionValues":{},"size":11788,"modificationTime":1661364725226,"dataChange":false,"stats":"{\"numRecords\":1718,\"minValues\":{\"id\":3010,\"data\":3010},\"maxValues\":{\"id\":6455,\"data\":5118},\"nullCount\":{\"id\":0,\"data\":666}}"}}
{"add":{"path":"part-00003-d19d297d-22db-4232-8d8d-239a41563052-c000.snappy.parquet","partitionValues":{},"size":10996,"modificationTime":1661364725272,"dataChange":false,"stats":"{\"numRecords\":2172,\"minValues\":{\"id\":5120,\"data\":5120},\"maxValues\":{\"id\":9999,\"data\":5916},\"nullCount\":{\"id\":0,\"data\":1774}}"}}
{"add":{"path":"part-00004-e0b0bfb2-d6eb-432d-9b20-896f75c9f791-c000.snappy.parquet","partitionValues":{},"size":17048,"modificationTime":1661364725308,"dataChange":false,"stats":"{\"numRecords\":2042,\"minValues\":{\"id\":5910,\"data\":5910},\"maxValues\":{\"id\":9998,\"data\":9998},\"nullCount\":{\"id\":0,\"data\":0}}"}}
{"remove":{"path":"part-00000-2f6df9ec-6997-4095-841b-1a383f25003f-c000.snappy.parquet","deletionTimestamp":1661364724009,"dataChange":false,"extendedFileMetadata":true,"partitionValues":{},"size":60729}}
{"commitInfo":{"timestamp":1661364725317,"operation":"OPTIMIZE","operationParameters":{"predicate":"[]","zOrderBy":"[\"id\",\"data\"]"},"readVersion":0,"isolationLevel":"SnapshotIsolation","isBlindAppend":false,"operationMetrics":{"numRemovedFiles":"1","numRemovedBytes":"60729","p25FileSize":"11337","minFileSize":"10996","numAddedFiles":"5","maxFileSize":"17048","p75FileSize":"12388","p50FileSize":"11788","numAddedBytes":"63557"},"engineInfo":"Apache-Spark/3.2.0 Delta-Lake/2.0.0","txnId":"f5f62ac5-e308-4ee9-ac14-3c68a5414924"}}

Aug 24 '22 18:08 nkarpov

delta delta copied to clipboard

Z order null to 0

delta
delta copied to clipboard