hyperspace
[FEATURE REQUEST] Transpose the output for hyperspace.index API to make reading easier.
> +------+--------------+---------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-------------+----------+-------------+--------------+--------------+---------------+----------------+-----------------+---------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> |name |indexedColumns|includedColumns|numBuckets|schema |state |kind |hasLineage|numIndexFiles|sizeIndexFiles|numSourceFiles|sizeSourceFiles|numAppendedFiles|sizeAppendedFiles|numDeletedFiles|sizeDeletedFiles|indexRootPaths |
> +------+--------------+---------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-------------+----------+-------------+--------------+--------------+---------------+----------------+-----------------+---------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> |index1|[RGUID] |[Date] |200 |{"type":"struct","fields":[{"name":"RGUID","type":"string","nullable":true,"metadata":{}},{"name":"Date","type":"string","nullable":true,"metadata":{}},{"name":"_data_file_id","type":"long","nullable":false,"metadata":{}}]}|ACTIVE|CoveringIndex|true |16 |1184 |10 |9244 |0 |0 |0 |0 |[file:/C://hyperspace/src/test/resources/indexStatsTest/index1/v__=0, file:/C://hyperspace/src/test/resources/indexStatsTest/index1/v__=1, file:/C://hyperspace/src/test/resources/indexStatsTest/index1/v__=2]|
> +------+--------------+---------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-------------+----------+-------------+--------------+--------------+---------------+----------------+-----------------+---------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Maybe we can transpose it so that it's easier to read? For example:
scala> sql("describe extended t").show(false)
+----------------------------+----------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+----------------------------------------------------------------------------+-------+
|val |string |null |
|id |bigint |null |
|# Partition Information | | |
|# col_name |data_type |comment|
|id |bigint |null |
| | | |
|# Detailed Table Information| | |
|Database |test | |
|Table |t | |
|Owner |terryk | |
|Created Time |Mon Dec 07 22:28:57 PST 2020 | |
|Last Access |UNKNOWN | |
|Created By |Spark 3.0.1 | |
|Type |MANAGED | |
|Provider |csv | |
|Table Properties |[key1=val1] | |
|Location |file:/Users/terryk/spark/spark-3.0.1-bin-hadoop2.7/spark-warehouse/test.db/t| |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
+----------------------------+----------------------------------------------------------------------------+-------+
Originally posted by @imback82 in https://github.com/microsoft/hyperspace/issues/286#issuecomment-742215883
I'm not sure, because getIndexStat doesn't exist in the latest code base, but I think it's best to leave this to the user. If the user has 100 indexes, transposing won't help anyway. Since the output is a DataFrame, users should be able to reshape it however suits them.
@clee704 The API name was changed during the code review :) It's the statistics API for an index.
So now it's Hyperspace.index? Because it returns a DataFrame, I still think it's better and more flexible to leave transposing, or any other data manipulation, to the user.
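For anyone who wants the transposed view today, here is a rough sketch of doing it on the user side with plain Spark. The `transpose` helper and the `stats` stand-in DataFrame are hypothetical names made up for illustration and are not part of Hyperspace; the sketch only assumes that the statistics API returns an ordinary DataFrame with a `name` column, as in the output quoted above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

// Hypothetical helper (not a Hyperspace API): turns every column except
// `keyCol` into one (field, value) row. Values are cast to string so that
// columns of different types can share a single `value` column.
def transpose(df: DataFrame, keyCol: String): DataFrame = {
  val pairs = df.columns.filter(_ != keyCol).map { c =>
    struct(lit(c).as("field"), col(c).cast("string").as("value"))
  }
  df.select(col(keyCol), explode(array(pairs: _*)).as("kv"))
    .select(col(keyCol), col("kv.field"), col("kv.value"))
}

// Stand-in data shaped like a few columns of the index statistics output.
// With Hyperspace, the DataFrame returned by the statistics API would be
// passed in instead (assumption: it is a regular DataFrame).
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("transpose-sketch")
  .getOrCreate()
import spark.implicits._

val stats = Seq(("index1", Seq("RGUID"), 200, "ACTIVE"))
  .toDF("name", "indexedColumns", "numBuckets", "state")

transpose(stats, "name").show(false)
```

Because the index name is kept as a key column, the same helper still works when the input has many indexes: each index contributes one row per field, and the result can be filtered by `name` as needed.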