[FEATURE REQUEST] Transpose the output for hyperspace.index API to make reading easier.

Open imback82 opened this issue 4 years ago • 3 comments

> +------+--------------+---------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-------------+----------+-------------+--------------+--------------+---------------+----------------+-----------------+---------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> |name  |indexedColumns|includedColumns|numBuckets|schema                                                                                                                                                                                                                         |state |kind         |hasLineage|numIndexFiles|sizeIndexFiles|numSourceFiles|sizeSourceFiles|numAppendedFiles|sizeAppendedFiles|numDeletedFiles|sizeDeletedFiles|indexRootPaths                                                                                                                                                                                                                         |
> +------+--------------+---------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-------------+----------+-------------+--------------+--------------+---------------+----------------+-----------------+---------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> |index1|[RGUID]       |[Date]         |200       |{"type":"struct","fields":[{"name":"RGUID","type":"string","nullable":true,"metadata":{}},{"name":"Date","type":"string","nullable":true,"metadata":{}},{"name":"_data_file_id","type":"long","nullable":false,"metadata":{}}]}|ACTIVE|CoveringIndex|true      |16           |1184          |10            |9244           |0               |0                |0              |0               |[file:/C://hyperspace/src/test/resources/indexStatsTest/index1/v__=0, file:/C://hyperspace/src/test/resources/indexStatsTest/index1/v__=1, file:/C://hyperspace/src/test/resources/indexStatsTest/index1/v__=2]|
> +------+--------------+---------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-------------+----------+-------------+--------------+--------------+---------------+----------------+-----------------+---------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Maybe we can transpose it so that it's easier to read? For example:

scala> sql("describe extended t").show(false)
+----------------------------+----------------------------------------------------------------------------+-------+
|col_name                    |data_type                                                                   |comment|
+----------------------------+----------------------------------------------------------------------------+-------+
|val                         |string                                                                      |null   |
|id                          |bigint                                                                      |null   |
|# Partition Information     |                                                                            |       |
|# col_name                  |data_type                                                                   |comment|
|id                          |bigint                                                                      |null   |
|                            |                                                                            |       |
|# Detailed Table Information|                                                                            |       |
|Database                    |test                                                                        |       |
|Table                       |t                                                                           |       |
|Owner                       |terryk                                                                      |       |
|Created Time                |Mon Dec 07 22:28:57 PST 2020                                                |       |
|Last Access                 |UNKNOWN                                                                     |       |
|Created By                  |Spark 3.0.1                                                                 |       |
|Type                        |MANAGED                                                                     |       |
|Provider                    |csv                                                                         |       |
|Table Properties            |[key1=val1]                                                                 |       |
|Location                    |file:/Users/terryk/spark/spark-3.0.1-bin-hadoop2.7/spark-warehouse/test.db/t|       |
|Serde Library               |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                          |       |
|InputFormat                 |org.apache.hadoop.mapred.SequenceFileInputFormat                            |       |
|OutputFormat                |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat                   |       |
+----------------------------+----------------------------------------------------------------------------+-------+

Originally posted by @imback82 in https://github.com/microsoft/hyperspace/issues/286#issuecomment-742215883

imback82, Dec 10 '20 03:12

I'm not sure, because getIndexStat doesn't exist in the latest code base, but I think it's best to leave this to the user. If the user has 100 indexes, transposing won't help anyway. Since the output is a DataFrame, users can reshape it however suits them.

clee704, Apr 06 '21 03:04

@clee704 The API name was changed during the code review :) It's the statistics API for an index.

sezruby, Apr 07 '21 06:04

> @clee704 The API name was changed during the code review :) It's the statistics API for an index.

So now it's Hyperspace.index? Because it returns a DataFrame, I still think it's better and more flexible to leave transposing, or any other data manipulation, to the user.

clee704, Apr 07 '21 15:04
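
For anyone landing here who wants the transposed view today: since the result is a plain DataFrame, it can be reshaped on the caller's side. Below is a minimal sketch, not Hyperspace code. It assumes Spark is on the classpath (for example, pasted into spark-shell). The `stats` DataFrame is a hand-built stand-in carrying a few of the columns from the output quoted above, and `transpose` is a hypothetical helper name. In real use, `stats` would be whatever DataFrame the index statistics API returns.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

// Reuses the active session when one exists (as in spark-shell).
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("transpose-sketch")
  .getOrCreate()

// Stand-in for the index statistics DataFrame (assumption: a subset of
// the columns shown in the issue, with made-up single-row content).
val stats: DataFrame = spark.createDataFrame(Seq(
  ("index1", Seq("RGUID"), Seq("Date"), 200, "ACTIVE")
)).toDF("name", "indexedColumns", "includedColumns", "numBuckets", "state")

// Option 1: Spark's built-in vertical rendering prints one
// "column | value" line per field for each row, without truncation.
stats.show(numRows = 20, truncate = 0, vertical = true)

// Option 2: reshape into (key, field, value) rows. Every non-key column
// is cast to string so that all values share a single type.
def transpose(df: DataFrame, keyCol: String): DataFrame = {
  val pairs = df.columns.filter(_ != keyCol).map { c =>
    struct(lit(c).as("field"), col(c).cast("string").as("value"))
  }
  df.select(col(keyCol), explode(array(pairs: _*)).as("kv"))
    .select(col(keyCol), col("kv.field"), col("kv.value"))
}

val transposed = transpose(stats, "name")
transposed.show(truncate = false)
```

Option 1 is usually enough for eyeballing one index. Option 2 keeps the result as a DataFrame, so with many indexes it can still be filtered, for example `transposed.filter(col("field") === "state")`.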