hivemall
Support a DENSE array/matrix format as explanatory variables
Dense2Sparse
- Add dense2sparse() for converting a dense array or matrix to a sparse feature array
- Add an option flag that controls whether zero-valued elements are removed
dense2sparse(array(array(1.0, 2.0), array(0.0, 4.0)))
=> ["1_1:1.0", "1_2:2.0", "2_2:4.0"]
dense2sparse(array(1.0, 2.0, 0.0, 4.0))
=> ["1:1.0", "2:2.0", "4:4.0"]
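The conversion proposed above can be sketched outside Hive as follows. This is a minimal illustration, not Hivemall's UDF: it assumes 1-based indices (as in the examples), the "row_col:value" key format for matrices, and a hypothetical `remove_zero` flag standing in for the option flag, whose real name is not fixed by this proposal.

```python
from typing import List, Sequence, Union

Dense = Union[Sequence[float], Sequence[Sequence[float]]]


def dense2sparse_vector(values: Sequence[float], remove_zero: bool = True) -> List[str]:
    """Dense vector -> ["index:value", ...] with 1-based indices."""
    out = []
    for i, v in enumerate(values, start=1):
        if remove_zero and v == 0.0:
            continue  # drop zero-valued elements when the flag is set
        out.append(f"{i}:{float(v)}")
    return out


def dense2sparse_matrix(rows: Sequence[Sequence[float]], remove_zero: bool = True) -> List[str]:
    """Dense matrix -> ["row_col:value", ...] with 1-based row/column indices."""
    out = []
    for i, row in enumerate(rows, start=1):
        for j, v in enumerate(row, start=1):
            if remove_zero and v == 0.0:
                continue
            out.append(f"{i}_{j}:{float(v)}")
    return out


def dense2sparse(x: Dense, remove_zero: bool = True) -> List[str]:
    """Hypothetical dispatcher: nested list -> matrix form, flat list -> vector form."""
    if len(x) > 0 and isinstance(x[0], (list, tuple)):
        return dense2sparse_matrix(x, remove_zero)
    return dense2sparse_vector(x, remove_zero)


if __name__ == "__main__":
    print(dense2sparse([[1.0, 2.0], [0.0, 4.0]]))  # ['1_1:1.0', '1_2:2.0', '2_2:4.0']
    print(dense2sparse([1.0, 2.0, 0.0, 4.0]))  # ['1:1.0', '2:2.0', '4:4.0']
    print(dense2sparse([1.0, 2.0, 0.0, 4.0], remove_zero=False))  # ['1:1.0', '2:2.0', '3:0.0', '4:4.0']
```

Keeping zeros (`remove_zero=False`) preserves every position, which matters if downstream code relies on the sparse array having one entry per dense element.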
nested array/matrix representation in Hive
drop table arraytest;
create table arraytest
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
as
SELECT array(array(1.0, 2.0, 3.0), array(0.0, 5.0))
FROM myui.dual;
select * from arraytest;
[[1.0,2.0,3.0],[0.0,5.0]]
hadoop fs -cat /user/hive/warehouse/news20.db/arraytest/000000_0
1.0,2.0,3.0|0.0,5.0
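For an array of arrays, the elements of the outer array are separated by "|" (the COLLECTION ITEMS delimiter), as expected, but the elements of the inner arrays are separated by "," (the MAP KEYS delimiter!!). Hive's text format picks the delimiter by nesting depth, so the second nesting level of an array reuses the third-level (map key) delimiter.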
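The depth-based delimiter choice can be modeled with a short sketch. This is a simplified illustration covering arrays only, not the code of Hive's text SerDe: it assumes `separators[0]` is the field delimiter, `separators[1]` the COLLECTION ITEMS delimiter and `separators[2]` the MAP KEYS delimiter, and that an array nested at depth k uses `separators[k]`. The function names are hypothetical.

```python
from typing import Any, List, Sequence

# Delimiters from the CREATE TABLE above: field, collection items, map keys.
SEPARATORS: List[str] = ["\t", "|", ","]


def serialize_array(value: Any, separators: Sequence[str], level: int = 1) -> str:
    """Render a (possibly nested) array; nesting depth selects the delimiter."""
    if isinstance(value, (list, tuple)):
        sep = separators[level]
        return sep.join(serialize_array(v, separators, level + 1) for v in value)
    return str(value)


def parse_array(text: str, separators: Sequence[str], depth: int, level: int = 1) -> Any:
    """Inverse of serialize_array for a float array nested `depth` levels deep."""
    if depth == 0:
        return float(text)
    return [parse_array(part, separators, depth - 1, level + 1)
            for part in text.split(separators[level])]


if __name__ == "__main__":
    line = serialize_array([[1.0, 2.0, 3.0], [0.0, 5.0]], SEPARATORS)
    print(line)  # 1.0,2.0,3.0|0.0,5.0
    print(parse_array(line, SEPARATORS, depth=2))  # [[1.0, 2.0, 3.0], [0.0, 5.0]]
```

The serialized line matches the `hadoop fs -cat` output above: the inner arrays can only use "," because the MAP KEYS delimiter is what sits at the third level, so choosing a MAP KEYS delimiter also fixes how nested arrays are stored.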