Support a DENSE array/matrix format as explanatory variables

Open myui opened this issue 11 years ago • 0 comments

Dense2Sparse

  • Add dense2sparse() for converting a dense array/matrix to a sparse array
    • Add an option flag that controls whether zero-valued elements are removed
dense2sparse(array(array[1.0,2.0], array[0.0,4.0]))
=> array["1_1:1.0","1_2:2.0","2_2:4.0"]

dense2sparse(array[1.0,2.0,0.0,4.0])
=> array["1:1.0","2:2.0","4:4.0"]
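The intended behavior can be sketched outside Hive. The following Python function is only an illustration of the semantics shown in the two examples above, not Hivemall code: the function body, the `drop_zeros` parameter name, and its default are assumptions. It uses 1-based indices, `row_col:value` for matrices, and `index:value` for flat arrays.

```python
def dense2sparse(dense, drop_zeros=True):
    """Convert a dense array or matrix into a list of "index:value" strings.

    Hypothetical reference sketch of the proposed UDF; `drop_zeros` stands in
    for the option flag mentioned in the issue (name and default are assumed).
    """
    features = []
    if dense and isinstance(dense[0], (list, tuple)):
        # Matrix input: emit "row_col:value" with 1-based row and column indices.
        for i, row in enumerate(dense, start=1):
            for j, value in enumerate(row, start=1):
                if drop_zeros and value == 0:
                    continue
                features.append(f"{i}_{j}:{value}")
    else:
        # Flat array input: emit "index:value" with a 1-based index.
        for i, value in enumerate(dense, start=1):
            if drop_zeros and value == 0:
                continue
            features.append(f"{i}:{value}")
    return features
```

With `drop_zeros=False` the zero entries would be kept, so `[1.0, 0.0]` would yield `["1:1.0", "2:0.0"]` under this sketch.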

nested array/matrix representation in Hive

drop table arraytest;
create table arraytest
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
as
SELECT array(array(1.0, 2.0, 3.0), array(0.0, 5.0))
FROM myui.dual;

select * from arraytest;
[[1.0,2.0,3.0],[0.0,5.0]]

hadoop fs -cat /user/hive/warehouse/news20.db/arraytest/000000_0

1.0,2.0,3.0|0.0,5.0

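For an array of arrays, the elements of the outer array are delimited by "|" as expected, but the elements of the inner arrays are delimited by ",", which is the delimiter declared for map keys (!!). This is consistent with Hive's text SerDe choosing delimiters by nesting depth, so the map-key delimiter doubles as the separator for the second nesting level.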
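To make the on-disk layout concrete, here is a minimal Python sketch that reads the stored line back into a nested list. The function name `parse_nested_array` is hypothetical, and the delimiter defaults simply mirror the table definition above ("|" for collection items, "," for map keys); it is not part of Hive or Hivemall.

```python
def parse_nested_array(line, outer_delim="|", inner_delim=","):
    """Parse one text-file field holding an array of arrays of doubles.

    Assumes the outer array uses the collection-item delimiter and the inner
    arrays use the map-key delimiter, as observed in the file dump above.
    """
    if not line:
        return []
    return [
        [float(token) for token in inner.split(inner_delim)]
        for inner in line.split(outer_delim)
    ]
```

Applied to the line `1.0,2.0,3.0|0.0,5.0`, this sketch yields `[[1.0, 2.0, 3.0], [0.0, 5.0]]`, matching what `select * from arraytest` displays.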

myui, Oct 11 '13 04:10