hive-funnel-udf
Same user gets different results
Environment: Hive version Hive 1.1.0-cdh5.5.2, CDH version CDH 5.5.2
When I query a single day, for example:
select ouid, funnel(tag1, game_time, array('activity'), array('yaozujianglin'), array('huangjinyuchang')) as funnel from sscq.odl_act_detail_info_sscq_qq where ds = '2016-12-17' group by ouid order by ouid asc
one of the returned rows is: 000000000000000000000000001EC76C [0,0,0]
But when I query only that one ID, the result is different:
select ouid, funnel(tag1, game_time, array('activity'), array('yaozujianglin'), array('huangjinyuchang')) as funnel from sscq.odl_act_detail_info_sscq_qq where ds = '2016-12-17' and ouid = '000000000000000000000000001EC76C' group by ouid order by ouid asc
000000000000000000000000001EC76C [1,0,0]
- You don't need the `order by ouid asc` part of the query, just the `group by ouid`, assuming that `ouid` is your unique ID.
- I am not sure how it would report that record yet give an empty funnel count of `[0,0,0]`. Is `game_time` a timestamp column in a string/long format?
As I don't have access to your data, it may be best for you to show an example failure in a unit test. Here is a sample unit test to use: https://github.com/yahoo/hive-funnel-udf/blob/408b0a1e6a54f074e228e107b90eebd08b568c00/src/test/java/com/yahoo/hive/udf/funnel/FunnelTest.java#L130-L174
If you can construct a sample unit test that exhibits this behavior, I can then develop a fix for the issue.
As an alternative, if you could provide a Hive script that creates a sample database/table, and inserts some sample artificial records, I could then debug the problem.
The `game_time` column type is bigint. My guess is that something goes wrong when the data is serialized and deserialized, producing different intermediate results.
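One way to read the "different intermediate results" guess, sketched in Java under stated assumptions: Hive evaluates an aggregate such as funnel in stages, with partial buffers built on the map side and merged on the reducer, and it does not guarantee the order in which those partials are combined. The hypothetical `PartialMergeSketch` class below (not part of hive-funnel-udf) models one ouid's `game_time` values split across two mappers. The merged list comes out in a different order depending on which partial arrives first, yet it holds exactly the same values, so a reordering alone, with no serialization error, could be enough to change an order-sensitive result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of two-stage aggregation for a single ouid: each mapper
// produces a partial buffer of game_time values and the reducer concatenates them.
class PartialMergeSketch {
    // Concatenate partial buffers in the order they happen to reach the reducer.
    static List<Long> merge(List<List<Long>> partials) {
        List<Long> merged = new ArrayList<>();
        for (List<Long> partial : partials) {
            merged.addAll(partial);
        }
        return merged;
    }

    // Return a sorted copy, used to check that two merges hold the same values.
    static List<Long> sortedCopy(List<Long> values) {
        List<Long> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<Long> mapperA = Arrays.asList(300L, 100L); // made-up game_time values
        List<Long> mapperB = Arrays.asList(200L);
        List<Long> abOrder = merge(Arrays.asList(mapperA, mapperB)); // [300, 100, 200]
        List<Long> baOrder = merge(Arrays.asList(mapperB, mapperA)); // [200, 300, 100]
        System.out.println(abOrder.equals(baOrder));                         // false
        System.out.println(sortedCopy(abOrder).equals(sortedCopy(baOrder))); // true
    }
}
```

The point of the sketch is only that "different intermediate data" may mean "same data, different order"; it says nothing about how hive-funnel-udf itself stores or merges its buffers.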
I have used `bigint` timestamp columns with these UDFs before and they worked, so I don't think that is the issue.
Would it be possible to create a simple Hive script to generate a table, add a few records, and verify that this issue persists? Once there is a replicable failure, we can fix the issue.
I have faced the same issue. Sorting the dataset in an inner query before calling funnel gave the right result, so the sort order seems to make a difference somewhere. I am able to reproduce this bug, but I have not been able to extract a smaller dataset that triggers it; the cause is not straightforward.
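The observation that sorting first fixes the result is consistent with an order-dependent computation. A minimal Java sketch, assuming a simplified funnel model that is not the hive-funnel-udf code: the hypothetical `FunnelOrderSketch.funnelInArrivalOrder` walks a user's events in the order they arrive, so the same two events can yield `[1, 1, 0]` or `[1, 0, 0]`, while `funnelSortedByTime` sorts a copy by timestamp first (ties broken by action name) and returns the same answer for either arrival order. The class, method names and event timestamps are made up; only the step names come from the query in this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified funnel over one ouid's events. It is NOT the
// hive-funnel-udf implementation; it only illustrates why input order matters.
class FunnelOrderSketch {
    static final class Event {
        final String action; // e.g. a tag1 value
        final long time;     // e.g. a bigint game_time value

        Event(String action, long time) {
            this.action = action;
            this.time = time;
        }
    }

    // Walks events in list order; step i is marked reached (1) only after
    // steps 0..i-1 have been matched by earlier events in the list.
    static int[] funnelInArrivalOrder(List<Event> events, List<String> steps) {
        int[] reached = new int[steps.size()];
        int next = 0;
        for (Event e : events) {
            if (next < steps.size() && e.action.equals(steps.get(next))) {
                reached[next] = 1;
                next++;
            }
        }
        return reached;
    }

    // Sorts a copy by timestamp (ties broken by action name) before walking it,
    // so the result no longer depends on how the rows arrived.
    static int[] funnelSortedByTime(List<Event> events, List<String> steps) {
        List<Event> copy = new ArrayList<>(events);
        copy.sort((a, b) -> a.time != b.time
                ? Long.compare(a.time, b.time)
                : a.action.compareTo(b.action));
        return funnelInArrivalOrder(copy, steps);
    }

    public static void main(String[] args) {
        List<String> steps = Arrays.asList("activity", "yaozujianglin", "huangjinyuchang");
        Event first = new Event("activity", 100L);
        Event second = new Event("yaozujianglin", 200L);
        List<Event> inOrder = Arrays.asList(first, second);
        List<Event> reversed = Arrays.asList(second, first);

        System.out.println(Arrays.toString(funnelInArrivalOrder(inOrder, steps)));  // [1, 1, 0]
        System.out.println(Arrays.toString(funnelInArrivalOrder(reversed, steps))); // [1, 0, 0]
        System.out.println(Arrays.toString(funnelSortedByTime(reversed, steps)));   // [1, 1, 0]
    }
}
```

The tie-breaker matters when two events share the same `game_time`: without one, their relative order after sorting is still whatever order they arrived in. Whether the real UDF sorts its collected events, and how it handles ties, is not established in this thread, so treat this as a hypothesis to check in a unit test rather than a diagnosis.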