
Same user gets different results

Open ft20082 opened this issue 8 years ago • 4 comments

OS environment:
Hive version: Hive 1.1.0-cdh5.5.2
CDH version: cdh 5.5.2

When I query a single day, like this:

select ouid, funnel(tag1, game_time, array('activity'), array('yaozujianglin'), array('huangjinyuchang')) as funnel from sscq.odl_act_detail_info_sscq_qq where ds = '2016-12-17' group by ouid order by ouid asc

the result for one ouid is:

000000000000000000000000001EC76C [0,0,0]

But when I query only that one ID:

select ouid, funnel(tag1, game_time, array('activity'), array('yaozujianglin'), array('huangjinyuchang')) as funnel from sscq.odl_act_detail_info_sscq_qq where ds = '2016-12-17' and ouid = '000000000000000000000000001EC76C' group by ouid order by ouid asc

the result is different:

000000000000000000000000001EC76C [1,0,0]

ft20082, Dec 18 '16 11:12

  1. You don't need the order by ouid asc part of the query, just the group by ouid, assuming that ouid is your unique ID.
  2. I am not sure how it would report that first record but give an empty funnel count of [0,0,0]. Is game_time a timestamp column in a string/long format?

As I don't have access to your data, it may be best for you to show an example failure in a unit test. Here is a sample unit test to use: https://github.com/yahoo/hive-funnel-udf/blob/408b0a1e6a54f074e228e107b90eebd08b568c00/src/test/java/com/yahoo/hive/udf/funnel/FunnelTest.java#L130-L174

If you can construct a sample unit test that exhibits this behavior, I can then develop a fix for the issue.

As an alternative, if you could provide a Hive script that creates a sample database/table, and inserts some sample artificial records, I could then debug the problem.
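A reproduction script of that kind might look like the sketch below. Everything in it is hypothetical: the `funnel_repro` database, the `events` table, and the rows are made up for illustration, and it assumes the UDF has already been registered under the name `funnel`. Only the column names and funnel steps are taken from the query in the report.

```sql
-- Hypothetical repro database and table; names are invented for this sketch.
CREATE DATABASE IF NOT EXISTS funnel_repro;

CREATE TABLE IF NOT EXISTS funnel_repro.events (
  ouid      STRING,
  tag1      STRING,
  game_time BIGINT
);

-- Artificial records, deliberately not inserted in timestamp order.
INSERT INTO TABLE funnel_repro.events VALUES
  ('user_a', 'yaozujianglin',   1481932860),
  ('user_a', 'activity',        1481932800),
  ('user_b', 'huangjinyuchang', 1481932900),
  ('user_b', 'activity',        1481932920);

-- Assumes the UDF is already registered as `funnel`.
-- Run once for all users, then once with a filter on a single ouid,
-- and compare the funnel arrays returned for that ouid.
SELECT ouid,
       funnel(tag1, game_time,
              array('activity'), array('yaozujianglin'), array('huangjinyuchang')) AS funnel
FROM funnel_repro.events
GROUP BY ouid;
```

If the two runs disagree for the same ouid on a table this small, the script itself is the replicable failure.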

joshwalters, Jan 03 '17 18:01

The game_time column type is bigint. My guess is that it may be a serialization/deserialization error, which could produce different intermediate result data.

ft20082, Jan 05 '17 02:01

I have used bigint timestamp columns with these UDFs before and they worked. I don't think that is the issue.

Would it be possible to create a simple Hive script to generate a table, add a few records, and verify that this issue persists? Once there is a replicable failure, we can fix the issue.

joshwalters, Jan 05 '17 20:01

I have faced the same issue. Sorting the dataset in an inner query before calling funnel gave the right result, so it seems that sorting makes a difference somewhere. I am able to reproduce this bug, but I have not been able to extract a smaller dataset that shows it. The cause is not obvious.
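For reference, the workaround could be sketched roughly as follows, reusing the table and funnel steps from the original report. The choice of `DISTRIBUTE BY` / `SORT BY` and of the sort keys is an assumption, as is the `sorted_events` alias; the exact inner query I used is not reproduced here.

```sql
-- Sketch of the workaround: sort in an inner query, then aggregate.
-- The sort keys (ouid, game_time) are an assumption, not a confirmed fix.
SELECT ouid,
       funnel(tag1, game_time,
              array('activity'), array('yaozujianglin'), array('huangjinyuchang')) AS funnel
FROM (
  SELECT ouid, tag1, game_time
  FROM sscq.odl_act_detail_info_sscq_qq
  WHERE ds = '2016-12-17'
  DISTRIBUTE BY ouid
  SORT BY ouid, game_time
) sorted_events
GROUP BY ouid;
```

Hive does not promise that the order of a subquery is preserved into the aggregation, so this should be treated as a diagnostic hint rather than a guaranteed fix.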

RameshByndoor, Apr 23 '17 10:04