sampleclean-async Crowd Context

Open sjyk opened this issue 9 years ago • 2 comments

Include other cols in the task.

Aug 26 '15 04:08 sjyk

This is actually hard to do, since the current code applies a distinct count first and then runs attrdedup, so the other columns are already gone by the time the tasks are built.

Aug 27 '15 05:08 sjyk

Hm. Could we rewrite the initial count distinct query as a group by?

e.g. SELECT name, first(col1), first(col2), ... FROM t GROUP BY name

This requires Spark SQL to have a first aggregate, or some other way of getting a single value out of each group.
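
To make the intended semantics concrete, here is a rough sketch in plain Python rather than Spark SQL: one output row per distinct name, carrying the first-seen values of the other columns plus the count. The `group_first` helper, the column names, and the sample rows are all made up for illustration and are not part of the sampleclean code.

```python
from collections import OrderedDict


def group_first(rows, key):
    """Return one row per distinct key value.

    Each output row holds the first-seen values of every column for that
    key, plus a 'count' of how many input rows shared the key.
    (Hypothetical helper, only meant to illustrate the GROUP BY idea.)
    """
    groups = OrderedDict()
    for row in rows:
        k = row[key]
        if k not in groups:
            # Keep the first row seen for this key as the group's context.
            groups[k] = {"first": dict(row), "count": 0}
        groups[k]["count"] += 1
    return [dict(g["first"], count=g["count"]) for g in groups.values()]


# Made-up sample data: two rows share the same name.
rows = [
    {"name": "a", "col1": "x1", "col2": "y1"},
    {"name": "a", "col1": "x2", "col2": "y2"},
    {"name": "b", "col1": "x3", "col2": "y3"},
]

result = group_first(rows, "name")
for r in result:
    print(r)
```

Which row counts as "first" depends on input order here; in a distributed query that order is generally not guaranteed, so any value from the group would have to be acceptable as context.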

Aug 27 '15 07:08 thisisdhaas