sampleclean-async

Crowd Context

Open sjyk opened this issue 9 years ago • 2 comments

Include the other columns in the task.

sjyk avatar Aug 26 '15 04:08 sjyk

This is actually hard to do, since the current code applies a distinct count first and then runs attrdedup.

sjyk avatar Aug 27 '15 05:08 sjyk

Hm. Could we rewrite the initial count distinct query as a group by?

e.g. SELECT name, first(col1), first(col2), ... FROM t GROUP BY name

This requires Spark SQL to have a first aggregate, or some other way of getting one value per column out of each group.
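
A rough sketch of the idea, not the project's code: it runs the GROUP BY rewrite against an in-memory SQLite database through Python's sqlite3 module, as a stand-in for Spark SQL. The sample rows are made up, and MIN() is used as the "some other way of getting a value out of the group" in case no first aggregate is available. COUNT(*) is included on the assumption that the per-name count is still wanted.

```python
import sqlite3

# Hypothetical sample data; table t and columns name, col1, col2
# follow the query sketched in the comment, the values are invented.
rows = [
    ("acme", "NY", "retail"),
    ("acme", "NY", "retail"),
    ("acme", "Boston", "retail"),
    ("zeta", "SF", "tech"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, col1 TEXT, col2 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# One row per distinct name, carrying a count plus one representative
# value for each of the other columns. MIN() stands in for first().
query = """
    SELECT name, COUNT(*) AS cnt, MIN(col1) AS col1, MIN(col2) AS col2
    FROM t
    GROUP BY name
    ORDER BY name
"""
result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
```

One caveat with MIN(): each column is reduced independently, so the values returned for a group can come from different source rows. A first-style aggregate has the same property unless the row order is fixed.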

thisisdhaas avatar Aug 27 '15 07:08 thisisdhaas