
Problem flattening data

Open tdunning opened this issue 10 years ago • 1 comments

Hi Ted,

Here's the error I get:

$ ./synth -schema msg/clickstream.json id,user_id,program_id,timestamp,last_five_actions,action,device,br,la,st,os
Exception in thread "main" java.lang.IllegalArgumentException: Cannot flatten type class com.fasterxml.jackson.databind.node.ObjectNode
    at com.mapr.synth.samplers.FlattenSampler.sample(FlattenSampler.java:33)
    at com.mapr.synth.samplers.FlattenSampler.sample(FlattenSampler.java:12)
    at com.mapr.synth.samplers.SchemaSampler.sample(SchemaSampler.java:69)
    at com.mapr.synth.Synth.main(Synth.java:34)

Here's the schema, which is pretty much just lifted from the example in README.md, except for the fact I'm using the lookup class for the value of the "base" key:

[
    {"name":"id", "class":"id"},
    {"name":"user_id", "class": "foreign-key", "size": 100000 },
    {"name":"program_id", "class": "foreign-key", "size": 125 },
    {"name":"timestamp", "class": "date", "format": "yyyy-MM-dd HH:MM:ss.SS", "start": "2014-08-12 00:00:00.00" },
    {"name":"last_five_actions", "class": "flatten", "value": {
        "class": "sequence", "length": 5, "base": {
            "class": "lookup", "file": "msg/actions.csv"
        }
    } },
    {"name":"action", "class":"string", "dist":{ "play":21, "stop":19, "pause":16, "ff": 12, "rw": 7, "replay": 2} },
    {"name":"device", "class":"string", "dist":{ "large":25, "phone":45, "tablet":25, "other": 5} },
    {"name":"br", "class":"browser"},
    {"name":"la", "class":"language"},
    {"name":"st", "class":"state"},
    {"name":"os", "class":"os"}
]

I'm proceeding without the lookup stuff, since that produces data that I think is good enough for my purpose.

tdunning, Sep 04 '14 06:09

Vince,

I am taking a look at this again. What was the desired result? I am not quite clear on what you wanted to happen. What flatten does is take a list of lists and remove one level of nesting. For a list of objects, that isn't so simple. One thought I had is that you might have wanted a single output record for each element of the sequence, with all other fields in the record replicated identically. If so, the short walk to that result is to simply use Python to reprocess the output.
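As a rough illustration of both ideas, here is a minimal sketch rather than anything taken from log-synth itself. It assumes the generated records are available as JSON objects, one per line, with `last_five_actions` holding a list; the function names (`flatten_one_level`, `explode`, `explode_lines`) are made up for this example.

```python
import json


def flatten_one_level(nested):
    """Remove one level of nesting from a list of lists."""
    return [item for inner in nested for item in inner]


def explode(record, field):
    """Yield one copy of `record` per element of record[field].

    All other fields are replicated unchanged in every copy.
    """
    for element in record[field]:
        row = dict(record)  # shallow copy so the input record is not modified
        row[field] = element
        yield row


def explode_lines(lines, field="last_five_actions"):
    """Explode each JSON-lines record (assumed input format) on `field`."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            for row in explode(json.loads(line), field):
                yield json.dumps(row)
```

Under those assumptions, something like `for out in explode_lines(open("clicks.json")): print(out)` would write five records for each input record, where `clicks.json` is a hypothetical file name. Because `explode` only copies elements, it works the same whether the sequence holds strings or objects.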

Can you say more about what you want?

tdunning, Nov 09 '14 00:11