cascading-cassandra icon indicating copy to clipboard operation
cascading-cassandra copied to clipboard

Sink Skip Task Bug & JSON Deserialisation to String

Open matlarsen opened this issue 11 years ago • 8 comments

  • Changed to prevent killing task on last modified
  • added JSON deserialisation to String
  • added dependancy for JSON jar

matlarsen avatar Sep 06 '13 17:09 matlarsen

This is my first ever pull request, apologies if it's crap!

matlarsen avatar Sep 06 '13 17:09 matlarsen

hi mat,

i have some issues :

  • JSON serializing is client dependent : there are a whole bunch of different JSON libs, and cascading-cassandra shouldn't force any specific one
  • getModifiedTime is used differently by sink/source : the value 0 has special meaning for sources (BaseFlow/getSourceModified) where it represents a resource which doesn't exist : i don't think we should return 0 for getModifiedTime when the Tap is sourcing, since it seems quite likely to break or change something else
  • there should be a unit test which fails without the patch, and passes with the patch : i still don't understand when this problem occurs or otherwise : if i run a (hadoop) cascalog job which sinks to cascading-cassandra under a debugger, the CassandraTap/getModifiedTime method doesn't seem to get called at all. no doubt i don't understand because i don't know cascalog/cascading's internal mechanics well enough (the effects of local/hadoop java/cascalog etc), but in the absence of that understanding i would like a test which makes some sense of the complexity

mccraigmccraig avatar Sep 09 '13 20:09 mccraigmccraig

tho maybe @ifesdjeen knows the cascading mechanics better than i do, and is happy with the change, in which case i'm happy to defer to his superior knowledge :)

mccraigmccraig avatar Sep 09 '13 20:09 mccraigmccraig

I do think we should add custom pluggable serialisers, but I do not think it's a good idea to bring in another dependency... maybe we could add a configurable fallback that would receive an object and return bytebuffer, that way everyone can roll out his own deserialisation mechanism?

ifesdjeen avatar Sep 09 '13 21:09 ifesdjeen

I would really really really appreciate a piece of Java code that reveals the problem though :) I would port it to test myself and we all will be happy :)

Let's make it that way: we roll out custom serialization via config, and @matlarsen shows us Java code that causes the problem in return. Deal? :)

ifesdjeen avatar Sep 09 '13 21:09 ifesdjeen

sounds like a good deal to me :)

mccraigmccraig avatar Sep 09 '13 21:09 mccraigmccraig

Hey

Agree with the JSON thing. Should never have put it in there.

I agree it should change depending on whether sink/source. I didn't realise 0 had a special meaning (makes sense it means missing though). I can't think of a sensible educated value to set it to however, 1? :o)

It doesn't get called at all for you? I don't understand what's going on then; it seems to me from my digging around the Cascading code that it must be called (ergo, it is checking if sink age > source age). The only case where it isn't (referring to the source here) is when the Run ID is null - maybe it has something to do with the nature of how cascading jobs are executed? I need to have a look at the planner code to try and find out why - I don't have the time to do this at the moment. When I get time I'll see if I can work my job down to a core path that causes it to fail (sorry atm it's epicly mad and I have to be sloppy) and try to nail that SOB in the planner.

Thanks for your comments and allowing me to contribute!

matlarsen avatar Sep 09 '13 23:09 matlarsen

the special meaning of 0 for sources doesn't matter if we only set the value to 0 when sinking

getSourceModified() does not appear to be called at all for me : i've no idea why : it is possible that i have a problem with my debugging setup, though other breakpoints in code from the same jar are working fine... it's possible that my source version is wrong though : i'll dig deeper when i've got some time

mccraigmccraig avatar Sep 11 '13 09:09 mccraigmccraig