Schema Integration!

Open sritchie opened this issue 11 years ago • 3 comments

I'd love to see someone take on integration of Prismatic's Schema library with Cascalog. The ability to write schemafied operations, and have the Cascalog compiler validate schemas before submitting jobs, could avoid the runtime errors that are one of the only downsides to Cascalog :)

sritchie avatar Feb 14 '14 15:02 sritchie

I love your idea. Could you elaborate on it, or share the use case you have in mind?

I thought using Thrift or another serializer already helps to enforce the schema.

Is Schema an alternative, or can the two play nicely together?

It seems to be in the same spirit as the comparison between core.typed and Schema: "core.typed has accurate compile time checking, and Schema gives an expressive contracts interface for runtime checking." (from https://news.ycombinator.com/item?id=6339607)

Thanks,

maxrzepka avatar Mar 29 '14 05:03 maxrzepka

Yeah, for sure.

Thrift is nice for enforcing a schema when you write to disk, but that safety only kicks in once your job is running on the cluster. If you try to populate Thrift objects with items of the wrong type, you'll get runtime exceptions after job submission. This is painful, and a big waste of time.

If Cascalog query definitions could use Schema to check the input and output types of predicates, then Schema's "runtime" guarantee would apply while the query is being built on the client, and would prevent badly typed jobs from being submitted. So the runtime here is really a second compile time. This would play really nicely with Thrift.
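To make the "second compile time" idea concrete, here's a minimal, language-agnostic sketch in Python. It is not Cascalog's or Schema's API: `Op`, `check_plan`, and `SchemaError` are hypothetical names made up for illustration, and plain Python classes stand in for real schemas. The only point is that each operation declares its input and output fields, and the whole plan is checked before anything would be submitted.

```python
from dataclasses import dataclass
from typing import Dict, List


class SchemaError(TypeError):
    """Raised while the plan is being checked, before any job is submitted."""


@dataclass(frozen=True)
class Op:
    """Hypothetical operation that declares the fields it reads and emits."""
    name: str
    inputs: Dict[str, type]
    outputs: Dict[str, type]


def check_plan(source_fields: Dict[str, type], ops: List[Op]) -> Dict[str, type]:
    """Walk the ops in order and verify each declared input against the
    fields available so far. Returns the fields available after the last op."""
    available = dict(source_fields)
    for op in ops:
        for field, expected in op.inputs.items():
            if field not in available:
                raise SchemaError(f"{op.name}: missing input field {field!r}")
            actual = available[field]
            if not issubclass(actual, expected):
                raise SchemaError(
                    f"{op.name}: field {field!r} is {actual.__name__}, "
                    f"expected {expected.__name__}"
                )
        # Fields emitted by this op become visible to later ops.
        available.update(op.outputs)
    return available


# Illustrative operations (assumed, not taken from any real job).
parse_age = Op("parse-age", {"line": str}, {"age": int})
bucket_age = Op("bucket-age", {"age": int}, {"bucket": str})
shout_age = Op("shout-age", {"age": str}, {"shout": str})  # wrong: age is an int

# A well-typed plan passes the check.
good_fields = check_plan({"line": str}, [parse_age, bucket_age])

# A badly typed plan is rejected locally, before any cluster work starts.
try:
    check_plan({"line": str}, [parse_age, shout_age])
except SchemaError as err:
    print("rejected before submission:", err)
```

In a real integration the declarations would be Schema schemas attached to Cascalog operations, and the check would run as part of query compilation.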

sritchie avatar Mar 29 '14 14:03 sritchie

Thanks for the explanation. Sounds really exciting... I hope to find some time to investigate it.

maxrzepka avatar Mar 29 '14 17:03 maxrzepka