[WIP] Add encoder/decoder for Avro object container files

Open veedo opened this issue 3 years ago • 10 comments

For some of my projects, I need full control of the encoding/decoding process and AvroEx provides a good basis for that. The only things missing from AvroEx that I use a lot are object containers and cloud wire formats. This pull request is my attempt to add object containers in a flexible way that gives a user maximum control over the process.

Theory of operation:

  • Each part of the container can be encoded/decoded separately
  • Codec implementation can be supplied by the user
    • Currently using the :avro_ex app config so that users can configure it in their mix config
    • A snappy codec is included because snappy compression does something weird in Avro, but it ships without the underlying snappyer dependency, since adding snappyer adds a NIF compile requirement, which is undesirable for cross compilation
    • I believe compression codecs are only used in Object Containers, so I placed them under that module. Let me know if that is not the case.
  • Just return the encoded data and let the user decide how they want to write the file
    • Not yet sure how well decoding will work for this concept; I might be forced to use IO objects
    • Could be solved by providing functions for figuring out how much data to read for each chunk?

Please provide feedback on the PR as I go, in case there's something untenable.

veedo, Aug 21 '22 19:08
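
On the open question above about how much data to read for each chunk: per the Avro specification, every data block in an object container file starts with two longs (the object count and the byte size of the serialized objects) and ends with the 16-byte sync marker. A minimal sketch of a helper that reports the full block length from a partial read might look like the following. The module and function names are hypothetical and are not part of AvroEx or this PR.

```elixir
defmodule BlockFrameSketch do
  @moduledoc false
  # Hypothetical helper, not part of AvroEx or this PR.
  import Bitwise

  # Every block is terminated by the file's 16-byte sync marker.
  @sync_size 16

  # Returns {:ok, %{count: _, size: _, total_bytes: _}} once both header
  # longs are readable, or :more when the caller must read more bytes first.
  def block_header(bin) when is_binary(bin) do
    with {:ok, count, rest} <- decode_long(bin),
         {:ok, size, rest} <- decode_long(rest) do
      header_bytes = byte_size(bin) - byte_size(rest)
      {:ok, %{count: count, size: size, total_bytes: header_bytes + size + @sync_size}}
    end
  end

  # Avro longs are zigzag-encoded base-128 varints, low-order group first.
  def decode_long(bin) when is_binary(bin), do: decode_varint(bin, 0, 0)

  # High bit set: more bytes follow.
  defp decode_varint(<<1::1, low::7, rest::binary>>, shift, acc),
    do: decode_varint(rest, shift + 7, acc ||| (low <<< shift))

  # High bit clear: last byte, so undo the zigzag mapping.
  defp decode_varint(<<0::1, low::7, rest::binary>>, shift, acc) do
    n = acc ||| (low <<< shift)
    {:ok, bxor(n >>> 1, -(n &&& 1)), rest}
  end

  # Ran out of input in the middle of a varint.
  defp decode_varint(_incomplete, _shift, _acc), do: :more
end
```

A caller could read a few bytes, call `block_header/1`, and then read until it holds `total_bytes` bytes before handing the block to a decoder, which would keep the file or socket handling in user code.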

Thank you for the PR! I will provide some direct feedback this week. One immediate piece of feedback is to move all codecs to separate files.

davydog187, Aug 21 '22 20:08

Took a quick look at this, here is some general feedback:

  1. To keep the API surface area of AvroEx as small as possible, I would suggest that we have top-level APIs for working with Object Container Files in AvroEx. We can have the implementation delegate out to the AvroEx.ObjectContainer module
  2. There was mention of using application configuration for the codec. I would advise against this and instead just allow the user to pass a keyword argument to the library. If the user of the library wants to use Application config, let them do that in their own application. See the Elixir library guidelines.

davydog187, Aug 22 '22 15:08
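
To illustrate the second point, here is a minimal sketch of passing the codec as a keyword option while leaving any Application config lookup to the calling application. All module names, the `:codec` option, and the `:my_app` / `:avro_codec` config keys are assumptions made up for this example; they are not AvroEx's API.

```elixir
defmodule OptionCodecSketch do
  # Hypothetical library-side module, not AvroEx's API.

  # The codec arrives as an option with an explicit default, so the
  # library never reads Application config itself.
  def encode_block(data, opts \\ []) when is_binary(data) do
    codec = Keyword.get(opts, :codec, &Function.identity/1)
    codec.(data)
  end
end

defmodule MyApp.Avro do
  # Hypothetical application-side wrapper: the *application* may read its
  # own config and pass the result on as an option.
  def encode_block(data) do
    codec = Application.get_env(:my_app, :avro_codec, &Function.identity/1)
    OptionCodecSketch.encode_block(data, codec: codec)
  end
end
```

With this split, two callers in the same VM can use different codecs at once, which a single global config value would not allow.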

> Took a quick look at this, here is some general feedback:
>
>   1. To keep the API surface area of AvroEx as small as possible, I would suggest that we have top-level APIs for working with Object Container Files in AvroEx. We can have the implementation delegate out to the AvroEx.ObjectContainer module

will do 👍

>   2. There was mention of using application configuration for the codec. I would advise against this and instead just allow the user to pass a keyword argument to the library. If the user of the library wants to use Application config, let them do that in their own application. See the Elixir library guidelines.

Thanks, that is a useful document. Right now the codec is passed in anyway; I'll just have to think about how the name+implementation will work. I'll probably just add a name/0 function to the behaviour.

veedo, Aug 22 '22 17:08
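
A rough sketch of what a codec behaviour with a name/0 callback could look like is shown below. Only name/0 comes from the comment above; the module names, the encode/1 and decode/1 callbacks, and the find/2 helper are assumptions for illustration. The "null" name is the pass-through codec that the Avro specification requires implementations to support, and the codec name is what an object container file records under the `avro.codec` metadata key.

```elixir
defmodule CodecBehaviourSketch do
  # Hypothetical behaviour; callbacks other than name/0 are assumptions.
  @callback name() :: String.t()
  @callback encode(binary()) :: {:ok, binary()} | {:error, term()}
  @callback decode(binary()) :: {:ok, binary()} | {:error, term()}

  # Resolve the codec module whose name/0 matches the name read from a
  # file header, given a caller-supplied list of codec modules.
  def find(codecs, name) when is_list(codecs) and is_binary(name) do
    case Enum.find(codecs, fn mod -> mod.name() == name end) do
      nil -> {:error, {:unknown_codec, name}}
      mod -> {:ok, mod}
    end
  end
end

defmodule NullCodecSketch do
  # Pass-through codec: data is stored uncompressed.
  @behaviour CodecBehaviourSketch

  @impl true
  def name, do: "null"

  @impl true
  def encode(data) when is_binary(data), do: {:ok, data}

  @impl true
  def decode(data) when is_binary(data), do: {:ok, data}
end
```

Because the name lives on the module, a decoder only needs the list of candidate codec modules from the caller, for example `CodecBehaviourSketch.find([NullCodecSketch], "null")`.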

Hello @veedo, just wanted to check in on this PR. Are you waiting on any review from me, or is it still a WIP?

davydog187, Sep 17 '22 13:09

I've just been swamped this month unfortunately. I'll probably make some progress next week/weekend though 😅

veedo, Sep 17 '22 18:09

No rush! Just wanted to make sure you weren't waiting on me.

davydog187, Sep 17 '22 21:09

Hello @veedo! Checking back in here. Is there anything I can do to help with this PR? If you're not going to come back to it, we can consider other options.

davydog187, Feb 17 '23 15:02

The swamping has continued unfortunately, and will probably continue for the next 2 months at least.

Currently the encoding part works correctly and consistently. My plan was to finish all the tests and start on decoding. I can split the encoding part out into its own PR and remove all the decoding parts, but that may be a bit weird for a user expecting both.

I'll whip myself to finish the encoding tests this weekend. How would you like to handle it? Splitting shouldn't be too much more work.

veedo, Feb 25 '23 17:02

@davydog187 Disregard my last comment. I had forgotten how much was left to do. The encoding and decoding both work; I just need to do the documentation. I'll see if I can get it into an understandable state today.

veedo, Mar 05 '23 16:03

@veedo wanted to remind you about this PR! It's come so far, and I'd love to still be able to get this merged at some point.

davydog187, Jul 02 '24 14:07