
hex_tarball:unpack/2, create/2 API changes

Open wojtekmach opened this issue 6 years ago • 7 comments

Currently the API is:

-type contents() :: #{filename() => binary()}.
-type tarball() :: binary().
%% ...

-spec unpack(tarball(), memory) ->
                {ok, #{checksum => checksum(), metadata => metadata(), contents => contents()}} |
                {error, term()};
            (tarball(), filename()) ->
                {ok, #{checksum => checksum(), metadata => metadata()}} |
                {error, term()}.

so we are only able to accept binaries. I'm wondering if we should accept filenames too, so instead of binary() we accept {binary, binary()} | filename:filename() and deprecate the former? Eventually, at say v1.0, we'd accept {binary, binary()} | filename:filename_all().

This is a low-level library, so I'm OK with a pretty low-level interface, but I think this change would make it more convenient to use, and I wish I'd started with that.

Thoughts? cc @ferd @tsloughter
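For illustration, the proposed input type could presumably be spelled as a spec like this (a hedged sketch, not actual hex_core code; `input()` is a made-up name and the return type is abbreviated):

```erlang
%% Hypothetical sketch of the proposed union input type.
-type input() :: {binary, binary()} | filename:filename().

-spec unpack(input(), memory | filename:filename()) ->
        {ok, map()} | {error, term()}.
```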

wojtekmach avatar May 28 '19 09:05 wojtekmach

Why not add unpack_file? It would allow you to carry the intent ("the binary blob is actually a filename") in the function name itself, rather than asking the user to wrap the type in a tuple ("this is a filename binary, and this is a binary binary").

Otherwise I'd expect consistency: unpack({file, Name} | {binary, Bin}, ...)

ferd avatar May 28 '19 11:05 ferd

Otherwise I'd expect consistency: unpack({file, Name} | {binary, Bin}, ...)

I was going for similar API to erl_tar:

extract(Name) -> RetValue
Name = filename() | {binary,binary()} | {file,Fd}

but I think I prefer your suggestion of unpack_file (and, consequently, an unpack_docs_file). Thanks 👍
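Such an unpack_file/2 could presumably be a thin wrapper over the existing binary-based API. A minimal sketch, assuming it simply reads the file and delegates (not the actual implementation):

```erlang
%% Hypothetical sketch: read the tarball from disk, then delegate to
%% the existing hex_tarball:unpack/2.
unpack_file(Filename, Output) ->
    case file:read_file(Filename) of
        {ok, Tarball} -> hex_tarball:unpack(Tarball, Output);
        {error, Reason} -> {error, Reason}
    end.
```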

wojtekmach avatar May 28 '19 11:05 wojtekmach

Somewhat related, we allow:

  1. hex_tarball:create(Metadata, ["foo.erl"]).
  2. hex_tarball:create(Metadata, [{"foo.erl", "/path/on/disk/to/foo.erl"}]).
  3. hex_tarball:create(Metadata, [{"foo.erl", <<"-module(foo).">>}]).

Option 1 is a convenience inherited from erl_tar, and I'd be OK with dropping it to keep the API surface smaller.

@starbelly suggested extracting option 3 into a separate function; it could have the following signature:

-spec create_from_memory(metadata(), [{filename:filename_all(), binary()}]).

If we put all this together, do we have:

  1. unpack, unpack_file, create, create_from_memory
  2. unpack, unpack_from_memory, create, create_from_memory
  3. unpack_from_file, unpack_from_memory, create_from_files, create_from_memory
  4. ???

I think I like option 2 the most. Thoughts?

One more thing: as mentioned in the initial comment, we allow:

  1. unpack(Tarball, memory)
  2. unpack(Tarball, OutputPath)

That is, we allow unpacking to a file on disk, or we return the contents from the function. We currently do not have such a capability for create; we always return the generated tarball as a binary and don't save it to disk. Should we mimic the memory option there, or wait until someone actually needs it? If we're doing big changes to the API, I'd prefer to do them all at once, I think.
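If we did mirror it on the create side, it could look roughly like the sketch below. This is a hypothetical sketch only: it assumes create/2 returns `{ok, #{tarball => binary(), ...}}`, which may not match the exact return shape.

```erlang
%% Hypothetical sketch mirroring unpack's memory | filename() argument.
%% Assumes create/2 returns {ok, #{tarball => binary(), ...}}.
create(Metadata, Files, memory) ->
    create(Metadata, Files);
create(Metadata, Files, OutputPath) ->
    case create(Metadata, Files) of
        {ok, #{tarball := Tarball} = Result} ->
            case file:write_file(OutputPath, Tarball) of
                ok -> {ok, maps:remove(tarball, Result)};
                {error, Reason} -> {error, Reason}
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```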

wojtekmach avatar Dec 16 '19 00:12 wojtekmach

@wojtekmach I too like option 2. I also like being able to pass in either the output path or an atom. Though I can also see the argument for having unpack_from_memory/1, as it is one less argument to get wrong.

starbelly avatar Dec 16 '19 01:12 starbelly

Oh, so unpack_from_memory/1,2? Yes, that's an option too; basically, instead of:

-spec unpack(tarball(), memory) ->
                {ok, #{checksum => checksum(), metadata => metadata(), contents => contents()}} |
                {error, term()};
            (tarball(), filename()) ->
                {ok, #{checksum => checksum(), metadata => metadata()}} |
                {error, term()}.

we have:

-spec unpack(tarball()) ->
                {ok, #{checksum => checksum(), metadata => metadata(), contents => contents()}} |
                {error, term()}.
-spec unpack(tarball(), filename()) ->
                {ok, #{checksum => checksum(), metadata => metadata()}} |
                {error, term()}.

right? I guess that's a win; I kinda like that we don't have two different sets of returns. But then again, we would actually have:

-spec unpack(filename())
-spec unpack(filename(), filename())
-spec unpack_from_memory(binary())
-spec unpack_from_memory(binary(), filename())

Does that look good? I think it's a little bit weird that the function name describes the input but not the output, isn't it? But then again, calling it unpack_from_memory_to_memory would be terrible.
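Spelled out with the return types carried over from the unpack/1,2 specs above (the output-path variants dropping contents), the four entry points would presumably be:

```erlang
%% Hypothetical full specs; return types are taken from the unpack/1,2
%% specs earlier in this thread, so the exact shapes are assumptions.
-spec unpack(filename()) ->
        {ok, #{checksum => checksum(), metadata => metadata(), contents => contents()}} |
        {error, term()}.
-spec unpack(filename(), filename()) ->
        {ok, #{checksum => checksum(), metadata => metadata()}} |
        {error, term()}.
-spec unpack_from_memory(binary()) ->
        {ok, #{checksum => checksum(), metadata => metadata(), contents => contents()}} |
        {error, term()}.
-spec unpack_from_memory(binary(), filename()) ->
        {ok, #{checksum => checksum(), metadata => metadata()}} |
        {error, term()}.
```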

wojtekmach avatar Dec 16 '19 01:12 wojtekmach

I see what you mean in regards to symmetry, and that is a con. It's definitely a pick-your-poison situation :) As for the rationale of one less argument to pass around, this is true, but I don't actually see it being a problem. So, if we put that argument aside, which one is simpler for the user? I'm not entirely sure; I tend to favor being explicit, though. If we ever have a need for yet another variation of unpack or create, then I would probably be in favor of arguments vs function names, but I can't think of another variation 🤔

Note that all approaches lead to two entry points for create* for me, so that the validation parts are nice and clear 👍

starbelly avatar Dec 16 '19 02:12 starbelly

@wojtekmach Something I'm doing right now as part of the validation PR is definitely making sense on my end, and it's perhaps on topic with this issue...

Instead of passing around two arguments here, one there, a list of tuples elsewhere, etc., I'm modifying the internal functions to work with a package():

 -type source() :: filesystem | memory.
 -type package() :: #{ source => source(),
                       files => files(),
                       metadata => metadata(),
                       errors => errors(),
                       warnings => warnings(),
                       content_size => integer()
                    }.

This is a map rather than a record, for interoperability. If we're doing an API change, I'm wondering if we should expose this, along with helper functions for working with it, such as new_package/2, new_package_from_filesystem|memory/2, or new_package/3, which would take metadata and files params or, in the case of new_package/3, those two params plus a filesystem | memory atom.
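A minimal sketch of such a constructor and an internal update helper, assuming the package() shape above (new_package/3 and add_error/2 are just the names floated here, not settled API):

```erlang
%% Hypothetical helpers; names and map shape follow the proposal above.
new_package(Metadata, Files, Source)
  when Source =:= filesystem; Source =:= memory ->
    #{source => Source,
      files => Files,
      metadata => Metadata,
      errors => [],
      warnings => [],
      content_size => 0}.

%% Internal functions would update the package opaquely, e.g.:
add_error(#{errors := Errors} = Package, Error) ->
    Package#{errors := [Error | Errors]}.
```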

Operations on the package within hex_tarball would be opaque via the helper, even in internal functions (e.g., updating the files, adding errors or warnings, etc.).

Entry point functions could either return a package() (which would contain the tarball()) or just return a tarball().

In addition, intentions (if needed) can be placed in the structure and the execution carried out by internal functions. This would also keep things open and extensible for the future, as well as making the implementation clearer and hopefully more declarative. I think even if we don't accept a package() at the entry points, it still makes sense to work with this internally within hex_tarball.

To note another thing I kind of threw into the mix here: filesystem vs files. In both the case of creating from files and unpacking to files, we're still working with files, so maybe filesystem is a better way to distinguish the source.

Sorry to add entropy to the convo here, but I figured since we're throwing out ideas and this is making sense on my end, throw it in :)

starbelly avatar Dec 16 '19 03:12 starbelly