aiida-core icon indicating copy to clipboard operation
aiida-core copied to clipboard

Added `compress` option to `verdi storage maintain`

Open zhubonan opened this issue 2 years ago • 8 comments

Enable compression by default for verdi storage maintain. At moment the storage backend (disk-objectstore) has compression configured, but not used by default during the maintainance operation.

This is because pack_all_loose defaults to not compress the data. This PR addes the option to compress the data when doing verdi storage maintain, and makes compressing the data the default.

Since at the moment disk-objectstore cannot retrospectively compress the data (e.g. with repack), once they make the way to the pack files. Calling pack_all_loose(compress=True) is the only way to have the data compressed.

zhubonan avatar May 24 '22 11:05 zhubonan

Hey @zhubonan , thanks for the contribution! I can take a better look at the code later, but I think that we need to discuss one general aspect before that.

The interface of verdi storage maintain has to remain backend agnostic. Although the concept of compressing could be considered to be general to any storage solution (since it is a "property" of the data content itself), I'm thinking the implementation detail of this having to be clarified during the packing procedure is not. For example, if another backend allowed a per-object specification of compression that was handled at the time of storing the object, and would not be changeable during maintenance.

I also have some reservations about having this be the default, but I guess there is no point in discussing such details until we figure out the general issue. I do believe this would be a very interesting feature to enable, I'm just not sure that adding a direct and general option to verdi storage maintain is the best option.

Let me know what you think; in the meantime I'm also pinging @sphuber and @giovannipizzi for feedback and ideas.

ramirezfranciscof avatar May 27 '22 13:05 ramirezfranciscof

Idea: for the disk-objectstore perhaps it would be better to have an option in that storage backend section of the config that is called compress_when_packing, and check that value inside of the DiskObjectStoreRepositoryBackend.maintain method.

ramirezfranciscof avatar May 27 '22 13:05 ramirezfranciscof

I see. For backend not supporting compression can they just ignore the supplied option? I don't have any strong opinion on what the default should be, just that I think this functionality should be exposed to the user.

Since each backend would can handle things differently, rather than having a agnostic interface, may be the cmd line should be defined by each backend themselves (like verdi computer configure)?

zhubonan avatar May 27 '22 14:05 zhubonan

Since each backend would can handle things differently, rather than having a agnostic interface, may be the cmd line should be defined by each backend themselves (like verdi computer configure)?

I added some functionality last week to make this common pattern more easy to use. It is a command group subclass that automatically loads its subcommands based on registered entry points in some category, as for verdi computer configure does for the various transports. This time it is used for verdi code create. It is not a perfect solution though. Currently the entry point has to be a subcommand for it to work. You cannot have a single command, and have that have dynamic parameters based on the value of one of its arguments. Maybe that could be done, but then basic things like command --help wouldn't be able to show the correct help strings. Anyway, something to think about and see if it could be of use here indeed.

sphuber avatar May 27 '22 18:05 sphuber

Thanks,

  • I also think that having some variant per backend is useful/needed, as there will always be backend-specific actions one might want to do
  • I'm not sure that compressing by default is a good idea. Maybe better to implement the possibility to repack changing the compression type directly in the object store is better? This might fit well with the work that @zhubonan is doing on the "sealing" of packs as ZIP

giovannipizzi avatar May 27 '22 19:05 giovannipizzi

one thing we can do, however, is to have a verdi config option per profile, where the user can set if they want to compress when packing, and keep it off by default for now - what do you think?

giovannipizzi avatar May 27 '22 19:05 giovannipizzi

I'm not sure that compressing by default is a good idea.

Why @giovannipizzi? I think most people will just never use it if it's not on by default, and then you lose its benefits

chrisjsewell avatar May 27 '22 19:05 chrisjsewell

Perhaps I should keep uncompressed the default, while still making the option avalaible? With the new "searling" function one can compress packed objects that are not already compressed.

I'm not sure that compressing by default is a good idea. Maybe better to implement the possibility to repack changing the compression type directly in the object store is better?

I agree, these options should be made avalaible.

zhubonan avatar Jun 02 '22 10:06 zhubonan