aiida-core
aiida-core copied to clipboard
Added `compress` option to `verdi storage maintain`
Enable compression by default for verdi storage maintain
.
At moment the storage backend (disk-objectstore) has compression configured, but not used by default during the maintainance operation.
This is because pack_all_loose
defaults to not compress the data. This PR addes the option to compress the data when doing verdi storage maintain
, and makes compressing the data the default.
Since at the moment disk-objectstore
cannot retrospectively compress the data (e.g. with repack
), once they make the way to the pack files. Calling pack_all_loose(compress=True)
is the only way to have the data compressed.
Hey @zhubonan , thanks for the contribution! I can take a better look at the code later, but I think that we need to discuss one general aspect before that.
The interface of verdi storage maintain
has to remain backend agnostic. Although the concept of compressing could be considered to be general to any storage solution (since it is a "property" of the data content itself), I'm thinking the implementation detail of this having to be clarified during the packing procedure is not. For example, if another backend allowed a per-object specification of compression that was handled at the time of storing the object, and would not be changeable during maintenance.
I also have some reservations about having this be the default, but I guess there is no point in discussing such details until we figure out the general issue. I do believe this would be a very interesting feature to enable, I'm just not sure that adding a direct and general option to verdi storage maintain
is the best option.
Let me know what you think; in the meantime I'm also pinging @sphuber and @giovannipizzi for feedback and ideas.
Idea: for the disk-objectstore
perhaps it would be better to have an option in that storage backend section of the config that is called compress_when_packing
, and check that value inside of the DiskObjectStoreRepositoryBackend.maintain
method.
I see. For backend not supporting compression can they just ignore the supplied option? I don't have any strong opinion on what the default should be, just that I think this functionality should be exposed to the user.
Since each backend would can handle things differently, rather than having a agnostic interface, may be the cmd line should be defined by each backend themselves (like verdi computer configure)?
Since each backend would can handle things differently, rather than having a agnostic interface, may be the cmd line should be defined by each backend themselves (like verdi computer configure)?
I added some functionality last week to make this common pattern more easy to use. It is a command group subclass that automatically loads its subcommands based on registered entry points in some category, as for verdi computer configure
does for the various transports. This time it is used for verdi code create
. It is not a perfect solution though. Currently the entry point has to be a subcommand for it to work. You cannot have a single command, and have that have dynamic parameters based on the value of one of its arguments. Maybe that could be done, but then basic things like command --help
wouldn't be able to show the correct help strings. Anyway, something to think about and see if it could be of use here indeed.
Thanks,
- I also think that having some variant per backend is useful/needed, as there will always be backend-specific actions one might want to do
- I'm not sure that compressing by default is a good idea. Maybe better to implement the possibility to repack changing the compression type directly in the object store is better? This might fit well with the work that @zhubonan is doing on the "sealing" of packs as ZIP
one thing we can do, however, is to have a verdi config
option per profile, where the user can set if they want to compress when packing, and keep it off by default for now - what do you think?
I'm not sure that compressing by default is a good idea.
Why @giovannipizzi? I think most people will just never use it if it's not on by default, and then you lose its benefits
Perhaps I should keep uncompressed the default, while still making the option avalaible? With the new "searling" function one can compress packed objects that are not already compressed.
I'm not sure that compressing by default is a good idea. Maybe better to implement the possibility to repack changing the compression type directly in the object store is better?
I agree, these options should be made avalaible.