python-irodsclient
python-irodsclient copied to clipboard
Session.data_objects.put() overwrites without FORCE_FLAG_KW
I think that Session.data_objects.put() should refuse to overwrite an existing irods object unless FORCE_FLAG_KW is set; but as currently implemented it overwrites regardless.
I see strong argument for each option as the default.
@d-w-moore please dig through the time machine and look for any reasoning/history to this current decision/default of overwriting without the force flag (aka, always overwriting)... can we chalk it up to 'normal streaming behavior/assumptions' clashing with the semantics of .put()?
changing this default would/could be very disruptive to existing scripts/apps.
I'm pretty sure I had to specify FORCE_FLAG_KW somewhere else; otherwise I wouldn't've even found it. I'll try to dig up the precedent that started this.
Perhaps the option of respecting FORCE_FLAG_KW should be configurable as part of the environment, and session objects could inherit it. I'll dig , and look around for best options / semantics. I am inclined to not want to break established behavior, but if say an irods_respect_force_flag were enabled the PRC could then behave differently.
I found my precedent: it's that get() requires force, just like iget, when the local file already exists.
https://github.com/irods/python-irodsclient/blob/e07aa8c102abfe19cc3524485723f6d2018e7914/irods/manager/data_object_manager.py#L84-L86
But put() does not, even though iput does (according to its manpage; see https://docs.irods.org/master/icommands/user/#iput).
Tests for python-irodsclient also indicate that force is required during create().
https://github.com/irods/python-irodsclient/blob/c168fcc7df4d9c5f2d849ac56e26eed31dc1c701/irods/test/force_create.py#L19-L46
From these examples, I say that the current behavior is inconsistent with the icommands and within the API. In any case, the API should be made consistent with itself; and I suggest that in doing so the API should be made consistent with the icommands.
I see. So maybe what we need is a force_flag_consistency setting. If unset or None/empty, behavior is old-style (force flag only respected on get). If True, always respect the FORCE_FLAG_KW, and if False, then never do.
Does that sound reasonable?
seems... good. agreed that the icommands have had the most eyes/hands and should be followed.
perhaps enforce_force_flag_consistency=false...
and we can then discuss whether to flip the default on that setting for python-irodsclient 2.0.0...
are there other places to check? the new parallel code?
I believe as much as possible is done prior to the async/synchronous transfer branch. (Note to self to check ... @d-w-moore)
Does that sound reasonable?
Seems reasonable to me.
I'm just now realizing that I was overlooking this bit in the test:
self.skipTest('force flag unneeded for create in iRODS > 4.2.8')
So it's just get() that requires force right now, not create(). At least, as far as I can tell.
@d-w-moore should probably eyeball jargon's default behavior as well... make a little truth table...
From team discussion with @korydraughn , @trel , and @alanking earlier today it appears that the iRODS server used to enforce the FORCE_FLAG_KW in the PUT api; since 4.2.9 changes, however, even setting the OprType flag to 1 (PUT) in the data obj OPEN is no longer going to cause the python client (PRC) to hit the PUT api or policy. Furthermore, multi-1247 will complicate the semantics of this thing. Currently the PRC goes through an open/write/close sequence when put( ) is called; this is also roughly true of the PRC's PTE (Parallel Transfer Engine) impl. Maybe what's needed -- as much for enforcing FORCE_FLAG_KW as for the semantic tying-together of (N > 1) different Open operations on the parallel streams/threads -- is to make a higher level API (callable from the Pure Python / XML-irodsProt client endpoint, mind you...) that time-wise brackets and functionality-wise encapsulates all of the lower-level operations on these 1-or-more streams. This higher level thing might functionally force the whole shebang to be recognized as a PUT operation, or might be roughly equivalent or ancillary to the recently proposed prepare_to_receive API. It's a complex topic with many parts and needs further discussion.