python-irodsclient icon indicating copy to clipboard operation
python-irodsclient copied to clipboard

Session.data_objects.put() overwrites without FORCE_FLAG_KW

Open anderbubble opened this issue 4 years ago • 11 comments

I think that Session.data_objects.put() should refuse to overwrite an existing irods object unless FORCE_FLAG_KW is set; but as currently implemented it overwrites regardless.

anderbubble avatar Nov 30 '21 04:11 anderbubble

I see strong argument for each option as the default.

@d-w-moore please dig through the time machine and look for any reasoning/history to this current decision/default of overwriting without the force flag (aka, always overwriting)... can we chalk it up to 'normal streaming behavior/assumptions' clashing with the semantics of .put()?

changing this default would/could be very disruptive to existing scripts/apps.

trel avatar Nov 30 '21 04:11 trel

I'm pretty sure I had to specify FORCE_FLAG_KW somewhere else; otherwise I wouldn't've even found it. I'll try to dig up the precedent that started this.

anderbubble avatar Nov 30 '21 04:11 anderbubble

Perhaps the option of respecting FORCE_FLAG_KW should be configurable as part of the environment, and session objects could inherit it. I'll dig , and look around for best options / semantics. I am inclined to not want to break established behavior, but if say an irods_respect_force_flag were enabled the PRC could then behave differently.

d-w-moore avatar Nov 30 '21 12:11 d-w-moore

I found my precedent: it's that get() requires force, just like iget, when the local file already exists.

https://github.com/irods/python-irodsclient/blob/e07aa8c102abfe19cc3524485723f6d2018e7914/irods/manager/data_object_manager.py#L84-L86

But put() does not, even though iput does (according to its manpage; see https://docs.irods.org/master/icommands/user/#iput).

Tests for python-irodsclient also indicate that force is required during create().

https://github.com/irods/python-irodsclient/blob/c168fcc7df4d9c5f2d849ac56e26eed31dc1c701/irods/test/force_create.py#L19-L46

From these examples, I say that the current behavior is inconsistent with the icommands and within the API. In any case, the API should be made consistent with itself; and I suggest that in doing so the API should be made consistent with the icommands.

anderbubble avatar Nov 30 '21 15:11 anderbubble

I see. So maybe what we need is a force_flag_consistency setting. If unset or None/empty, behavior is old-style (force flag only respected on get). If True, always respect the FORCE_FLAG_KW, and if False, then never do.

Does that sound reasonable?

d-w-moore avatar Nov 30 '21 17:11 d-w-moore

seems... good. agreed that the icommands have had the most eyes/hands and should be followed.

perhaps enforce_force_flag_consistency=false...

and we can then discuss whether to flip the default on that setting for python-irodsclient 2.0.0...

are there other places to check? the new parallel code?

trel avatar Nov 30 '21 17:11 trel

I believe as much as possible is done prior to the async/synchronous transfer branch. (Note to self to check ... @d-w-moore)

d-w-moore avatar Nov 30 '21 17:11 d-w-moore

Does that sound reasonable?

Seems reasonable to me.

anderbubble avatar Dec 01 '21 04:12 anderbubble

I'm just now realizing that I was overlooking this bit in the test:

self.skipTest('force flag unneeded for create in iRODS > 4.2.8')

So it's just get() that requires force right now, not create(). At least, as far as I can tell.

anderbubble avatar Dec 01 '21 04:12 anderbubble

@d-w-moore should probably eyeball jargon's default behavior as well... make a little truth table...

trel avatar Dec 01 '21 04:12 trel

From team discussion with @korydraughn , @trel , and @alanking earlier today it appears that the iRODS server used to enforce the FORCE_FLAG_KW in the PUT api; since 4.2.9 changes, however, even setting the OprType flag to 1 (PUT) in the data obj OPEN is no longer going to cause the python client (PRC) to hit the PUT api or policy. Furthermore, multi-1247 will complicate the semantics of this thing. Currently the PRC goes through an open/write/close sequence when put( ) is called; this is also roughly true of the PRC's PTE (Parallel Transfer Engine) impl. Maybe what's needed -- as much for enforcing FORCE_FLAG_KW as for the semantic tying-together of (N > 1) different Open operations on the parallel streams/threads -- is to make a higher level API (callable from the Pure Python / XML-irodsProt client endpoint, mind you...) that time-wise brackets and functionality-wise encapsulates all of the lower-level operations on these 1-or-more streams. This higher level thing might functionally force the whole shebang to be recognized as a PUT operation, or might be roughly equivalent or ancillary to the recently proposed prepare_to_receive API. It's a complex topic with many parts and needs further discussion.

d-w-moore avatar Dec 02 '21 14:12 d-w-moore