python-irodsclient

iRODSDataObject(...).modify_time is arbitrarily chosen

Open d-w-moore opened this issue 5 months ago • 13 comments

The various DataObject model attributes are chosen based on the first row in the query result-set, which is ordered by replica number. This affects the data object's "overall" timestamp info among other attributes:

import irods.helpers
from irods.models import DataObject

session = irods.helpers.make_session()
data_obj = session.data_objects.get(path_to_data_object)  # path_to_data_object: logical path of an existing data object
print(repr(data_obj.modify_time))  # -> not necessarily the most recent replica's timestamp
max(r[DataObject.modify_time] for r in data_obj.replicas)  # -> the most recent replica's timestamp

The printed datetime object will not necessarily reflect the most recent replica's modification timestamp.

For that we'd have to put in a hook to sort the result set before transferring the attributes to the main object. (I am thinking that is the most efficient option, and the most backward compatible, since the PRC mostly relays cached attributes anyway and relies on the user to do fresh queries to re-poll the object for changes.)

d-w-moore avatar Jun 28 '25 09:06 d-w-moore

This will also affect the access_time attribute for iRODS 5 when that is ultimately added.

d-w-moore avatar Jun 28 '25 10:06 d-w-moore

So... we have talked about the canonical answer for a data object that actually is a per-replica bit of information... we've agreed that we should use the "latest, good replica" to report as the object information.

So, status=1 and the latest modify time...

trel avatar Jun 28 '25 13:06 trel
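
The "latest, good replica" rule described above can be sketched in plain Python. This is only an illustration and not the PRC's implementation: `ReplicaRow` and `latest_good_replica` are hypothetical names, and the sketch assumes that a replica status of "1" marks a good replica.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional

GOOD_STATUS = "1"  # assumption: status "1" marks a good replica


@dataclass
class ReplicaRow:
    """Hypothetical stand-in for one row of the replica query result."""
    number: int
    status: str
    modify_time: datetime


def latest_good_replica(rows: Iterable[ReplicaRow]) -> Optional[ReplicaRow]:
    """Return the good replica with the newest modify_time, or None if none is good."""
    good = [r for r in rows if str(r.status) == GOOD_STATUS]
    return max(good, key=lambda r: r.modify_time, default=None)


rows = [
    ReplicaRow(0, "1", datetime(2025, 6, 1)),
    ReplicaRow(1, "1", datetime(2025, 6, 20)),
    ReplicaRow(2, "0", datetime(2025, 6, 27)),  # newest, but not in good status
]
chosen = latest_good_replica(rows)
print(chosen.number, chosen.modify_time)
```

Filtering before taking the maximum means a newer replica that is not in good status never wins, which is the point of the "good" qualifier.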

> So... we have talked about the canonical answer for a data object that actually is a per-replica bit of information... we've agreed that we should use the "latest, good replica" to report as the object information.
>
> So, status=1 and the latest modify time...

That should work

d-w-moore avatar Jun 28 '25 20:06 d-w-moore

However, access time doesn't necessarily correlate with the status flag. So perhaps access_time, along with some other replica attributes, should not appear in the data object itself.

d-w-moore avatar Jun 28 '25 20:06 d-w-moore

hmm, default access_time should probably also be 'latest, good replica'.... is there a downside to that?

trel avatar Jun 28 '25 20:06 trel

I need to check into it. If access time is independent of modify time, then the higher-level abstraction (the data object) can't always accurately reflect the latest of both. One replica may have been more recently read, another more recently written or opened for write, and both could still have good status, if my understanding is accurate. In that case these timestamps could (should) be properties.

d-w-moore avatar Jun 28 '25 23:06 d-w-moore
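
To make the independence concern concrete, here is a minimal sketch of timestamps computed on demand as properties. `Replica` and `DataObjectView` are hypothetical stand-ins, not PRC classes, and the sketch again assumes status "1" means a good replica. Each property scans the good replicas separately, so the two values may come from different replicas.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Replica:
    """Hypothetical per-replica record."""
    number: int
    status: str
    modify_time: datetime
    access_time: datetime


@dataclass
class DataObjectView:
    """Hypothetical data-object wrapper whose timestamps are derived on demand."""
    replicas: List[Replica] = field(default_factory=list)

    def _latest(self, attr: str) -> Optional[datetime]:
        # Consider only good replicas (assumption: status "1" is good).
        values = [getattr(r, attr) for r in self.replicas if r.status == "1"]
        return max(values, default=None)

    @property
    def modify_time(self) -> Optional[datetime]:
        return self._latest("modify_time")

    @property
    def access_time(self) -> Optional[datetime]:
        return self._latest("access_time")


obj = DataObjectView([
    # Replica 0 was read most recently; replica 1 was written most recently.
    Replica(0, "1", modify_time=datetime(2025, 6, 20), access_time=datetime(2025, 6, 21)),
    Replica(1, "1", modify_time=datetime(2025, 6, 25), access_time=datetime(2025, 6, 10)),
])
print(obj.modify_time)  # taken from replica 1
print(obj.access_time)  # taken from replica 0
```

Because the properties recompute from `replicas` on every access, they reflect whatever replica data is currently cached; they do not trigger a fresh query.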

Or maybe I'm wrong? Is opening a replica for write enough to change the modify timestamp?

d-w-moore avatar Jun 28 '25 23:06 d-w-moore

i think they are indeed independent in the way you say. opening for write does not mean it wrote anything... i think we (should) key on data_modified somewhere...

trel avatar Jun 29 '25 02:06 trel

I believe the mtime is only updated when data is modified. The atime is only considered for updating if the replica is opened for reading.

I believe there's an argument for not doing anything here. As long as the replicas are available, developers can get the information they want. It's possible they may not want the mtime/atime for the latest good replica.

A better approach would be to document the current behavior and consider whether providing one or two free functions is good enough.

korydraughn avatar Jun 30 '25 13:06 korydraughn

> I believe there's an argument for not doing anything here. As long as the replicas are available, developers can get the information they want. It's possible they may not want the mtime/atime for the latest good replica.

I think I'm fine with that. The only problem - and it's admittedly small but could be perplexing for beginners and annoying for practiced library users - is that the existence of these replica attributes on the main iRODSDataObject could be misleading. You have an access_time now advertised to you in the top level object, and like the modify_time it's a datetime object. But then you find out it's not necessarily accurate...

It's been an issue for a long time, but we have less of an excuse now, because we're adding the access_time analog to the existing modify_time attribute.

d-w-moore avatar Jun 30 '25 15:06 d-w-moore

One way to deal with that is to deprecate those members on iRODSDataObject. That's just an idea.

Alternatively, we do as you said and make modify_time and access_time properties with additional logic which handles sorting, etc on demand.

Play with the different schemes and weigh the pros/cons to determine which one aligns with the design of the PRC.

korydraughn avatar Jun 30 '25 17:06 korydraughn
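
As a rough illustration of the deprecation option, a member can be kept readable while warning on access. The class below is a hypothetical stand-in rather than the real iRODSDataObject, and the warning text is invented for the example; only the standard-library `warnings` module is used.

```python
import warnings
from datetime import datetime


class DataObjectSketch:
    """Hypothetical stand-in for a data object holding a first-row timestamp."""

    def __init__(self, first_row_modify_time: datetime):
        self._modify_time = first_row_modify_time

    @property
    def modify_time(self) -> datetime:
        # Still returns the legacy value, but signals that callers should
        # read timestamps from the individual replicas instead.
        warnings.warn(
            "modify_time on the data object is deprecated; "
            "inspect the replicas instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._modify_time


sketch = DataObjectSketch(datetime(2025, 6, 20))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = sketch.modify_time
print(value, [str(w.message) for w in caught])
```

This keeps existing code working during a transition period, at the cost of leaving the ambiguous value in place until the member is actually removed.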

Yes, it could be that a mixture of approaches would be good. Some of the replica attributes are OK because they agree over all rows. Some, like resc_name, can of course be deprecated.

> One way to deal with that is to deprecate those members on iRODSDataObject. That's just an idea.
>
> Alternatively, we do as you said and make modify_time and access_time properties with additional logic which handles sorting, etc on demand.
>
> Play with the different schemes and weigh the pros/cons to determine which one aligns with the design of the PRC.

d-w-moore avatar Jun 30 '25 17:06 d-w-moore

Or rather resc_id can be deprecated, at the iRODSDataObject level.

d-w-moore avatar Jun 30 '25 17:06 d-w-moore