irods icon indicating copy to clipboard operation
irods copied to clipboard

Replica in cache marked good after stage-to-cache returns an error

Open JustinKyleJames opened this issue 2 years ago • 1 comments

  • [ ] mian
  • [ ] 4-3-stable
  • [ ] 4-2-stable

Bug Report

When the stage-to-cache operation returns an error, the replica is still created in cache and marked good.

iRODS Version, OS and Version

iRODS 4.3.0

Tested on Ubuntu 18.04.

What did you try to do?

I created a compound resource with a unixfilesystem cache and S3 archive. The S3 was configured to send data to Glacier. Because of this, the data can't be immediately retrieved from S3 but must have a restoration scheduled. During stage-to-cache, the S3 plugin schedules the restoration and returns REPLICA_IS_BEING_STAGED error.

Expected behavior

I expected the error returned from stage-to-cache to cause either the replica on cache to not be created in the DB or to be marked as stale.

Observed behavior (including steps to reproduce, if applicable)

The replica was created in the database and marked good. There happens to be an empty file on the filesystem for the cache replica. Therefore, subsequent retrievals will return an empty file rather than attempting to get the file from S3 again (which may work if the restoration has been completed).

To recover from this a person would need to know to delete the cache replica.

JustinKyleJames avatar Jul 27 '22 15:07 JustinKyleJames

I can confirm that this bug is present also in 4.2.11.

ciklysta avatar Sep 01 '22 14:09 ciklysta

I just double checked that this is still an issue with both 4.2.11 and 4.3.0.

Reproducing the error:

First, here is the S3 archive configuration showing the S3_STORAGE_CLASS and S3_RESTORATION_TIER.

irods@e8284aad29be:~$ iadmin lr s3_archive_resc
resc_id: 32838
resc_name: s3_archive_resc
zone_name: tempZone
resc_type_name: s3
resc_net: e8284aad29be
resc_def_path: /justinkylejames-irods1/amazons3resc
free_space: 
free_space_ts: Never
resc_info: 
r_comment: 
resc_status: 
create_ts: 2022-12-14.17:02:39
modify_ts: 2022-12-14.17:03:44
resc_children: 
resc_context: S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/amazon.keypair;S3_REGIONNAME=us-east-1;S3_RETRY_COUNT=2;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTP;S3_ENABLE_MD5=1;S3_SIGNATURE_VERSION=2;S3_ENABLE_MPU=1;S3_MPU_THREADS=30;S3_MPU_CHUNK=256;S3_STORAGE_CLASS=Glacier;S3_RESTORATION_TIER=Expedited
resc_parent: 32839
resc_parent_context: archive

Putting the file, trimming it from cache, then trying to read it.

irods@e8284aad29be:~$ iput -R comp_resc VERSION.json

irods@e8284aad29be:~$ ils -L
/tempZone/home/rods:
  rods              0 comp_resc;s3_cache_resc          225 2022-12-14.17:04 & VERSION.json
        generic    /var/lib/irods/s3_cache_resc/home/rods/VERSION.json
  rods              1 comp_resc;s3_archive_resc          225 2022-12-14.17:04 & VERSION.json
        generic    /justinkylejames-irods1/amazons3resc/home/rods/VERSION.json

irods@e8284aad29be:~$ itrim -N 1 -n 0 VERSION.json
Total size trimmed = 0.000 MB. Number of files trimmed = 1.

irods@e8284aad29be:~$ iget VERSION.json -
remote addresses: 10.15.0.3 ERROR: getUtil: get error for - status = -1803000 HIERARCHY_ERROR
Level 0: [-]    /github/irods_resource_plugin_s3/s3/s3_transport/src/s3_transport.cpp:206:irods::error irods::experimental::io::s3_transport::restore_s3_object(const std::string &, libs3_types::bucket_context &, const unsigned int, const std::string &, const std::string &) :  status [REPLICA_IS_BEING_STAGED]  errno [] -- message [Object is in GLACIER and has been queued for restoration.  Try again later.]


Level 1: [-]    /github/irods_resource_plugin_s3/s3/s3_transport/src/s3_transport.cpp:129:irods::error irods::experimental::io::s3_transport::handle_glacier_status(const std::string &, libs3_types::bucket_context &, const unsigned int, const std::string &, irods::experimental::io::s3_transport::object_s3_status, const std::string &) :  status [REPLICA_IS_BEING_STAGED]  errno [] -- message [Object is in GLACIER and is currently being restored.  Try again later.]


irods@e8284aad29be:~$ iget VERSION.json -

irods@e8284aad29be:~$ ils -L
/tempZone/home/rods:
  rods              1 comp_resc;s3_archive_resc          225 2022-12-14.17:04 & VERSION.json
        generic    /justinkylejames-irods1/amazons3resc/home/rods/VERSION.json
  rods              2 comp_resc;s3_cache_resc          225 2022-12-14.17:05 & VERSION.json
        generic    /var/lib/irods/s3_cache_resc/home/rods/VERSION.json

irods@e8284aad29be:~$ ls -l /var/lib/irods/s3_cache_resc/home/rods/VERSION.json
-rw------- 1 irods irods 0 Dec 14 17:05 /var/lib/irods/s3_cache_resc/home/rods/VERSION.json

Note that the read from S3 returned the error but the database was updated with a cache version but if you look at the actual cache file it has a size of 0. Because of this, the iget returns nothing but gives no error.

Here is the recovery process performed once the file is moved from Glacier. First, trim the file from cache then get it again. Note: To do this you must make sure the object isn't still in Glacier otherwise you will just reproduce the problem again.

irods@e8284aad29be:~$ itrim -N 1 -n 2 VERSION.json
Total size trimmed = 0.000 MB. Number of files trimmed = 1.

irods@e8284aad29be:~$ iget VERSION.json - 
{
    "catalog_schema_version": 8, 
    "commit_id": "bc6f9f1cdef6c4ec01ea14402428988892615321", 
    "configuration_schema_version": 3, 
    "installation_time": "2022-10-21T19:37:08.096378", 
    "irods_version": "4.2.11"
}

JustinKyleJames avatar Dec 14 '22 17:12 JustinKyleJames

@JustinKyleJames What's the status of this issue?

korydraughn avatar Mar 20 '23 22:03 korydraughn

@JustinKyleJames - Please close if finished. Thanks!

alanking avatar Apr 14 '23 17:04 alanking

Closing

JustinKyleJames avatar Apr 14 '23 18:04 JustinKyleJames

Will this fix be part of 4.3.1 release, please? It is missing from the 4.3.1. milestone. From the fact, that it is on 4-3-stable branch, I assume yes. Thank you.

ciklysta avatar Oct 04 '23 07:10 ciklysta

Yes. It was fixed in 4.2.12 and will be part of 4.3.1.

korydraughn avatar Oct 04 '23 11:10 korydraughn