irods
irods copied to clipboard
Replica in cache marked good after stage-to-cache returns an error
- [ ] mian
- [ ] 4-3-stable
- [ ] 4-2-stable
Bug Report
When the stage-to-cache operation returns an error, the replica is still created in cache and marked good.
iRODS Version, OS and Version
iRODS 4.3.0
Tested on Ubuntu 18.04.
What did you try to do?
I created a compound resource with a unixfilesystem cache and S3 archive. The S3 was configured to send data to Glacier. Because of this, the data can't be immediately retrieved from S3 but must have a restoration scheduled. During stage-to-cache, the S3 plugin schedules the restoration and returns REPLICA_IS_BEING_STAGED error.
Expected behavior
I expected the error returned from stage-to-cache to cause either the replica on cache to not be created in the DB or to be marked as stale.
Observed behavior (including steps to reproduce, if applicable)
The replica was created in the database and marked good. There happens to be an empty file on the filesystem for the cache replica. Therefore, subsequent retrievals will return an empty file rather than attempting to get the file from S3 again (which may work if the restoration has been completed).
To recover from this a person would need to know to delete the cache replica.
I can confirm that this bug is present also in 4.2.11
.
I just double checked that this is still an issue with both 4.2.11 and 4.3.0.
Reproducing the error:
First, here is the S3 archive configuration showing the S3_STORAGE_CLASS and S3_RESTORATION_TIER.
irods@e8284aad29be:~$ iadmin lr s3_archive_resc
resc_id: 32838
resc_name: s3_archive_resc
zone_name: tempZone
resc_type_name: s3
resc_net: e8284aad29be
resc_def_path: /justinkylejames-irods1/amazons3resc
free_space:
free_space_ts: Never
resc_info:
r_comment:
resc_status:
create_ts: 2022-12-14.17:02:39
modify_ts: 2022-12-14.17:03:44
resc_children:
resc_context: S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/amazon.keypair;S3_REGIONNAME=us-east-1;S3_RETRY_COUNT=2;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTP;S3_ENABLE_MD5=1;S3_SIGNATURE_VERSION=2;S3_ENABLE_MPU=1;S3_MPU_THREADS=30;S3_MPU_CHUNK=256;S3_STORAGE_CLASS=Glacier;S3_RESTORATION_TIER=Expedited
resc_parent: 32839
resc_parent_context: archive
Putting the file, trimming it from cache, then trying to read it.
irods@e8284aad29be:~$ iput -R comp_resc VERSION.json
irods@e8284aad29be:~$ ils -L
/tempZone/home/rods:
rods 0 comp_resc;s3_cache_resc 225 2022-12-14.17:04 & VERSION.json
generic /var/lib/irods/s3_cache_resc/home/rods/VERSION.json
rods 1 comp_resc;s3_archive_resc 225 2022-12-14.17:04 & VERSION.json
generic /justinkylejames-irods1/amazons3resc/home/rods/VERSION.json
irods@e8284aad29be:~$ itrim -N 1 -n 0 VERSION.json
Total size trimmed = 0.000 MB. Number of files trimmed = 1.
irods@e8284aad29be:~$ iget VERSION.json -
remote addresses: 10.15.0.3 ERROR: getUtil: get error for - status = -1803000 HIERARCHY_ERROR
Level 0: [-] /github/irods_resource_plugin_s3/s3/s3_transport/src/s3_transport.cpp:206:irods::error irods::experimental::io::s3_transport::restore_s3_object(const std::string &, libs3_types::bucket_context &, const unsigned int, const std::string &, const std::string &) : status [REPLICA_IS_BEING_STAGED] errno [] -- message [Object is in GLACIER and has been queued for restoration. Try again later.]
Level 1: [-] /github/irods_resource_plugin_s3/s3/s3_transport/src/s3_transport.cpp:129:irods::error irods::experimental::io::s3_transport::handle_glacier_status(const std::string &, libs3_types::bucket_context &, const unsigned int, const std::string &, irods::experimental::io::s3_transport::object_s3_status, const std::string &) : status [REPLICA_IS_BEING_STAGED] errno [] -- message [Object is in GLACIER and is currently being restored. Try again later.]
irods@e8284aad29be:~$ iget VERSION.json -
irods@e8284aad29be:~$ ils -L
/tempZone/home/rods:
rods 1 comp_resc;s3_archive_resc 225 2022-12-14.17:04 & VERSION.json
generic /justinkylejames-irods1/amazons3resc/home/rods/VERSION.json
rods 2 comp_resc;s3_cache_resc 225 2022-12-14.17:05 & VERSION.json
generic /var/lib/irods/s3_cache_resc/home/rods/VERSION.json
irods@e8284aad29be:~$ ls -l /var/lib/irods/s3_cache_resc/home/rods/VERSION.json
-rw------- 1 irods irods 0 Dec 14 17:05 /var/lib/irods/s3_cache_resc/home/rods/VERSION.json
Note that the read from S3 returned the error but the database was updated with a cache version but if you look at the actual cache file it has a size of 0. Because of this, the iget returns nothing but gives no error.
Here is the recovery process performed once the file is moved from Glacier. First, trim the file from cache then get it again. Note: To do this you must make sure the object isn't still in Glacier otherwise you will just reproduce the problem again.
irods@e8284aad29be:~$ itrim -N 1 -n 2 VERSION.json
Total size trimmed = 0.000 MB. Number of files trimmed = 1.
irods@e8284aad29be:~$ iget VERSION.json -
{
"catalog_schema_version": 8,
"commit_id": "bc6f9f1cdef6c4ec01ea14402428988892615321",
"configuration_schema_version": 3,
"installation_time": "2022-10-21T19:37:08.096378",
"irods_version": "4.2.11"
}
@JustinKyleJames What's the status of this issue?
@JustinKyleJames - Please close if finished. Thanks!
Closing
Will this fix be part of 4.3.1 release, please? It is missing from the 4.3.1. milestone. From the fact, that it is on 4-3-stable branch, I assume yes. Thank you.
Yes. It was fixed in 4.2.12 and will be part of 4.3.1.