metacatui
Inconsistency of metrics counts
Describe the bug: One of our users reported that the metrics shown on the search page are inconsistent with the metrics shown on the dataset view.
Please see screenshots below for reference.
Additional context: This may relate to data files that were previously removed or obsoleted by other data files.
Screenshots
Desktop (please complete the following information):
- OS: Ventura
- Browser: Chrome
@rushirajnenuji can you look into this discrepancy please?
One quick note -- I see that the search page on DataONE has the same stats as the landing page on ESS-DIVE (see screenshot), so it seems that it is the search results on ESS-DIVE that are providing different metrics.
Thanks for checking @mbjones - I did run into the same thing when revisiting the dataset on ESS-DIVE, and now the results look correct and consistent, so I think something did change in the returned results. (Or possibly cached metrics?)
For context, I was able to replicate the user's screenshots on my laptop this morning.
Hey @helbashandy, @mbjones -- thank you both for providing such a useful bug report.
The search catalog page and the dataset landing page currently use the same pid_resolution algorithm, so ideally the metrics for both should match. We do use Apache's caching functionality (with a TTL of 24 hours), so it could be a cached response. I'm not sure yet; I'll test it further.
I will investigate our pipeline to identify and resolve the root cause of this issue.
FWIW I am seeing the incorrect metrics for the top dataset in this image.
As another data point, I am also still seeing the 713 count on the ESS-DIVE search page, even after reloading multiple times and clearing my browser cache. I tried both Firefox and Safari, both logged in with my ORCID and logged out.
@helbashandy What version of MetacatUI are we on? @mbjones Could a difference in version be the issue?
Thanks @vchendrix and @mbjones - This is interesting. I realized that I can see both scenarios happening on two different datasets at the same time, so it may not have been fixed; I just happened to open a second WHONDRS dataset that does have consistent metrics. I think this narrows the issue down to file changes on the datasets.
For reference, the dataset URLs are:
- (Consistent metrics) https://data.ess-dive.lbl.gov/view/doi%3A10.15485%2F1729719
- (Inconsistent metrics) https://data.ess-dive.lbl.gov/view/doi%3A10.15485%2F1603775
Below are the screenshots of the consistent dataset:
@vchendrix - I doubt it's related to the MetacatUI view, since the metrics request queries appear to be identical (the only difference is in the PIDs). For some reason, though, the metrics service call returns a different result on DataONE than on ESS-DIVE.
# DataONE
{
"metricsPage": {
"total": 0,
"start": 0,
"count": 0
},
"metrics": [
"citations",
"downloads",
"views"
],
"filterBy": [
{
"filterType": "catalog",
"values": [
"ess-dive-7b94c94d81f095a-20230919T202114802034",
"ess-dive-6269f157183d70c-20230918T210530082",
"ess-dive-d3dc26585e68115-20230915T183958972",
"ess-dive-3531f1661cd538c-20230824T172525448",
"ess-dive-be3b782f1282664-20230808T202729908",
"ess-dive-906378ffe0774a3-20230808T202555206",
"ess-dive-10f72a8aff0d825-20230802T190322056",
"ess-dive-6269f157183d70c-20230713T164140304",
"ess-dive-d3dc26585e68115-20230713T163740417",
"ess-dive-28c4750aab75568-20230712T224454537",
"ess-dive-28c4750aab75568-20230707T222115571",
"ess-dive-28c4750aab75568-20230707T192247356",
"ess-dive-e976198fe417dbb-20230626T185837918",
"ess-dive-d200b2f70eb970d-20230509T161958141",
"ess-dive-e976198fe417dbb-20230509T155407346",
"ess-dive-e7d877e9490d522-20230504T212355353421",
"ess-dive-2d07e9e9b2bb3f3-20230504T212247921492",
"ess-dive-fa597c973ec6c15-20230504T212115090038",
"ess-dive-0ba7ae0eb5b7573-20230504T211814681504",
"ess-dive-38c10d65ab0ad5e-20230504T211806852833",
"ess-dive-d6d47afba2a6603-20230504T211432460626",
"ess-dive-41e801b10b1d984-20230412T224134738",
"ess-dive-6efcd2d381c7626-20230410T210106765",
"ess-dive-ebff53121684912-20230408T012101811",
"ess-dive-49816505e59a096-20230408T011940128"
],
"interpretAs": "list"
},
{
"filterType": "month",
"values": [
"01/01/2012",
"11/17/2023"
],
"interpretAs": "range"
}
],
"groupBy": [
"month"
]
},
# ESS-DIVE
{
"metricsPage": {
"total": 0,
"start": 0,
"count": 0
},
"metrics": [
"citations",
"downloads",
"views"
],
"filterBy": [
{
"filterType": "catalog",
"values": [
"ess-dive-28c4750aab75568-20231108T174345678",
"ess-dive-d3dc26585e68115-20230929T171354410",
"ess-dive-7b94c94d81f095a-20230919T202114802034",
"ess-dive-6269f157183d70c-20230918T210530082",
"ess-dive-c20ddbe8049415d-20230913T163558486",
"ess-dive-60f090515e619d4-20230808T210228079",
"ess-dive-b8adda907e115d7-20230808T205911246",
"ess-dive-a0edda19aacd17b-20230808T202837779",
"ess-dive-be3b782f1282664-20230808T202729908",
"ess-dive-906378ffe0774a3-20230808T202555206",
"ess-dive-10f72a8aff0d825-20230802T190322056",
"ess-dive-e976198fe417dbb-20230802T190226889",
"ess-dive-7e648cd2fef2975-20230509T162406643",
"ess-dive-7f96ff2396c97b2-20230509T162124774",
"ess-dive-d200b2f70eb970d-20230509T161958141",
"ess-dive-45987107f239c0f-20230509T160509560",
"ess-dive-e7d877e9490d522-20230504T212355353421",
"ess-dive-2d07e9e9b2bb3f3-20230504T212247921492",
"ess-dive-0ba7ae0eb5b7573-20230504T211814681504",
"ess-dive-6efcd2d381c7626-20230410T210106765",
"ess-dive-ebff53121684912-20230410T171949646",
"ess-dive-49816505e59a096-20230410T171850389",
"ess-dive-a1b48555ea1351b-20230410T171815180",
"ess-dive-55df3dafa77d7e2-20230410T171728835",
"ess-dive-44f252f38d00cd1-20230410T171638337"
],
"interpretAs": "list"
},
{
"filterType": "month",
"values": [
"01/01/2012",
"11/17/2023"
],
"interpretAs": "range"
}
],
"groupBy": [
"month"
]
}
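To make the difference between the two request bodies easier to see, here is a minimal Python sketch that compares the PID lists in their "catalog" filters. The helper names (`catalog_pids`, `diff_requests`) are hypothetical and not part of MetacatUI or the metrics service, and the two sample bodies are abbreviated to a couple of PIDs taken from the requests above.

```python
def catalog_pids(request_body):
    """Return the set of PIDs in the 'catalog' filter of a metrics request body."""
    for flt in request_body.get("filterBy", []):
        if flt.get("filterType") == "catalog":
            return set(flt.get("values", []))
    return set()


def diff_requests(first, second):
    """Report which PIDs appear in only one request and which appear in both."""
    a, b = catalog_pids(first), catalog_pids(second)
    return {
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
        "shared": sorted(a & b),
    }


# Abbreviated request bodies: two PIDs each, copied from the requests above.
dataone_request = {
    "filterBy": [
        {
            "filterType": "catalog",
            "values": [
                "ess-dive-6269f157183d70c-20230918T210530082",
                "ess-dive-d3dc26585e68115-20230915T183958972",
            ],
            "interpretAs": "list",
        }
    ]
}
essdive_request = {
    "filterBy": [
        {
            "filterType": "catalog",
            "values": [
                "ess-dive-d3dc26585e68115-20230929T171354410",
                "ess-dive-6269f157183d70c-20230918T210530082",
            ],
            "interpretAs": "list",
        }
    ]
}

result = diff_requests(dataone_request, essdive_request)
for key, pids in result.items():
    print(key, pids)
```

With the full request bodies pasted in place of the abbreviated ones, the `only_first` and `only_second` lists would show which dataset versions each catalog is asking about.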
@mbjones @rushirajnenuji - Another interesting observation is that the dataset on DataONE doesn't have one of the data files that is on ESS-DIVE. Maybe there's a delayed update from the CN, and that explains the difference?
Hi all, sharing some observations from my testing:
ESS DIVE behavior:
- latest dataset version: ess-dive-d3dc26585e68115-20230929t171354410
- PID_resolution results:
"values": [
"ess-dive-d3dc26585e68115-20230929t171354410",
"ess-dive-d3dc26585e68115-20230929t171354410"
],
DataONE behavior:
- latest dataset version: ess-dive-d3dc26585e68115-20210722T195610978
- PID_resolution results:
"values": [
"ess-dive-d3dc26585e68115-20210722T195610978",
"ess-dive-d3dc26585e68115-20210722T195610978",
"ess-dive-5594edd2ef04b99-20210721T142728182",
"ess-dive-d3dc26585e68115-20210721T144541687",
"ess-dive-8579631061a07ce-20210722T195610962",
"doi:10.15485/1603775",
"ess-dive-3b724d6bdb052b0-20200309T183829430",
"ess-dive-2cfa657544d8e08-20200306T031946773",
"ess-dive-6a50c611c8833d6-20200306T184249151",
"ess-dive-f808c35e332caaf-20200309T183834543",
"ess-dive-d3dc26585e68115-20200515T142150185",
"ess-dive-d3dc26585e68115-20201020T225909470",
"ess-dive-0c5477dba4d2f05-20201020T225350489",
"ess-dive-d3dc26585e68115-20200515T150109138",
"ess-dive-95028b2fa8c83c1-20201020T225909455",
"ess-dive-d3dc26585e68115-20201204T141934245",
"ess-dive-d3dc26585e68115-20201204T141627855",
"ess-dive-d3dc26585e68115-20201021T143135536",
"ess-dive-0b7c6c149598a6e-20201021T143135516",
"ess-dive-d3dc26585e68115-20201204T142120400",
"ess-dive-d3dc26585e68115-20230126T203333734",
"ess-dive-d3dc26585e68115-20230126T192951082",
"ess-dive-56568998d93e836-20210721T144541670",
"ess-dive-d3dc26585e68115-20230111T185638747",
"ess-dive-d3dc26585e68115-20221202T230234086",
"ess-dive-378f9d5332a4096-20230111T185638717",
"ess-dive-d3dc26585e68115-20230131T175343754",
"ess-dive-6c35f2750204ce4-20230126T191457498",
"ess-dive-a4a027699c6069c-20230131T175343746",
"ess-dive-5a8948b9292963e-20230126T203333727",
"ess-dive-d3dc26585e68115-20230201T193913446",
"ess-dive-d3dc26585e68115-20230223T195155499",
"ess-dive-d3dc26585e68115-20230224T184247163",
"ess-dive-d3dc26585e68115-20230713T163740417",
"ess-dive-030d8df7264a897-20230223T194749497",
"ess-dive-d3dc26585e68115-20230509T161547830",
"ess-dive-1943b70bfb1317c-20230713T163740404",
"ess-dive-93a5078b48b9861-20210430T034309740865",
"ess-dive-f734bda4c77eeec-20210430T034316371074",
"ess-dive-d3dc26585e68115-20230303T191024491",
"ess-dive-ca49b4e2151b089-20230303T191024484",
"ess-dive-36864efb408ed31-20230406T133034211824",
"ess-dive-db47e5852816501-20230406T125504171342",
"ess-dive-d230eda8aa8cabb-20230406T161810415530",
"ess-dive-5cf7f6b1c328b52-20230406T160715666088",
"ess-dive-448d1fbd7255ad8-20230406T145159892975",
"ess-dive-d3dc26585e68115-20230408T011330214",
"ess-dive-11a4f2ba9fd070b-20230408T011330210",
"ess-dive-d3dc26585e68115-20230915T183958972",
"ess-dive-caac3f44c784091-20230828T211044954",
"ess-dive-3cfa1c3e7e91d4b-20230828T211202773",
"ess-dive-d3dc26585e68115-20230905T214847645",
"ess-dive-b976fc90b3be47e-20230915T183958956",
"ess-dive-1694c1878f899cd-20200306T184255760",
"ess-dive-3cc110241364ea5-20201204T142120391",
"ess-dive-12f3fcd2b5d3e9f-20201204T141627838",
"ess-dive-e0f7883f61a8969-20201204T141934232",
"ess-dive-b98da572b57cea2-20200515T150114737",
"resource_map_urn:uuid:bc0baff4-3ffa-40a7-b052-c909b768b95b",
"ess-dive-a185a1cace16428-20200306T033920705",
"ess-dive-4239276ad44da90-20200515T142156064",
"ess-dive-2a62faca694210d-20230126T192951073",
"ess-dive-3daf36d9eeea406-20221202T230234073",
"ess-dive-09cae636939608c-20230201T193913439",
"ess-dive-29afef61882ed95-20230223T195155491",
"ess-dive-f3908422c0e934f-20230224T184247155",
"ess-dive-a7e16d7b34e5ea4-20230406T161818382294",
"ess-dive-90a84a6ce9e7b5f-20230406T160722176395",
"ess-dive-adc0e02bafbbe4e-20230406T133041082066",
"ess-dive-3284b554e2eb148-20230509T161547824",
"ess-dive-c56290330e4ae13-20230905T214847632",
"ess-dive-c2f378f6ac18d2e-20230406T125511678691",
"ess-dive-1a5976ecbb74063-20230406T145207934628"
],
Initial Observations:
- Difference in behavior:
  - On the dataset landing page we're using the ID doi:10.15485/1603775 as PID (on both ESS-DIVE and DataONE), but for the data catalog page we're using PID ess-dive-d3dc26585e68115-20230929t171354410 for the ESS-DIVE catalog and ess-dive-d3dc26585e68115-20210722T195610978 for the DataONE catalog.
  - It looks like the latest version, ess-dive-d3dc26585e68115-20230929t171354410, has not been synced with the CN (where the latest version is still ess-dive-d3dc26585e68115-20210722T195610978), so the Metrics service is not able to find identifiers using the PID chain. It is able to find events related to the PID *-4410 in the ES index, however, so it is reporting only those events (i.e. 713 views).
Edit:
ESS_DIVE catalog metrics_query
DataONE catalog metrics_query
ESS_DIVE and DataONE dataset landing page metrics_query
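The effect described in the observations above can be illustrated with a small Python sketch. This is a toy model only, not the Metrics service's actual pid_resolution code: the function names, the `obsoletes` mapping, the placeholder PIDs (`v1`, `v2`, `v3`), and the view counts are all made up for illustration. It assumes that resolution walks back through "obsoletes" links, and that a PID the index has not seen yet resolves only to itself.

```python
def resolve_chain(pid, obsoletes):
    """Walk 'obsoletes' links from pid back to the earliest known version.

    `obsoletes` maps a PID to the PID it replaced. A PID that is missing
    from the mapping (for example, one that has not been synced yet)
    resolves to a chain containing only itself.
    """
    chain = [pid]
    seen = {pid}
    while chain[-1] in obsoletes:
        previous = obsoletes[chain[-1]]
        if previous in seen:  # guard against accidental cycles
            break
        chain.append(previous)
        seen.add(previous)
    return chain


def total_views(pid, obsoletes, views_by_pid):
    """Sum the views over every PID in the resolved chain."""
    return sum(views_by_pid.get(p, 0) for p in resolve_chain(pid, obsoletes))


# Made-up view counts per version (placeholders, not real metrics).
views_by_pid = {"v1": 5, "v2": 3, "v3": 2}

# Index that has not yet received v3: it only knows that v2 replaced v1.
unsynced_index = {"v2": "v1"}
# Index after sync: it also knows that v3 replaced v2.
synced_index = {"v3": "v2", "v2": "v1"}

before_sync = total_views("v3", unsynced_index, views_by_pid)
after_sync = total_views("v3", synced_index, views_by_pid)
print(before_sync, after_sync)
```

In this toy model, querying by the newest PID before it is synced counts only that version's own events, which matches the pattern of the search page showing a lower number than the landing page.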
Thanks @helbashandy for the analysis. That definitely sheds light on what may be going on. @rushirajnenuji @mbjones is there a reason that the search result page does not use the DOI as the PID for the metrics query? That seems like the optimal solution, so that the metrics don't appear to have dropped while we wait for the CN to sync the latest version.
I think all pages use the most recent version of the PID that is available to them. If the PIDs haven't synced to the CN, then there won't be metrics yet for that PID. @rushirajnenuji should be able to clarify what needs to be fixed here. It may be only that we need sync and indexing to be more immediate.
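As a rough illustration of the DOI-based query suggested above, here is a minimal Python sketch that builds a request body in the same shape as the catalog requests earlier in this thread, but keyed on the DOI. The `build_metrics_request` helper is hypothetical. The filter type for a single-dataset query is not shown in this thread, so it is left as a parameter, and the value "dataset" used below is an assumption rather than a confirmed API value.

```python
import json


def build_metrics_request(pids, filter_type, start="01/01/2012", end="11/17/2023"):
    """Build a metrics request body shaped like the ones shown in this thread.

    `filter_type` is passed in by the caller because the value to use for a
    single-dataset query is an assumption here, not taken from the thread.
    """
    return {
        "metricsPage": {"total": 0, "start": 0, "count": 0},
        "metrics": ["citations", "downloads", "views"],
        "filterBy": [
            {"filterType": filter_type, "values": list(pids), "interpretAs": "list"},
            {"filterType": "month", "values": [start, end], "interpretAs": "range"},
        ],
        "groupBy": ["month"],
    }


# "dataset" is an assumed filter type; the DOI comes from the thread.
body = build_metrics_request(["doi:10.15485/1603775"], filter_type="dataset")
print(json.dumps(body, indent=2))
```

Whether the search page can send the DOI instead of the version PID depends on what the search results expose, which this sketch does not address.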
In case it's helpful:
I revisited the use case Hesham originally presented (Toyoda et al. 2020; https://data.ess-dive.lbl.gov/view/doi:10.15485/1603775) and compared it to a dataset that is from the same project and team of data managers, was originally published around the same time, and has been versioned multiple times after publication.
For Toyoda et al. 2020, the preview on the search page now matches the counts reported on the dataset landing page. The dataset hasn't been versioned since November 2023 when the inconsistency was reported, but the PID has changed only once on 2024-01-11 T18:51:46. Current PID: ess-dive-6269f157183d70c-20240111T185412906.
For Goldman et al. 2020 (https://data.ess-dive.lbl.gov/view/doi:10.15485/1729719) the preview on the search page does not match the dataset landing page. The dataset also hasn't been officially versioned since November 2023 and was updated only once since November 2023 on 2024-01-11 T18:54:13. Current PID: ess-dive-d3dc26585e68115-20240111T185146496