Error when processing bibliography

Open masinter opened this issue 1 year ago • 7 comments

(https://github.com/Interlisp/Interlisp.github.io/actions/runs/11044838534)

masinter avatar Sep 26 '24 15:09 masinter

I checked out the main branch and ran scripts/update_bibliography.sh locally (bash shell, cd to the scripts directory). It completed without error. What has changed? The version of jq? The scripts/update_bibliography.sh file itself hasn't changed since June 9th. (I'm running jq-1.6.) (I have a change to that file in the branch mth2-use-DOI-for-url-bibliography but there's not a PR for that yet.)

MattHeffron avatar Sep 27 '24 02:09 MattHeffron

The issue was with Zotero: the retrieval of bibliographic information returned garbage and we failed. Rerunning the action completes successfully.

The error was not isolated to the Interlisp GitHub site; my clone also failed last night. I had left it running to make sure my modifications to the actions, in PR 239, didn't cause problems.

stumbo avatar Sep 27 '24 02:09 stumbo

> The issue was with Zotero: the retrieval of bibliographic information returned garbage and we failed. Rerunning the action completes successfully.
>
> The error was not isolated to the Interlisp GitHub site; my clone also failed last night. I had left it running to make sure my modifications to the actions, in PR 239, didn't cause problems.

So, does basic sanity checking need to be added to scripts/update_bibliography.sh? Clearly, it would fail with or without checking, so the end result is the same.

MattHeffron avatar Sep 27 '24 03:09 MattHeffron

I wish we had some kind of version control for the bibliography. Right now it seems like one person could sync while another is updating. Is there any way of moving the 'truth' of the bibliography to git / github?

masinter avatar Sep 28 '24 20:09 masinter

It appears that Zotero does not have a way to "lock" a collection, or hierarchy, while the API is being queried. So, keeping a copy of the bibliography.json (or the initial raw responses from Zotero) doesn't solve the potential issue. Getting the Zotero data is all done at the beginning of the update_bibliography.sh script. After that, it is massaged locally. Zotero does throttle API queries, so it can sometimes take a while to get the initial data. There is no way to wrap the queries in a transaction to ensure that the Zotero information is all in a consistent state.
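Without a lock, one thing that could be tried is to detect a change after the fact: read the library version before and after the fetch, and treat the data as suspect if the two differ. The sketch below is only an illustration under assumptions. It assumes the API reports the library version in a `Last-Modified-Version` response header, and the function names `get_library_version` and `fetch_if_stable` are hypothetical, not part of the script.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: detect a library change during a multi-request fetch.

# Assumed helper: print the group's current library version, read from the
# Last-Modified-Version response header. GROUP_ID is set elsewhere in the script.
get_library_version () {
    curl -s -o /dev/null -D - "https://api.zotero.org/groups/$GROUP_ID/items?limit=1&v=3" \
        | tr -d '\r' \
        | awk -F': ' 'tolower($1) == "last-modified-version" { print $2; exit }'
}

# Run the given fetch command; succeed only if the command succeeded and the
# library version was readable and identical before and after it ran.
fetch_if_stable () {
    local before after
    before=$(get_library_version)
    "$@" || return 1
    after=$(get_library_version)
    [[ -n "$before" && "$before" == "$after" ]]
}
```

Usage would be something like `fetch_if_stable add_items_from_collection "$collection_key"`, rerunning the fetch when it returns non-zero. This does not make the queries atomic; it only flags a run during which the library changed.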

MattHeffron avatar Oct 01 '24 23:10 MattHeffron

Following up from yesterday's conversation. I spent some time digging deeper into the 2024-09-26 crash. I don't have a clear resolution, nor do I have a clear understanding of exactly where the problem occurred. Here's what I found:

The crash occurred at 2024-09-26 03:09:00 GMT. In the GitHub Action log this was run 762.

We were running update_bibliography.sh and it failed in the following function:

```bash
function add_items_from_collection () {
    local collection_key="$1"
    echo "Getting collection $collection_key"
    local start=0
    local limit=100
    while :; do
        local this_page=$(curl -s "https://api.zotero.org/groups/$GROUP_ID/collections/$collection_key/items?include=data,csljson&start=$start&limit=$limit&v=3")
        items=$(jq -s 'add' <<< "$this_page$items")
        start=$(($start + $limit))

        # Break when we don't get any more items
        [[ $(jq '. | length' <<< "$this_page") > 0 ]] || break
    done

    # Recurse into subcollections
    while read subcollection_key; do
        add_items_from_collection $subcollection_key
    done < <(curl -s "https://api.zotero.org/groups/$GROUP_ID/collections/$collection_key/collections" | jq -r '.[].key')
}
```

The collection being retrieved was MASL2F46. The error message we received, "parse error: Invalid numeric literal at line 1, column 3", is likely being raised by jq. Most likely the json returned from the REST Get request was not well-formed.

When I reran the script at 2024-09-26 15:28:52 it ran correctly.

There were no changes to our Zotero catalog between the failed and successful runs. Zotero keeps a version number that is updated when any part of a collection changes. Both the failed and the successful run were using version 7836.

The catalog had changed since the previous run, so we did not have the correct information cached, which necessitated rebuilding the json data file.

The collection that was being processed when the failure occurred had some edits on 2024-09-25 17:06:51 GMT. The pdf file 1988 - Rooms was added. The editing had been completed well before the failure and didn't change between the failed and successful run.

Nothing I see convinces me that the error was on Zotero's side. The returned json in the successful and failed cases should have been identical; nothing suggests they would have been different. The error message points to a potential jq issue. Our script did not change, so there were no changes in how we were using jq.

GitHub made no changes to the Ubuntu image we were using between the failed and successful runs, Ubuntu version 20240912.1.0. Nor did anything else change with GitHub runners.

GitHub reported some errors with a software update on the day of 2024-09-25 that caused some GitHub actions to fail. That was spotted and resolved prior to our job running.

Zotero could have had a problem that took their site down. I was unable to find an uptime history for Zotero that extended back to the 26th.

Another wild theory: the edits to the catalog fall into the time frame in which GitHub Actions were experiencing errors. Does any work updating information in the Zotero ecosystem rely on GitHub Actions, and if so, could that have caused a glitch that a later run within Zotero cleaned up? This seems a little too much like grasping at straws.

Next steps? We could reach out to Zotero and see if they can provide any more insight or ideas on how to determine whether there was an issue with the returned json.

stumbo avatar Oct 03 '24 11:10 stumbo

> Most likely the json returned from the REST Get request was not well-formed.

I wonder if Zotero's throttling of API requests caused curl to time out, leading to incomplete info piped to jq (see Zotero API Rate Limiting). When I have been testing my edits to update_bibliography.sh, I have had times where curl has timed out. As far as I can tell, curl does not respect the Backoff: or Retry-After: response headers, nor the response code 429 Too Many Requests. Here is a [GitHub gist that purports to enable curl to get status code and response body](https://gist.github.com/maxcnunes/9f77afdc32df354883df), with comments offering alternatives using jq.
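A rough sketch of how the script could capture the status code itself and retry on 429, in case it helps the discussion. Everything here is an assumption and not existing code: the function names `http_get` and `fetch_with_retry`, the `MAX_TRIES` and `DEFAULT_WAIT` variables, the 60 second timeout, and the assumption that `Retry-After` carries a number of seconds. It uses curl's `-w '%{http_code}'`, `-o`, and `-D` options to separate the status, body, and headers.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: retry a request on HTTP 429, honoring Retry-After.

# Fetch URL $1; write the body to file $2 and the headers to file $3;
# print the HTTP status code on stdout.
http_get () {
    curl -s --max-time 60 -o "$2" -D "$3" -w '%{http_code}' "$1"
}

# Print the response body if the request returns 200. Retry on 429, up to
# MAX_TRIES attempts (default 5). Fail on any other status.
fetch_with_retry () {
    local url="$1" tries=0 status delay body headers
    body=$(mktemp)
    headers=$(mktemp)
    while (( tries < ${MAX_TRIES:-5} )); do
        status=$(http_get "$url" "$body" "$headers")
        if [[ "$status" == 200 ]]; then
            cat "$body"
            rm -f "$body" "$headers"
            return 0
        elif [[ "$status" == 429 ]]; then
            # Assumes Retry-After is given in seconds; fall back to a default.
            delay=$(tr -d '\r' < "$headers" \
                | awk -F': ' 'tolower($1) == "retry-after" { print $2; exit }')
            sleep "${delay:-${DEFAULT_WAIT:-5}}"
        else
            break
        fi
        tries=$((tries + 1))
    done
    echo "request failed: $url (last status: $status)" >&2
    rm -f "$body" "$headers"
    return 1
}
```

With something like this, `this_page=$(fetch_with_retry "$url") || exit 1` would stop the run with the failing URL and status, so a timed-out or throttled response never reaches jq.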

MattHeffron avatar Oct 15 '24 05:10 MattHeffron