londonmapbot
The action sometimes fails
Examples of failure (red crosses) on the Actions tab: https://github.com/matt-dray/londonmapbot/actions
Image gets downloaded, but then:
Error in curl::curl_fetch_memory(url, handle = handle) :
Empty reply from server
Calls: <Anonymous> ... request_fetch -> request_fetch.write_memory -> <Anonymous>
Execution halted
Error: Process completed with exit code 1.
I find GitHub Actions can be a bit hit and miss. Sometimes it just seems to fall over for no clear reason.
Haha, last night it failed four times in a row for this repo! We talked about a possible work application for cron-based actions, but I'm not sure it's a great idea if you can't handle a fail in the workflow.
Yeah. We should probably investigate the common cause of failures across our cron-based actions.
Probably not related if the error above is the only error you're getting, but some of my action failures have been because R hasn't installed properly. However, I've just discovered that R comes pre-installed on the macOS GitHub runner as per this spec. So lines 17-18 of the yaml aren't necessary any more, which should at least speed up the action a little.
But on your curl issue … it suggests a problem with network connectivity. "Empty reply from server" points to something going wrong in Twitter server land (or in the way {rtweet}/its dependencies are making the request). The main question is whether it's happening in the OAuth stage or the tweet-posting stage, but looking at the workflow output you link to above it looks like it's got past the Mapbox download, so it's in the tweet stage of the script that it's hitting some sort of problem.
As you're running the script via Rscript there's not much useful logging, so you might want to add some messages to the script and/or specify Rscript --verbose {file} to make it "chatty".
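To make the log point at the failing step, something like this would do; a rough sketch, where the comments stand in for the relevant bits of the script:
message("Building the Mapbox request...")
# ... code that builds the static-map URL ...
message("Downloading the image...")
# ... the image download step ...
message("Posting the tweet...")
# ... the rtweet::post_tweet() call ...
message("Done")
And/or, in the workflow step itself: Rscript --verbose {file}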
I found two posts that might be related to the current issue.
In short, if we use download.file() on an https:// URL, there might be some problem. So instead of figuring out what causes it, I just switched to httr::GET().
httr::GET(img_url, httr::write_disk(temp_file, overwrite = TRUE))
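Spelled out a bit more, the swap looks like this (img_url being the map image URL the script has already built, and temp_file a temporary path for it):
temp_file <- tempfile(fileext = ".png")  # somewhere temporary to save the image (extension assumed)
httr::GET(img_url, httr::write_disk(temp_file, overwrite = TRUE))
# temp_file then gets passed to rtweet::post_tweet(media = temp_file) as before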
Currently testing this build.
Update: No, it doesn't work.
However, I've just discovered that R comes pre-installed on the macOS GitHub runner as per this spec. So lines 17-18 of the yaml aren't necessary any more, which should at least speed up the action a little.
I've just tested this and can confirm R 4.0.2 is available on the macOS-latest runner, log here. However, one caution: you'll need to use the repos argument in install.packages() (or set it via options).
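In other words, something like this in the install step (the mirror URL is just an example):
# name a CRAN mirror explicitly; without repos, install.packages() can't pick one
# in the non-interactive session on the runner
install.packages(c("rtweet", "httr"), repos = "https://cloud.r-project.org")
# or set it once so later install.packages() calls pick it up:
# options(repos = c(CRAN = "https://cloud.r-project.org"))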
Hello. @matt-dray thanks for writing your blog post - I found it very helpful. I am experiencing the same issue with an {rtweet} bot, and I agree the issue is with {rtweet} or Twitter.
In my case post_tweet('some text', media = 'some_pic.png') sometimes works fine, but often runs for minutes and then fails with the same error:
curl::curl_fetch_memory(url, handle = handle) :
Empty reply from server
However, without the media, post_tweet('some_text') works instantly with no error.
Could Twitter be blocking the posts with media because they view it as suspicious / automated activity? At one point my bot got blocked by Twitter, which gave an explicit error to that effect. The account seems to be unblocked, but I wonder if Twitter still has some restrictions in place?
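In the meantime, one blunt workaround might be to retry the media post a couple of times before giving up. A rough sketch, untested (and note that if the first attempt did reach Twitter despite the error, a retry could end up posting a duplicate):
post_with_retry <- function(status, media, attempts = 3, wait = 30) {
  for (i in seq_len(attempts)) {
    ok <- tryCatch(
      {
        rtweet::post_tweet(status, media = media)
        TRUE                                  # call returned without erroring
      },
      error = function(e) {
        message("Attempt ", i, " failed: ", conditionMessage(e))
        FALSE                                 # swallow the curl error and try again
      }
    )
    if (ok) return(invisible(TRUE))
    Sys.sleep(wait)                           # short pause before the next attempt
  }
  stop("post_tweet() failed after ", attempts, " attempts")
}
post_with_retry('some text', media = 'some_pic.png')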
Thanks @mattkerlogue, @rexarski and @scott-saunders.
Given the near-identical nature of londonmapbot's tweet contents, I figured that any blocking from Twitter's end would be all-or-nothing, but it looks like the action failures are more-or-less random. I also thought londonmapbot might be more susceptible to blocking when I started posting URLs, but haven't had any issues.
I'm not sure of Twitter's algorithm for detecting 'malicious bots', but I guess slight variation in the time taken for the action to run and post (and the fact it fails randomly!) might help to prevent it being flagged. This is worth a read in any case (might be slightly out of date): https://help.twitter.com/en/rules-and-policies/twitter-automation
Having looked back through the Actions logs, it seems that narrowbotR’s workflow, which uses its custom post_geo_tweet() function, has only had one curl error so far, and as with others it got an Empty reply from server response.
I’ve checked the Twitter developer portal and I can’t see that I’ve done anything different in the app setup on their side to make this less suspicious to Twitter’s spam filters (if that is what’s happening).
Given the documentation you linked to, @matt-dray, these types of bots definitely seem to be in scope.
Provided you comply with all other rules, you may post automated Tweets for entertainment, informational, or novelty purposes.
The mapbots only post two tweets an hour, so definitely well shy of the rate limits set out in Twitter’s documentation for POST statuses/update. I’m not sure @scott-saunders if you’ve put anything in your covid bot (awesome work by the way!) that keeps a check on limits.
I still think it could be an issue with the GitHub Actions runners, as I’m not convinced they have perfect connectivity with the rest of the internet. The R install from r-lib and the package installs would periodically fall over on my Google scraping repo for no reason. The narrowbotR somehow failed to install {data.table} recently (I wasn’t aware it was a package dependency), which was a surprise since the code calls for it to install packages from the cloud mirror of CRAN.
Thanks @mattkerlogue! I haven't done anything to check the limit on tweets per hour, mostly because covid_data_bot isn't seeing that level of traffic, but that's a good point for the future. For now I think I'm also well under the 300 / 3hr limit.
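If I do add a check, I guess a rough approach would be to count the account's own recent tweets before posting; a sketch, assuming the {rtweet} token is already set up (the handle here is assumed):
recent <- rtweet::get_timeline('covid_data_bot', n = 300)       # the bot's own timeline
n_last_3hr <- sum(recent$created_at > Sys.time() - 3 * 60 * 60) # tweets in the last 3 hours
if (n_last_3hr >= 290) {   # leave a little headroom under the 300 / 3hr cap
  message('Close to the posting limit, skipping this run')
} else {
  rtweet::post_tweet('some text')
}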
Yeah, the GitHub Actions connectivity may be an issue (startup time and package install are sometimes wildly variable for me too), but in my code the bot first uses {rtweet} to search for tweets and reads in NYT data from GitHub, so it's not so bad that it's causing an error there. I have also experienced the same post_tweet() problem using {rtweet} locally on my computer, so I don't think connectivity is the main issue.
Here's one thing from the POST documentation:
For each update attempt, the update text is compared with the authenticating user's recent Tweets. Any attempt that would result in duplication will be blocked, resulting in a 403 error. A user cannot submit the same status twice in a row.
covid_data_bot was having issues posting multiple tweets because of the weird overlapping timing of GitHub Actions. Perhaps Twitter could have been blocking the posting of duplicate tweets? I have mostly fixed this duplicate-posting issue since then, so we'll see if it continues. Could rare duplicate posts explain any of your issues? @mattkerlogue @matt-dray
Let's say the Twitter API does throw the 403 error: does anyone know what error {rtweet} / GitHub Actions would show?
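One way to find out might be to trigger the duplicate block on purpose and capture whatever comes back; a quick sketch (I'm not sure whether {rtweet} surfaces the 403 as a warning or an error, hence catching both):
rtweet::post_tweet('duplicate test status')      # first post should go through
out <- tryCatch(
  rtweet::post_tweet('duplicate test status'),   # identical text, should trip the 403
  warning = function(w) w,
  error   = function(e) e
)
print(out)   # inspect the condition to see what would surface in the Rscript log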
Interesting suggestion @scott-saunders, I've never had issues with post_tweet() from the console.
Nothing in my code (or I think @matt-dray's) attempts to post multiple tweets in quick succession. However, I don't know if the underlying code of {rtweet} tries to post multiple times if it gets a fail.