
HTTP error 403

Open rrik opened this issue 3 years ago • 15 comments

Hello,

I am getting a 403 error when attempting the following:

GetIncome("FB", 2016)
Error in fileFromCache(file) :
  Error in download.file(file, cached.file, quiet = !verbose) :
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1326801/000132680116000043/fb-20151231.xsd'

In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1326801/000132680116000043/fb-20151231.xsd': HTTP status was '403 Forbidden'

Do the source links need updating? Thank you!

rrik avatar Jun 16 '21 23:06 rrik

Hello, I'm having a similar issue, but with "404 Not Found":

GetIncome("TSLA", 2020)
Error in fileFromCache(file.inst) : 
  Error in download.file(file, cached.file, quiet = !verbose) : 
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1318605/000156459020004475/tsla-20191231.xml'

In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1318605/000156459020004475/tsla-20191231.xml': HTTP status was '404 Not Found'

darh78 avatar Jun 20 '21 17:06 darh78

@darh78 That file doesn't exist; try: https://www.sec.gov/Archives/edgar/data/1318605/000156459020004475/tsla-10k_20191231_htm.xml

@rrik That happens to me as well with older submissions. It seems to be related to the SEC fair access policy. You can try downloading the file manually and putting it in the cache folder (see the sketch below), or you can just run the code a few times; it will eventually end up downloading the file.
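Something like this, for example (a sketch; "xbrl.Cache" is the XBRL package's default cache directory, so adjust the path if your error messages point somewhere else):

    # Fetch the file the error complains about and place it where fileFromCache()
    # looks for it. The cache folder name and the use of the file's basename are
    # assumptions based on the XBRL package defaults.
    dir.create("xbrl.Cache", showWarnings = FALSE)
    url <- "https://www.sec.gov/Archives/edgar/data/1326801/000132680116000043/fb-20151231.xsd"
    download.file(url, file.path("xbrl.Cache", basename(url)), mode = "wb")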

selgamal avatar Jun 20 '21 21:06 selgamal

Hi,

I also ran into the same error. The request that fails is:

    if (foreign == FALSE) {
        url <- paste0("http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=",
                      symbol, "&type=10-k&dateb=&owner=exclude&count=100")
    }
    filings <- xml2::read_html(url)

I tried changing count to 1 and it worked (see the sketch below), so it seems this page detects that we are not a browser and blocks us. We may need to use RSelenium :(
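For example, this variant of the same request went through for me ("FB" here is just a placeholder ticker):

    # Identical query, but asking for a single filing instead of 100
    url <- paste0("http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=",
                  "FB", "&type=10-k&dateb=&owner=exclude&count=1")
    filings <- xml2::read_html(url)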

PatronMaster avatar Jul 25 '21 15:07 PatronMaster

I have been receiving the same error. Is there any workaround?

uramnama avatar Jul 30 '21 18:07 uramnama

same error here:

CompanyInfo("GOOG") Error in open.connection(x, "rb") : HTTP error 403.

smartgamer avatar Aug 29 '21 19:08 smartgamer

Same 403 error in all functions:

AnnualReports("TSLA")
Error in open.connection(x, "rb") : HTTP error 403.

ramirezjaime avatar Sep 09 '21 17:09 ramirezjaime

R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] edgarWebR_1.1.0  finreportr_1.0.2

loaded via a namespace (and not attached):
 [1] xml2_1.3.2       magrittr_2.0.1   tidyselect_1.1.1 rvest_1.0.1      R6_2.5.1         rlang_0.4.11
 [7] fansi_0.5.0      stringr_1.4.0    httr_1.4.2       dplyr_1.0.7      tools_4.1.0      utf8_1.2.2
[13] DBI_1.1.1        selectr_0.4-2    ellipsis_0.3.2   assertthat_0.2.1 tibble_3.1.4     lifecycle_1.0.0
[19] crayon_1.4.1     purrr_0.3.4      vctrs_0.3.8      curl_4.3.2       glue_1.4.2       stringi_1.7.4
[25] compiler_4.1.0   pillar_1.6.2     generics_0.1.0   pkgconfig_2.0.3

ramirezjaime avatar Sep 09 '21 17:09 ramirezjaime

I am also experiencing this problem.

j-uchiha avatar Nov 07 '21 01:11 j-uchiha

Here is my workaround to your problem.

The problem is that the SEC wants the scraper to identify itself via what is called a User-Agent header.

Before placing my request for data I execute ...

     options(HTTPUserAgent = "your name here   [email protected]")

The user agent is only remembered for the current session.

With this workaround everything works fine for me: no more 403 errors.
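For example (the name and e-mail below are placeholders; replace them with your own):

    # Identify yourself to EDGAR before calling any finreportr function;
    # the option is picked up by the downloads for the rest of the session.
    options(HTTPUserAgent = "Jane Doe jane.doe@example.com")

    library(finreportr)
    CompanyInfo("GOOG")
    GetIncome("FB", 2016)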

VS

vsoler avatar Dec 01 '21 23:12 vsoler

I used vsoler's suggestion to use the options statement and I'm still having trouble:

GetIncome("MA", 2020)
Error in fileFromCache(file.inst) : 
  Error in download.file(file, cached.file, quiet = !verbose) : 
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1141391/000114139120000032/ma-20191231.xml'

In addition: Warning messages:
1: In download.file(file, cached.file, quiet = !verbose) :
  downloaded length 0 != reported length 324
2: In download.file(file, cached.file, quiet = !verbose) :
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1141391/000114139120000032/ma-20191231.xml': HTTP status was '404 Not Found'

According to the SEC, the User-Agent must be sent in the request header.
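To send that header explicitly on a single request (instead of session-wide via options()), httr's user_agent() can be used; the name and e-mail below are placeholders:

    library(httr)

    # Pass the SEC's recommended "name e-mail" identification as the
    # User-Agent header for this one request.
    resp <- GET(
      "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=MA&type=10-K&dateb=&owner=exclude&count=40",
      user_agent("Jane Doe jane.doe@example.com")
    )
    status_code(resp)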

eweiss99 avatar Jan 21 '22 16:01 eweiss99

Hi guys,

Any chance of an update solving the problem here? I am still running into errors despite using the user agent, but only for specific years.

Padiol avatar Mar 27 '22 02:03 Padiol

My workaround for this problem was to install two missing packages, 'XBRL' and 'Rcpp'.

billytaipei101 avatar Apr 09 '22 16:04 billytaipei101

Could someone please suggest a current solution for this problem (HTTP error 403)? Also, is this package actively maintained? Thanks in advance!

Alex-Sigma avatar Aug 13 '22 17:08 Alex-Sigma

There are several errors being conflated in this issue.

The 403 errors occur because your client is not authorised: you have not set (or have improperly set) your User-Agent header, so the SEC is denying you access.

The 404 error mentioned by @eweiss99 occurs because the file that finreportr is trying to download does not exist. The finreportr package guesses the name of the submission file by appending the date to the ticker code (ma-20191231.xml), but, for whatever reason, the filer didn't name their submission file that way. If you go to the actual accession web page, you can see that the file is actually called ma12312019-10xk_htm.xml. This is a legitimate bug in finreportr: it is not correctly determining the file name.

IMO the best fix here would be for finreportr to actually download the header file for the accession number, extract the table with the file descriptions, and select the correct file name on the basis of the description.

I've got a bit of momentum here, so I'll see if it's a simple fix and make a pull request.
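A minimal sketch of that idea, using the accession's directory listing rather than the header file (the index.json endpoint, its field names, and the helper name are assumptions for illustration, not finreportr code):

    library(httr)
    library(jsonlite)

    # Resolve the XBRL instance document for an accession by listing the
    # archive directory instead of guessing "<ticker>-<date>.xml".
    find_instance_doc <- function(cik, accession_nodash,
                                  ua = "Jane Doe jane.doe@example.com") {
      base <- paste0("https://www.sec.gov/Archives/edgar/data/",
                     cik, "/", accession_nodash, "/")
      resp <- GET(paste0(base, "index.json"), user_agent(ua))
      stop_for_status(resp)
      files <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))$directory$item$name
      # Drop schema/linkbase/summary files; what remains should be the
      # instance document (e.g. ma12312019-10xk_htm.xml for the MA filing above).
      inst <- files[grepl("\\.xml$", files) &
                    !grepl("(_cal|_def|_lab|_pre)\\.xml$|FilingSummary\\.xml$", files)]
      paste0(base, inst[1])
    }

    # find_instance_doc("1141391", "000114139120000032")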

riazarbi avatar Nov 16 '22 12:11 riazarbi

@vsoler's suggestion of

options(HTTPUserAgent = "your name here   [email protected]")

worked like a charm. I hope this can be added to the main README page!

matthewgson avatar Sep 07 '23 14:09 matthewgson