python-edgar

HTTP Error 404

Open RainmakerP opened this issue 1 year ago • 3 comments

Hello, when running the command:

edgar.download_index(path, year, user_agent, skip_all_present_except_last=False)

I'm repeatedly getting the following error:

urllib.error.HTTPError: HTTP Error 404: Not Found

It was working fine for several months but suddenly started breaking down yesterday.

Could you please check?

Thanks!

RainmakerP avatar Aug 10 '22 12:08 RainmakerP

I'm able to run the code by using this trick: https://github.com/edgarminers/python-edgar/issues/23

Set the user agent to a random string:

user_agent = "XYZ/3.0"
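For anyone unsure where that string ends up, here is a minimal standard-library sketch of attaching a custom User-Agent to a request. `build_request` is a hypothetical helper (not part of python-edgar) and the URL is only an illustrative EDGAR index path. Nothing is sent over the network here.

```python
import urllib.request


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Return a Request object carrying the given User-Agent (no network I/O)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Illustrative URL only; the thread does not say which URL returned the 404.
req = build_request(
    "https://www.sec.gov/Archives/edgar/full-index/2022/QTR3/master.idx",
    "XYZ/3.0",
)

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual download.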

RainmakerP avatar Aug 10 '22 12:08 RainmakerP

I think this solves the issue when running it locally. It works for me in Insomnia, but my tests and site fail when deployed. I'm having the same issue with my own project, and I was able to narrow down a couple of things:

  1. The SEC API is having issues parsing an email address (specifically the '@' character) in the User-Agent header, which was working fine before.
  2. I tried using just the name, still adhering to the [SEC Guidelines](https://www.sec.gov/os/accessing-edgar-data), and this works locally but not on a remote server.
  3. I tried a few mock browser User-Agents and had similarly unsuccessful results. The SEC could be cracking down on automated requests to their non-API endpoints.
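The checks in the list above could be scripted roughly as follows, so the same candidates can be compared locally and on a remote server. This is only a sketch: `fetch_status` and `probe_user_agents` are hypothetical names, and the `fetch` parameter exists so the logic can be exercised with a stub instead of hitting the SEC servers.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, user_agent: str) -> int:
    """GET the URL with the given User-Agent and return the HTTP status code.

    This performs real network I/O when called.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; report the code instead.
        return err.code


def probe_user_agents(
    url: str,
    candidates: Iterable[str],
    fetch: Callable[[str, str], int] = fetch_status,
) -> Dict[str, int]:
    """Map each candidate User-Agent to the status code it produced."""
    return {ua: fetch(url, ua) for ua in candidates}
```

Calling `probe_user_agents(url, ["Jane Doe jane@example.com", "Jane Doe", "XYZ/3.0"])` (placeholder values) in both environments would show which strings get a 404 where.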

punitarani avatar Aug 10 '22 15:08 punitarani

I am having the same issue. Removing the @ symbol from the user agent worked for me, but this does not seem like a long-term solution. I download all filings daily.

jakemdrew avatar Aug 10 '22 15:08 jakemdrew

It's been a while since I've run this. Could we be causing the issue by making too many calls? If so, do we have internal controls to rate limit the number of calls, so that we stay compliant with their disclosed limit of 10 calls per second?
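To make the question concrete, here is a minimal sketch of what a client-side throttle could look like. It is not taken from python-edgar (I have not checked whether the library has one); `RateLimiter` is a hypothetical class, and the clock and sleep functions are injectable only so it can be tested without real delays.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Space successive calls at least 1 / max_per_second seconds apart."""

    def __init__(
        self,
        max_per_second: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None  # earliest start of next call

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._clock()  # re-read in case the sleep overshot
        self._next_allowed = now + self.min_interval
        return delay
```

The idea would be to create one `RateLimiter()` and call its `wait()` immediately before every request, which keeps a single-threaded downloader at or below 10 requests per second.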

If you could give me a rough idea of where to start, I could take a peek at the code to look for a solution.

@jakemdrew when you say you 'download all filings daily' how much space does that take up?

datatalking avatar Apr 27 '23 19:04 datatalking

I think this issue was resolved for me by reinstalling the package.

@datatalking My current 10Q folder goes back to 2019, has 76,082 full XBRL filings, and takes up 668GB of space. Having said this, if you extract GAAP data and other features for ML, my entire EdgarData folder has 841,024 files and takes up 964GB. A lot of this is XBRL taxonomy files for the GAAP data.

jakemdrew avatar Apr 27 '23 19:04 jakemdrew

@jakemdrew what trading platform or Python-based system are you using? That is quite a bit of data; I thought I had a lot at 600GB. Are you using any other open source trading software?

datatalking avatar Nov 30 '23 23:11 datatalking

I have my own Python code for Edgar and use ib_insync for trading through Interactive Brokers. The XBRL schemas for the GAAP data also take up a substantial amount of space.

jakemdrew avatar Dec 01 '23 00:12 jakemdrew