finance-dl
amazon module fails to download digital invoices
finance-dl formats the url as
https://www.amazon.com/gp/css/summary/print.html?ie=UTF8&orderID=D01-1380792-3469006
which results in an error.
This one, which matches the pattern when I manually visit digital order invoices, works.
https://www.amazon.com/gp/digital/your-account/order-summary.html/ref=ppx_yo_dt_b_dpi_o00?ie=UTF8&orderID=D01-1380792-3469006&print=1
Dunno if the url has been changed since the code was written or if I'm hitting a unique issue.
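For illustration, here is a minimal sketch of picking the invoice URL by order type, using the two URL patterns quoted above. It assumes that digital order IDs start with "D" (as both IDs in this thread do) and that the `/ref=...` path segment of the working URL can be dropped. Neither assumption is confirmed here. `invoice_url` is a hypothetical helper, not a function in finance-dl.

```python
from urllib.parse import urlencode

# URL patterns quoted in this thread.
REGULAR_INVOICE_URL = "https://www.amazon.com/gp/css/summary/print.html"
DIGITAL_INVOICE_URL = (
    "https://www.amazon.com/gp/digital/your-account/order-summary.html"
)


def invoice_url(order_id: str) -> str:
    """Return a printable invoice URL for an order ID (hypothetical helper)."""
    # Assumption: digital order IDs start with "D", e.g. "D01-...".
    if order_id.startswith("D"):
        query = {"ie": "UTF8", "orderID": order_id, "print": "1"}
        return f"{DIGITAL_INVOICE_URL}?{urlencode(query)}"
    query = {"ie": "UTF8", "orderID": order_id}
    return f"{REGULAR_INVOICE_URL}?{urlencode(query)}"


if __name__ == "__main__":
    print(invoice_url("D01-1380792-3469006"))
```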
Can you paste the trace? What's the exact line where the exception is thrown?
2022-01-16 20:44:57,318 amazon.py:255 [INFO] Downloading invoice for order 'D01-1380792-3469006'
...
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: Waiting to match conditions
It times out because the browser is sitting on a "We're sorry!" page.
It looks like I might have introduced this issue in #46, where I did not consider the case of digital orders. Can you check out the commit before my PR (70e93851378915036e054846e4cf9181f59e900d) and see if the issue is present for you?
It downloads digital orders just fine when I switch to that commit.
Can you confirm if your terminal shows the line "Found likely Amazon Fresh order. Falling back to direct invoice URL." before the script crashes?
Doesn't look like it. This is the entirety of the spew
2022-01-19 10:50:00,350 amazon.py:223 [INFO] Skipping order group: '1998'
2022-01-19 10:50:00,398 amazon.py:223 [INFO] Skipping order group: '1997'
2022-01-19 10:50:00,444 amazon.py:223 [INFO] Skipping order group: '1996'
2022-01-19 10:50:00,494 amazon.py:223 [INFO] Skipping order group: '1995'
2022-01-19 10:50:00,525 amazon.py:255 [INFO] Downloading invoice for order 'D01-5823337-6656218'
Traceback (most recent call last):
File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/scrape_lib.py", line 408, in retry
return func()
File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/scrape_lib.py", line 428, in fetch
scraper.run()
File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/amazon.py", line 282, in run
self.get_orders(regular=self.regular, digital=self.digital)
File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/amazon.py", line 249, in get_orders
self.retrieve_invoices(invoice_hrefs)
File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/amazon.py", line 267, in retrieve_invoices
page_source, = self.wait_and_return(get_source)
File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/scrape_lib.py", line 244, in wait_and_return
WebDriverWait(self.driver, timeout).until(predicate, message=message)
File "/home/ido/.local/lib/python3.9/site-packages/selenium/webdriver/support/wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: Waiting to match conditions
Waiting 0 seconds before retrying
--connect=http://127.0.0.1:59833 --session-id=1f390d5a8651a03a26497cdc7e766b17
2022-01-19 10:50:33,871 amazon.py:109 [INFO] Initiating log in
2022-01-19 10:50:36,273 amazon.py:116 [INFO] You must be already logged in!
2022-01-19 10:50:41,160 amazon.py:223 [INFO] Skipping order group: 'last 30 days'
2022-01-19 10:50:41,224 amazon.py:223 [INFO] Skipping order group: 'past 3 months'
2022-01-19 10:50:41,284 amazon.py:223 [INFO] Skipping order group: '2022'
Something doesn't add up. If the Fresh log message is not present, then the code is not creating the URL but is extracting it from the page, which should result in a correct URL.
I suggest you sprinkle some logger.info calls in various places to see what's happening.
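A rough sketch of what such a debugging call could look like. The function name `log_invoice_url` and its parameters are made up for illustration, and where to call it inside amazon.py is left open. The point is to record whether each invoice URL was extracted from the page or constructed by the code.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(filename)s:%(lineno)d [%(levelname)s] %(message)s",
)
logger = logging.getLogger(__name__)


def log_invoice_url(order_id: str, url: str, extracted_from_page: bool) -> str:
    """Log where an invoice URL came from and return the logged message.

    Hypothetical helper for debugging; not part of finance-dl.
    """
    origin = "extracted from page" if extracted_from_page else "constructed"
    message = f"Invoice URL for order {order_id!r} ({origin}): {url}"
    logger.info(message)
    return message
```

Calling it just before the page load would show which of the two code paths produced the failing URL.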
Dunno if related, but HEAD scrapes the wrong group. If I set it to 2022, I can watch it scrape 2021.
#70e9385 scrapes the correct group.
That's strange, because my PR did not touch any of the group logic.
We had an off-by-one issue with the drop-down menu in the Amazon downloader a while ago, but I think that got fixed. I can't reproduce either of your two issues with master. Digital orders download just fine for me.
Odd. I wiped the 2022 dir and repeated it. I get both 2022 and 2021 invoices in there. This is my cfg
def CONFIG_amazon_2022():
    return dict(
        order_groups=[
            "2022",
        ],
        module='finance_dl.amazon',
        digital=True,
        credentials={
            'username': XXXX,  # redacted
            'password': XXXX,  # redacted
        },
        output_directory=os.path.join(data_dir, 'amazon', "amazon_2022"),
        profile_dir=os.path.join(profile_dir, 'amazon'),
    )
Anyway, using that older commit I was able to download 3048 invoices going back to the year 2000, 1133 of which were digital invoices that HEAD would not fetch. Of all of those, Amazon legitimately can't produce 4, so I had to stub the files so the script would pass over them.
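A sketch of the stubbing workaround, for anyone who needs it. It assumes that invoices are saved as `<order_id>.html` in the output directory and that the scraper skips orders whose file already exists. Both are assumptions that should be checked against the installed version. `stub_invoice` is a hypothetical helper.

```python
from pathlib import Path


def stub_invoice(output_directory: str, order_id: str, suffix: str = ".html") -> Path:
    """Create a placeholder invoice file unless one already exists.

    Assumption: the scraper names files "<order_id><suffix>" and skips
    orders whose file is already present.
    """
    path = Path(output_directory) / f"{order_id}{suffix}"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(
            "<!-- placeholder: Amazon could not produce this invoice -->\n"
        )
    return path
```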
Feel free to close this out. I have a version of the code that works for me and unless anyone else is having issues, I would not prioritize an issue you can't reproduce.
I'm hitting both of these issues; I only get orders from 2020 when I specify order_groups=['2021'], and it fails while trying to download digital orders, with the same error page @dppdppd described.
I installed finance-dl using pip install finance-dl, which got me finance-dl-1.3.3.
Comparing the v1.3.3 tag to master, there are several commit messages that mention various fixes for Amazon. Maybe it would be worth it to cut a new release and push it up to PyPI?
Adding some more information to this: the order groups don't download correctly for me (2022 downloads 2021 invoices, and so on), and not specifying an order group results in a timeout error. Interestingly, setting the order group to "past 3 months" downloads 2022 invoices, so it seems like everything is being shifted 'down' the order group hierarchy by one level.
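The symptom matches what an off-by-one index into the order-group list would produce. The sketch below only reproduces the reported behavior and is not a diagnosis of the actual code. The first three labels are taken from the log output earlier in the thread; the rest of the list and both function names are assumptions for illustration.

```python
# Order groups in the sequence they appear in the log output above;
# the entries after "2022" are assumed.
ORDER_GROUPS = ["last 30 days", "past 3 months", "2022", "2021", "2020"]


def select_with_off_by_one(requested: str) -> str:
    """Reproduce the reported symptom: the entry one below is selected."""
    index = ORDER_GROUPS.index(requested)
    return ORDER_GROUPS[index + 1]  # deliberate bug: shifted by one


def select_by_label(requested: str) -> str:
    """Select by matching the label text, which avoids index arithmetic."""
    for label in ORDER_GROUPS:
        if label == requested:
            return label
    raise ValueError(f"unknown order group: {requested!r}")


if __name__ == "__main__":
    for group in ("past 3 months", "2022", "2021"):
        print(group, "->", select_with_off_by_one(group))
```

With the shifted index, "past 3 months" yields 2022, "2022" yields 2021, and "2021" yields 2020, which is the pattern reported in this thread.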