
amazon module fails to download digital invoices

Open dppdppd opened this issue 3 years ago • 13 comments

finance-dl formats the url as https://www.amazon.com/gp/css/summary/print.html?ie=UTF8&orderID=D01-1380792-3469006

which results in an error.

This one, which matches the pattern when I manually visit digital order invoices, works. https://www.amazon.com/gp/digital/your-account/order-summary.html/ref=ppx_yo_dt_b_dpi_o00?ie=UTF8&orderID=D01-1380792-3469006&print=1

Dunno if the URL has changed since the code was written or if I'm hitting a unique issue.
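
To make the difference between the two URL shapes concrete, here is a minimal sketch that picks an invoice URL from the order ID. It is not finance-dl's code: `invoice_url` is a hypothetical helper, the rule that digital order IDs start with `D` is an assumption based only on the IDs in this thread, and dropping the `ref=...` path segment from the working URL is untested.

```python
from urllib.parse import urlencode

# Both base URLs are copied from the URLs quoted in this issue.
REGULAR_INVOICE = "https://www.amazon.com/gp/css/summary/print.html"
DIGITAL_INVOICE = "https://www.amazon.com/gp/digital/your-account/order-summary.html"


def invoice_url(order_id: str) -> str:
    """Hypothetical helper: build a printable invoice URL for an order ID."""
    # Assumption: digital order IDs start with "D" (e.g. D01-...), as in this thread.
    if order_id.startswith("D"):
        query = urlencode({"ie": "UTF8", "orderID": order_id, "print": 1})
        return f"{DIGITAL_INVOICE}?{query}"
    query = urlencode({"ie": "UTF8", "orderID": order_id})
    return f"{REGULAR_INVOICE}?{query}"


print(invoice_url("D01-1380792-3469006"))
```

Whether Amazon accepts the digital URL without the `ref=...` segment would need to be checked in a logged-in browser.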

dppdppd avatar Jan 16 '22 21:01 dppdppd

Can you paste the trace? What's the exact line where the exception is thrown?

Zburatorul avatar Jan 17 '22 21:01 Zburatorul

2022-01-16 20:44:57,318 amazon.py:255 [INFO] Downloading invoice for order 'D01-1380792-3469006'
...
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: Waiting to match conditions

It times out because the browser is sitting on a "We're sorry!" page

[Screenshot 2022-01-17 134015]

dppdppd avatar Jan 17 '22 21:01 dppdppd

It looks like I might have introduced this issue in #46, where I did not consider the case of digital orders. Can you check out the commit before my PR (70e93851378915036e054846e4cf9181f59e900d) and see if the issue is present for you?

Zburatorul avatar Jan 18 '22 01:01 Zburatorul

It downloads digital orders just fine when I switch to that commit.

dppdppd avatar Jan 18 '22 02:01 dppdppd

Can you confirm if your terminal shows the line "Found likely Amazon Fresh order. Falling back to direct invoice URL." before the script crashes?

Zburatorul avatar Jan 19 '22 03:01 Zburatorul

Doesn't look like it. This is the entirety of the output:

2022-01-19 10:50:00,350 amazon.py:223 [INFO] Skipping order group: '1998'
2022-01-19 10:50:00,398 amazon.py:223 [INFO] Skipping order group: '1997'
2022-01-19 10:50:00,444 amazon.py:223 [INFO] Skipping order group: '1996'
2022-01-19 10:50:00,494 amazon.py:223 [INFO] Skipping order group: '1995'
2022-01-19 10:50:00,525 amazon.py:255 [INFO] Downloading invoice for order 'D01-5823337-6656218'
Traceback (most recent call last):
  File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/scrape_lib.py", line 408, in retry
    return func()
  File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/scrape_lib.py", line 428, in fetch
    scraper.run()
  File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/amazon.py", line 282, in run
    self.get_orders(regular=self.regular, digital=self.digital)
  File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/amazon.py", line 249, in get_orders
    self.retrieve_invoices(invoice_hrefs)
  File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/amazon.py", line 267, in retrieve_invoices
    page_source, = self.wait_and_return(get_source)
  File "/home/ido/.local/lib/python3.9/site-packages/finance_dl/scrape_lib.py", line 244, in wait_and_return
    WebDriverWait(self.driver, timeout).until(predicate, message=message)
  File "/home/ido/.local/lib/python3.9/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: Waiting to match conditions

Waiting 0 seconds before retrying
 --connect=http://127.0.0.1:59833 --session-id=1f390d5a8651a03a26497cdc7e766b17
2022-01-19 10:50:33,871 amazon.py:109 [INFO] Initiating log in
2022-01-19 10:50:36,273 amazon.py:116 [INFO] You must be already logged in!
2022-01-19 10:50:41,160 amazon.py:223 [INFO] Skipping order group: 'last 30 days'
2022-01-19 10:50:41,224 amazon.py:223 [INFO] Skipping order group: 'past 3 months'
2022-01-19 10:50:41,284 amazon.py:223 [INFO] Skipping order group: '2022'

dppdppd avatar Jan 19 '22 18:01 dppdppd

Something doesn't add up. If the Fresh log message is not present, then the code is not creating the URL but is extracting it from the page, which should result in a correct URL.

I suggest you sprinkle some logger.info calls in various places to see what's happening.
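
For example, a log line recording whether each invoice URL was scraped from the page or built by the fallback would show which path is taken. A minimal sketch using the standard `logging` module; `log_invoice_href` and its arguments are hypothetical and not part of finance-dl:

```python
import logging

# Format chosen to resemble the log lines pasted in this thread.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(filename)s:%(lineno)d [%(levelname)s] %(message)s",
)
logger = logging.getLogger(__name__)


def log_invoice_href(order_id: str, href: str, scraped: bool) -> str:
    """Hypothetical helper: record where an invoice URL came from."""
    origin = "scraped from page" if scraped else "built by fallback"
    message = f"order {order_id!r}: invoice URL {href} ({origin})"
    logger.info(message)
    return message
```

Calling something like this just before the page load that times out would show the exact URL the browser is sent to.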

Zburatorul avatar Jan 19 '22 21:01 Zburatorul

Dunno if it's related, but HEAD scrapes the wrong group. If I set it to 2022, I can watch it scrape 2021.

#70e9385 scrapes the correct group.

dppdppd avatar Jan 20 '22 07:01 dppdppd

That's strange, because my PR did not touch any of the group logic.

We had an off-by-one issue with the drop-down menu in the Amazon downloader a while ago, but I think that got fixed. I can't reproduce either of your two issues with master. Digital orders download just fine for me.

Zburatorul avatar Jan 20 '22 19:01 Zburatorul

Odd. I wiped the 2022 dir and repeated it. I get both 2022 and 2021 invoices in there. This is my config:

def CONFIG_amazon_2022():
    return dict(
        order_groups=[
            "2022",
        ],
        module='finance_dl.amazon',
        digital=True,
        credentials={
            'username': XXXX,
            'password': XXXX,
        },
        output_directory=os.path.join(data_dir, 'amazon', "amazon_2022"),
        profile_dir=os.path.join(profile_dir, 'amazon'),
    )

Anyway, using that older commit I was able to download 3048 invoices going back to the year 2000, 1133 of which were digital invoices that HEAD would not fetch. Of all of those, Amazon legitimately can't produce 4, so I had to stub the files so the script would pass over them.

Feel free to close this out. I have a version of the code that works for me and unless anyone else is having issues, I would not prioritize an issue you can't reproduce.

dppdppd avatar Jan 21 '22 00:01 dppdppd

I'm hitting both of these issues; I only get orders from 2020 when I specify order_groups=['2021'], and it fails while trying to download digital orders, with the same error page @dppdppd described.

I installed finance-dl using pip install finance-dl, which got me finance-dl-1.3.3.

mjjohnson avatar Feb 16 '22 07:02 mjjohnson

Comparing the v1.3.3 tag to master, there are several commit messages that mention various fixes for Amazon. Maybe it would be worth it to cut a new release and push it up to PyPI?

mjjohnson avatar Feb 16 '22 07:02 mjjohnson

Adding some more information to this: the order groups don't download correctly for me, i.e. 2022 downloads 2021 invoices, and so on, and not specifying an order group results in a timeout error. Interestingly, setting the order group to "past 3 months" downloads 2022 invoices, so it seems like everything is being shifted 'down' the order group hierarchy by one level.
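
These symptoms are consistent with a positional off-by-one when picking an entry from the order-filter drop-down, although nothing in this thread confirms that as the cause. A minimal sketch of the difference between selecting by position and selecting by label; the option list and both helper functions are hypothetical, not finance-dl's code:

```python
# Assumed drop-down contents, in the order the log output earlier in this thread lists them.
OPTIONS = ["last 30 days", "past 3 months", "2022", "2021", "2020"]


def select_by_shifted_index(options, wanted):
    """Reproduce the symptom: an index that is off by one picks the next entry."""
    index = options.index(wanted) + 1  # the hypothetical bug
    return options[index]


def select_by_label(options, wanted):
    """Pick the entry whose visible text matches exactly."""
    for label in options:
        if label == wanted:
            return label
    raise ValueError(f"order group {wanted!r} not in drop-down")
```

With the shifted index, "past 3 months" yields 2022 and "2022" yields 2021, which matches the behavior reported here; matching on the visible label avoids depending on the position at all.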

arnold-c avatar Sep 06 '22 21:09 arnold-c