
Proxy table unique constraint errors

Open ihopethiswillfi opened this issue 7 years ago • 2 comments

I'm getting a lot of the errors shown below.

I'm using the latest commit from the 0.9.1 branch (which already works much better than 0.9.0), on Linux Mint 18.2.

2017-09-12 07:46:44,173 - root - INFO - preparing phantomjs
2017-09-12 07:46:44,175 - root - INFO - detecting phantomjs
2017-09-12 07:46:44,178 - root - INFO - using phantomjs/phantomjs-2.1.1-linux-x86_64/bin/phantomjs
2017-09-12 07:46:45,185 - root - INFO - 0 cache files found in /tmp/.serpscrap/
2017-09-12 07:46:45,186 - root - INFO - 0/785 objects have been read from the cache.
        785 remain to get scraped.
2017-09-12 07:46:45,201 - root - INFO - 
                Going to scrape 785 keywords with 99
                proxies by using 1 threads.

So I'm using quite a lot of proxies, and I get many errors where the DB updates an existing proxy row and appears to set its IP to one that already exists in the table. This violates the unique constraint on the proxy table (on ip and port, according to the error message).

I've checked, and there are NO duplicate IPs and ports in the proxy file.

In the example below, it's trying to update row 101 (which currently has IP address 196.196.232.5) and change its IP to 162.253.131.178, but that IP already exists in the table.

sqlalchemy.exc.InvalidRequestError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (sqlite3.IntegrityError) UNIQUE constraint failed: proxy.ip, proxy.port [SQL: 'UPDATE proxy SET ip=?, online=?, status=?, checked_at=?, city=?, region=?, country=?, loc=?, org=?, postal=? WHERE proxy.id = ?'] [parameters: ('162.253.131.178', 1, 'Proxy is working.', '2017-09-12 08:45:03.818061', 'Toronto', 'Ontario', 'CA', '43.6230,-79.3936', 'AS32489 Amanah Tech Inc.', 'm5j 2n1', 101)]
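To illustrate the failure mode outside SerpScrap, here is a minimal sketch using only Python's built-in `sqlite3` module. It is not SerpScrap's code: the schema, the port value `8080`, row id 100, and the helper names `make_db`, `update_by_id`, and `update_by_address` are assumptions made for the example. Only the table name, the `ip`/`port` column names, row id 101, and the two IP addresses come from the error above. It shows that rewriting a row's IP by primary key collides with an existing row, while looking the row up by its (ip, port) pair does not.

```python
import sqlite3


def make_db():
    """Build an in-memory table with a UNIQUE (ip, port) constraint (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE proxy ("
        " id INTEGER PRIMARY KEY,"
        " ip TEXT NOT NULL,"
        " port TEXT NOT NULL,"
        " status TEXT,"
        " UNIQUE (ip, port))"
    )
    # Row 100 and port 8080 are hypothetical; row 101 mirrors the report.
    conn.executemany(
        "INSERT INTO proxy (id, ip, port) VALUES (?, ?, ?)",
        [(100, "162.253.131.178", "8080"), (101, "196.196.232.5", "8080")],
    )
    return conn


def update_by_id(conn, row_id, ip, status):
    """Overwrite the IP of a row selected by id, as the failing UPDATE does."""
    conn.execute(
        "UPDATE proxy SET ip=?, status=? WHERE id=?", (ip, status, row_id)
    )


def update_by_address(conn, ip, port, status):
    """Update only the row that already holds this (ip, port) pair."""
    cur = conn.execute(
        "UPDATE proxy SET status=? WHERE ip=? AND port=?", (status, ip, port)
    )
    return cur.rowcount


conn = make_db()

# Reproduce the collision: row 101 is asked to take an IP that row 100 owns.
collided = False
error_text = ""
try:
    update_by_id(conn, 101, "162.253.131.178", "Proxy is working.")
except sqlite3.IntegrityError as exc:
    collided = True
    error_text = str(exc)

# Addressing the row by (ip, port) touches exactly one row and cannot collide.
updated = update_by_address(conn, "162.253.131.178", "8080", "Proxy is working.")
```

If the check results are being written to a row chosen by a stale or mismatched id, keying the update on (ip, port) as in `update_by_address` would be one way to avoid the collision. Whether that is what actually happens inside SerpScrap is a guess on my part.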

ihopethiswillfi avatar Sep 12 '17 08:09 ihopethiswillfi

This doesn't seem to happen within a single run, only when running a second time.

Temporary fix I'm using: after every run I manually delete all rows from the Proxy table in the DB.

edit: this doesn't always seem to work.

ihopethiswillfi avatar Sep 12 '17 09:09 ihopethiswillfi

Hi, thanks for reporting. I will check it as soon as possible.

ecoron avatar Sep 12 '17 12:09 ecoron