
query_all_iter() throws url parse exception after 2000 rows retrieved

Open tggagne opened this issue 4 years ago • 5 comments

It doesn't seem to matter what I use for a domain, or whether I log in to "login.salesforce.com" or "domain.my.salesforce.com".

Traceback (most recent call last):
  File "C:\Users\tgagne\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\models.py", line 380, in prepare_url
    scheme, auth, host, port, path, query, fragment = parse_url(url)
  File "C:\Users\tgagne\AppData\Local\Programs\Python\Python35\lib\site-packages\urllib3\util\url.py", line 392, in parse_url
    return six.raise_from(LocationParseError(source_url), None)
  File "<string>", line 3, in raise_from
urllib3.exceptions.LocationParseError: Failed to parse: https://domain.my.salesforce.comOrderedDict([('encrypted', False), ('ffxBlobInfo', OrderedDict([('bundleOffset', 0), ('crypId', None), ('entityId', '00P1M00001B31qz'), ('hash', None), ('sqltypeName', 'FFX_BLOB_INFO'), ('store', 'KEY'), ('version', '0KF1M00002KJeQL')])), ('inputStream', OrderedDict()), ('length', 1227146), ('realBlobValue', OrderedDict([('file', '/home/sfdc/tmp/bvf/blob9475467541197592352.bvf'), ('inputStream', OrderedDict()), ('length', 1227146), ('protectedFile', OrderedDict([('item', '/home/sfdc/tmp/bvf/blob9475467541197592352.bvf')]))]))])

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "attachments_downloader.py", line 128, in <module>
    download_attachments(vars(args))
  File "attachments_downloader.py", line 66, in download_attachments
    resp = session.get(remote_path, headers=req_headers)
  File "C:\Users\tgagne\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "C:\Users\tgagne\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\sessions.py", line 516, in request
    prep = self.prepare_request(req)
  File "C:\Users\tgagne\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\sessions.py", line 459, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "C:\Users\tgagne\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\models.py", line 314, in prepare
    self.prepare_url(url, params)
  File "C:\Users\tgagne\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\models.py", line 382, in prepare_url
    raise InvalidURL(*e.args)
requests.exceptions.InvalidURL: Failed to parse: https://domain.my.salesforce.comOrderedDict([('encrypted', False), ('ffxBlobInfo', OrderedDict([('bundleOffset', 0), ('crypId', None), ('entityId', '00P1M00001B31qz'), ('hash', None), ('sqltypeName', 'FFX_BLOB_INFO'), ('store', 'KEY'), ('version', '0KF1M00002KJeQL')])), ('inputStream', OrderedDict()), ('length', 1227146), ('realBlobValue', OrderedDict([('file', '/home/sfdc/tmp/bvf/blob9475467541197592352.bvf'), ('inputStream', OrderedDict()), ('length', 1227146), ('protectedFile', OrderedDict([('item', '/home/sfdc/tmp/bvf/blob9475467541197592352.bvf')]))]))])

As a consequence, I'm using query_all(), but it seems to be querying each row individually--and is taking a LONG time to query all 26000+ rows.

I'm using this inside salesforce_scripts' attachments_downloader.py.
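
In case it helps to reproduce, here's roughly the pattern the script follows (a minimal sketch reconstructed from the traceback, not the actual salesforce_scripts code; the credentials and SOQL are placeholders):

import requests
from simple_salesforce import Salesforce

sf = Salesforce(username='user@example.com', password='secret',
                security_token='token')

session = requests.Session()
req_headers = {'Authorization': 'Bearer ' + sf.session_id}

# query_all_iter() yields records lazily; simple-salesforce fetches
# 2000 rows per page and follows nextRecordsUrl for the remainder.
for record in sf.query_all_iter("SELECT Id, Name, Body FROM Attachment"):
    # Body is expected to be a relative REST path such as
    # /services/data/v42.0/sobjects/Attachment/00P.../Body.
    # After row 2000 it comes back as an OrderedDict instead, so the
    # URL below becomes https://...salesforce.comOrderedDict([...]).
    remote_path = 'https://{}{}'.format(sf.sf_instance, record['Body'])
    resp = session.get(remote_path, headers=req_headers)
    with open(record['Name'], 'wb') as f:
        f.write(resp.content)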

tggagne commented Jul 02 '20 02:07

This Stack Overflow article suggested the problem was in the urllib3 module or a broken requests 2.23 release. I've tried both upgrading urllib3 and downgrading requests without any change in the result.

(I've since returned requests to its later 2.24 version -- see the module list at the bottom.)

What I find interesting in the traceback is the formatting of the URL it says it can't parse.

Failed to parse: https://domain.my.salesforce.comOrderedDict([('encrypted...

Why does OrderedDict follow salesforce.com without any delimiter or whitespace?
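
A quick experiment suggests why: if the Body value is no longer a string, str.format() just stringifies it, and the dict's repr gets appended to the host with no separator (hypothetical, trimmed values):

from collections import OrderedDict

base = 'https://domain.my.salesforce.com'
body = OrderedDict([('encrypted', False), ('length', 1227146)])

# str.format() calls str() on body, so the dict repr is glued to the
# host with no '/' in between -- the same shape as the error above.
print('{}{}'.format(base, body))
# https://domain.my.salesforce.comOrderedDict([('encrypted', False), ('length', 1227146)])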

droz2:tmp tgagne$ pip3 list
Package           Version
----------------- ---------
Authlib           0.14.3
certifi           2020.6.20
cffi              1.14.0
chardet           3.0.4
cryptography      2.9.2
idna              2.10
pip               20.1.1
pycparser         2.20
requests          2.24.0
setuptools        41.2.0
simple-salesforce 1.1.0
six               1.15.0
urllib3           1.25.9

tggagne commented Jul 02 '20 12:07

Under the debugger, I discovered the body attribute of the record (it's an attachment) no longer holds the REST URI for the body, like services/data/v42.0/sobjects/Attachment/00P1M00001ALzemUAD/Body, but instead contains: OrderedDict([('encrypted', False), ('ffxBlobInfo', OrderedDict([('bundleOffset', 0), ('crypId', None), ('entityId', '00P1M00001B31qz'), ('hash', None), ('sqltypeName', 'FFX_BLOB_INFO'), ('store', 'KEY'), ('version', '0KF1M00002KJeQL')])), ('inputStream', OrderedDict()), ('length', 1227146), ('realBlobValue', OrderedDict([('file', '/home/sfdc/tmp/bvf/blob6690371194100654857.bvf'), ('inputStream', OrderedDict()), ('length', 1227146), ('protectedFile', OrderedDict([('item', '/home/sfdc/tmp/bvf/blob6690371194100654857.bvf')]))]))])

Curiously, this only happens AFTER the 2000th row--which means the records are now coming from the new query_all_iter() code.

@kokes , perhaps I'm misusing query_all_iter()?

tggagne commented Jul 02 '20 17:07

@tggagne are you intending to get all records for an object, even ones that have been deleted or merged (see the Salesforce QueryAll documentation)? I ask because the name query_all is a bit misleading, so I wanted to confirm that's your intended use case!
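
For context, the difference is just a flag (a sketch; in current simple-salesforce, include_deleted switches between the REST query and queryAll endpoints, though I haven't checked exactly which release introduced it):

# Active records only -- hits the /query endpoint:
records = sf.query_all("SELECT Id FROM Attachment")

# Deleted and merged records included -- hits /queryAll:
records = sf.query_all("SELECT Id FROM Attachment", include_deleted=True)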

Also, if you are interested in bulk record extraction, you might look into the simple-salesforce Bulk API.
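
Something like this, as a sketch (assuming the standard Attachment object; sf.bulk submits a Bulk API job behind the scenes and collects the results):

# Bulk query via simple-salesforce's bulk handler; returns a list of
# record dicts once the job completes. As far as I know, binary fields
# such as Body can't be pulled through the Bulk API, only regular fields.
results = sf.bulk.Attachment.query("SELECT Id, Name FROM Attachment")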

wyattshapiro commented Sep 14 '20 15:09

Wow, I totally missed your response. Sorry about that.

I don't think I needed to get the deleted rows--but I could be wrong. That message was from July 2, 2020, so I had to find a workaround.

Regardless of whether I needed deleted rows or not, query_all_iter() is broken and can't get past 2000 rows.

Is it now possible to get past 2000 rows? Was a bug fixed?

tggagne commented Feb 15 '21 23:02