simple-salesforce
query_all_iter() throws URL parse exception after 2000 rows retrieved
The domain doesn't seem to matter: the failure occurs whether I log in to "login.salesforce.com" or "domain.my.salesforce.com."
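For context, the failure can be reproduced with a loop along these lines (a minimal sketch; the credentials and SOQL are placeholders, not the actual attachments_downloader.py code):

```python
from simple_salesforce import Salesforce

# Placeholder credentials; any org with more than 2000 Attachment rows will do.
sf = Salesforce(username="user@example.com", password="***",
                security_token="***", domain="login")

# Iterating past the 2000th record is where the InvalidURL below appears.
for record in sf.query_all_iter("SELECT Id, Name, Body FROM Attachment"):
    print(record["Id"])
```

Here's the traceback I get: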
Traceback (most recent call last):
File "C:\Users\tgagne\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\models.py", line 380, in prepare_url
scheme, auth, host, port, path, query, fragment = parse_url(url)
File "C:\Users\tgagne\AppData\Local\Programs\Python\Python35\lib\site-packages\urllib3\util\url.py", line 392, in parse_url
return six.raise_from(LocationParseError(source_url), None)
File "<string>", line 3, in raise_from
urllib3.exceptions.LocationParseError: Failed to parse: https://domain.my.salesforce.comOrderedDict([('encrypted', False), ('ffxBlobInfo', OrderedDict([('bundleOffset', 0), ('crypId', None), ('entityId', '00P1M00001B31qz'), ('hash', None), ('sqltypeName', 'FFX_BLOB_INFO'), ('store', 'KEY'), ('version', '0KF1M00002KJeQL')])), ('inputStream', OrderedDict()), ('length', 1227146), ('realBlobValue', OrderedDict([('file', '/home/sfdc/tmp/bvf/blob9475467541197592352.bvf'), ('inputStream', OrderedDict()), ('length', 1227146), ('protectedFile', OrderedDict([('item', '/home/sfdc/tmp/bvf/blob9475467541197592352.bvf')]))]))])
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "attachments_downloader.py", line 128, in <module>
download_attachments(vars(args))
File "attachments_downloader.py", line 66, in download_attachments
resp = session.get(remote_path, headers=req_headers)
File "C:\Users\tgagne\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\sessions.py", line 543, in get
return self.request('GET', url, **kwargs)
File "C:\Users\tgagne\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\sessions.py", line 516, in request
prep = self.prepare_request(req)
File "C:\Users\tgagne\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\sessions.py", line 459, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "C:\Users\tgagne\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\models.py", line 314, in prepare
self.prepare_url(url, params)
File "C:\Users\tgagne\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\models.py", line 382, in prepare_url
raise InvalidURL(*e.args)
requests.exceptions.InvalidURL: Failed to parse: https://domain.my.salesforce.comOrderedDict([('encrypted', False), ('ffxBlobInfo', OrderedDict([('bundleOffset', 0), ('crypId', None), ('entityId', '00P1M00001B31qz'), ('hash', None), ('sqltypeName', 'FFX_BLOB_INFO'), ('store', 'KEY'), ('version', '0KF1M00002KJeQL')])), ('inputStream', OrderedDict()), ('length', 1227146), ('realBlobValue', OrderedDict([('file', '/home/sfdc/tmp/bvf/blob9475467541197592352.bvf'), ('inputStream', OrderedDict()), ('length', 1227146), ('protectedFile', OrderedDict([('item', '/home/sfdc/tmp/bvf/blob9475467541197592352.bvf')]))]))])
As a consequence, I'm using query_all(), but it seems to be querying each row individually, and it is taking a LONG time to query all 26,000+ rows.
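For reference, my understanding of query_all (a sketch, not the library source) is that it pages through results via nextRecordsUrl, typically 2000 rows per page, rather than querying rows one at a time:

```python
# Rough equivalent of what query_all does (sf is the authenticated
# Salesforce instance from the repro above): fetch the first page, then
# follow nextRecordsUrl until Salesforce reports the result set is done.
result = sf.query("SELECT Id, Name FROM Attachment")
records = result["records"]
while not result["done"]:
    result = sf.query_more(result["nextRecordsUrl"], identifier_is_url=True)
    records.extend(result["records"])
```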
I'm using this inside salesforce_scripts' attachments_download.py.
This Stack Overflow article suggested the problem was in the urllib3 module or a broken requests 2.23 module. I've tried both upgrading urllib3 and downgrading requests without changing the result.
(I've since returned requests to its later 2.24 version; see the module list at the bottom.)
What I find interesting in the traceback is the formatting of the URL it says it can't parse:
Failed to parse: https://domain.my.salesforce.comOrderedDict([('encrypted...
Why does OrderedDict follow salesforce.com without any delimiter or whitespace?
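My guess at the mechanism (an illustrative sketch, not the actual attachments_downloader.py source): the download URL is built by formatting the instance URL together with the record's Body value, so when Body comes back as a nested dict instead of a path string, its str() representation is pasted onto the host with no separator:

```python
from collections import OrderedDict

instance_url = "https://domain.my.salesforce.com"

# Normally Body holds a relative REST path...
good = {"Body": "/services/data/v42.0/sobjects/Attachment/00P1M00001ALzemUAD/Body"}
# ...but past row 2000 it holds the nested blob metadata instead:
bad = {"Body": OrderedDict([("encrypted", False), ("length", 1227146)])}

for record in (good, bad):
    remote_path = "{}{}".format(instance_url, record["Body"])
    print(remote_path)  # the second print reproduces the fused, unparseable URL
```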
droz2:tmp tgagne$ pip3 list
Package Version
----------------- ---------
Authlib 0.14.3
certifi 2020.6.20
cffi 1.14.0
chardet 3.0.4
cryptography 2.9.2
idna 2.10
pip 20.1.1
pycparser 2.20
requests 2.24.0
setuptools 41.2.0
simple-salesforce 1.1.0
six 1.15.0
urllib3 1.25.9
Under the debugger, I discovered the Body attribute of the record (it's an Attachment) no longer has the REST URI for the body, like
services/data/v42.0/sobjects/Attachment/00P1M00001ALzemUAD/Body
but instead contains:
OrderedDict([('encrypted', False), ('ffxBlobInfo', OrderedDict([('bundleOffset', 0), ('crypId', None), ('entityId', '00P1M00001B31qz'), ('hash', None), ('sqltypeName', 'FFX_BLOB_INFO'), ('store', 'KEY'), ('version', '0KF1M00002KJeQL')])), ('inputStream', OrderedDict()), ('length', 1227146), ('realBlobValue', OrderedDict([('file', '/home/sfdc/tmp/bvf/blob6690371194100654857.bvf'), ('inputStream', OrderedDict()), ('length', 1227146), ('protectedFile', OrderedDict([('item', '/home/sfdc/tmp/bvf/blob6690371194100654857.bvf')]))]))])
Curiously, this only happens AFTER the 2000th row, which means the records are now coming from the new query_all_iter() code.
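A defensive workaround I could imagine (a hedged sketch; the field names are the standard Attachment fields, but the API version is an assumption): rebuild the Body path from the record Id whenever Body isn't a string:

```python
def body_path(record, api_version="42.0"):
    """Return the REST path for an Attachment's Body, falling back to a
    path rebuilt from the record Id when Body isn't the usual string."""
    body = record.get("Body")
    if isinstance(body, str):
        return body
    # api_version is an assumption; match whatever version your session uses.
    return "/services/data/v{}/sobjects/Attachment/{}/Body".format(
        api_version, record["Id"])
```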
@kokes, perhaps I'm misusing query_all_iter()?
@tggagne, are you intending to get all records for an object, even if they have been deleted or merged (see the Salesforce QueryAll documentation)? I ask because query_all is a bit misleading, so I wanted to confirm that this is your intended use case!
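For what it's worth, my understanding (a sketch under that assumption; include_deleted should be accepted by the query methods in recent 1.x releases) is that deleted/merged rows are controlled by a flag, not by the method name:

```python
# sf: an authenticated simple_salesforce.Salesforce instance.
# query_all means "follow nextRecordsUrl until done"; deleted/merged rows
# require the queryAll endpoint, selected with include_deleted.
live_only = sf.query_all("SELECT Id FROM Attachment")
with_deleted = sf.query_all("SELECT Id FROM Attachment", include_deleted=True)
```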
Also, if you are interested in bulk record extraction, you might look into the simple-salesforce Bulk API.
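A hedged sketch of that usage (the object and fields are placeholders; as far as I know, bulk queries can't return base64 fields such as Body, so this helps with metadata only):

```python
# sf: an authenticated simple_salesforce.Salesforce instance.
# Bulk queries return a flat list of dicts rather than paged results.
records = sf.bulk.Attachment.query("SELECT Id, Name, ContentType FROM Attachment")
for rec in records:
    print(rec["Id"], rec["Name"])
```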
Wow, I totally missed your response. Sorry about that.
I don't think I needed to get the deleted rows, but I could be wrong. That message was from July 2, 2020, so I had to find a workaround.
Regardless of whether I needed deleted rows or not, it's broken and can't get past 2000 rows.
Is it now possible to get past 2000 rows? Was a bug fixed?