python-salesforce-api
Bulk V2 API job data is not encoded to UTF-8
Salesforce requires uploaded data to be encoded as (or at least compatible with) UTF-8 (see the fourth bullet point from the top at https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/datafiles_prepare_csv.htm). In practice, though, upload jobs containing higher-code-point characters fail in Python before the ingest request can be sent to Salesforce.
The bulk client does not encode the CSV data, which remains as type str until a lower-level package must make an encoding decision. The low-level Python http library sees a str object and tries to make a bytes object out of it by encoding to the HTTP default, ISO-8859-1. But I pass it data that is not compatible with that encoding, so it raises a UnicodeEncodeError.
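The encoding failure described above can be reproduced without Salesforce or any HTTP request. This is only a minimal sketch using plain str.encode; the csv_data string is a made-up stand-in for whatever the bulk client actually serialises.

```python
# Hypothetical CSV payload containing Greek letters, like the example record.
csv_data = "FirstName,LastName\nΣόλων,Lawgiver\n"

# ISO-8859-1 (latin-1) only covers code points 0-255, so Greek letters
# cannot be represented and the encode call raises UnicodeEncodeError.
try:
    csv_data.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can represent every Unicode code point, so encoding up front works.
utf8_bytes = csv_data.encode("utf-8")

print("latin-1 encodable:", latin1_ok)
print("utf-8 payload type:", type(utf8_bytes).__name__)
```

Handing the HTTP layer bytes instead of str means it no longer has to pick an encoding itself.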
Here is a contrived example of something that should work but doesn't:
salesforce.bulk.insert('Contact', [
    {'FirstName': 'Σόλων', 'LastName': 'Lawgiver', 'AccountID': '000000000000000'},
])
As a workaround, in the codebase I'm working in, I've monkey-patched salesforce_api.services.bulk.v2.Job._prepare_data such that it calls encode('utf-8') and returns bytes. I've not submitted a PR to change this function, as there's a stack of calling functions that all expect str, so encoding then and there may not be the desired long-term fix. But the patch works for now.
Same issue here.
@jelm-vw can you paste your solution or make a fork?
This is effectively the (temporary) monkey-patch I use:
# patch.py
from functools import wraps


def _encode_job_data(prepare_data):
    @wraps(prepare_data)
    def wrapper(*args, **kwargs):
        original: str = prepare_data(*args, **kwargs)
        encoded: bytes = original.encode('utf-8')
        return encoded
    return wrapper


def patch_salesforce_api(salesforce_api):
    salesforce_api.services.bulk.v2.Job._prepare_data = _encode_job_data(salesforce_api.services.bulk.v2.Job._prepare_data)


# some other module
import salesforce_api
import patch

patch.patch_salesforce_api(salesforce_api)
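The wrapping pattern can also be tried without the library installed. The following is a rough sketch only: FakeJob is a hypothetical stand-in for the library's Job class, and its _prepare_data body is invented purely so there is something to wrap.

```python
from functools import wraps


def _encode_job_data(prepare_data):
    """Wrap a function returning str so that it returns UTF-8 bytes."""
    @wraps(prepare_data)
    def wrapper(*args, **kwargs):
        return prepare_data(*args, **kwargs).encode("utf-8")
    return wrapper


class FakeJob:
    """Hypothetical stand-in; not the real salesforce_api Job class."""

    def _prepare_data(self, rows):
        # Invented serialisation: join each row's values with commas.
        return "\n".join(",".join(row.values()) for row in rows)


# Replace the class attribute once, before any job instance is used.
FakeJob._prepare_data = _encode_job_data(FakeJob._prepare_data)

payload = FakeJob()._prepare_data([{"FirstName": "Σόλων", "LastName": "Lawgiver"}])
print(type(payload).__name__)
```

Because the patch replaces the attribute on the class, it needs to run only once, at import time and before the first bulk call, and every instance created afterwards picks it up.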
@jelm-vw It works! Thanks a million!
Nice find! And nice workaround! I will create a PR for this, this weekend, and make sure to attempt to detect the data encoding before encoding it!
How can I use your monkey-patch in my code?
from salesforce_api import Salesforce
client = Salesforce(...)
...
client.bulk.upsert('Account', accounts)
...