python-salesforce-api Bulk V2 API job data is not encoded to UTF-8

Salesforce requires uploaded data to be encoded as (or at least compatible with) UTF-8 (https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/datafiles_prepare_csv.htm, fourth bullet point from the top). In practice, however, upload jobs containing characters outside ISO-8859-1 (code points above U+00FF) fail in Python before the ingest request is ever sent to Salesforce.

The bulk client does not encode the CSV data, so it remains a str until a lower-level package has to make an encoding decision. Python's low-level http library sees a str object and tries to turn it into bytes by encoding it with the HTTP default, ISO-8859-1. When the data contains characters that ISO-8859-1 cannot represent, as mine does, the encode call raises a UnicodeEncodeError.
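
The failure can be reproduced without salesforce_api at all. This is a minimal sketch that only mimics the encoding step described above; the CSV string is made up for illustration and nothing here calls the library or Salesforce:

```python
# Illustrative CSV payload (an assumption, not the library's exact output).
csv_data = "FirstName,LastName\nΣόλων,Lawgiver\n"

# Mimic the HTTP layer: a str body gets encoded as ISO-8859-1.
try:
    csv_data.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    # Greek letters have no ISO-8859-1 representation.
    latin1_ok = False

# Encoding to UTF-8 up front yields bytes, which need no further encoding.
utf8_body = csv_data.encode("utf-8")
```

Because the body is already bytes by the time it reaches the HTTP layer, no implicit ISO-8859-1 encoding should take place.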

Here is a contrived example of something that should work but doesn't:

salesforce.bulk.insert('Contact', [
    {'FirstName': 'Σόλων', 'LastName': 'Lawgiver', 'AccountID': '000000000000000'},
])

As a workaround, in the codebase I'm working in, I've monkey-patched salesforce_api.services.bulk.v2.Job._prepare_data such that it calls encode('utf-8') and returns bytes. I've not submitted a PR to change this function, as there's a stack of calling functions that all expect str, so encoding then and there may not be the desired long-term fix. But the patch works for now.

Apr 27 '21 21:04 jelm-vw

Same issue here.

May 27 '21 15:05 Stan3v

@jelm-vw can you paste your solution or make a fork?

May 28 '21 13:05 octopyth

This is effectively the (temporary) monkey-patch I use:

# patch.py
from functools import wraps

def _encode_job_data(prepare_data):
    @wraps(prepare_data)
    def wrapper(*args, **kwargs):
        original: str = prepare_data(*args, **kwargs)
        encoded: bytes = original.encode('utf-8')
        return encoded

    return wrapper

def patch_salesforce_api(salesforce_api):
    salesforce_api.services.bulk.v2.Job._prepare_data = _encode_job_data(salesforce_api.services.bulk.v2.Job._prepare_data)

# some other module
import salesforce_api
import patch

patch.patch_salesforce_api(salesforce_api)
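
To sanity-check the wrapper without touching Salesforce, here is a rough sketch using a stand-in class. FakeJob and its CSV formatting are hypothetical and only imitate a method that returns str; they are not part of salesforce_api:

```python
from functools import wraps

def _encode_job_data(prepare_data):
    @wraps(prepare_data)
    def wrapper(*args, **kwargs):
        # Same idea as the patch above: str in, UTF-8 bytes out.
        return prepare_data(*args, **kwargs).encode("utf-8")
    return wrapper

class FakeJob:
    """Hypothetical stand-in for salesforce_api.services.bulk.v2.Job."""

    def _prepare_data(self, rows):
        # Naive CSV building, for illustration only (no quoting or escaping).
        header = ",".join(rows[0].keys())
        lines = [",".join(row.values()) for row in rows]
        return "\n".join([header, *lines]) + "\n"

# Apply the patch to the stand-in the same way patch_salesforce_api does.
FakeJob._prepare_data = _encode_job_data(FakeJob._prepare_data)

body = FakeJob()._prepare_data([{"FirstName": "Σόλων", "LastName": "Lawgiver"}])
```

The patch replaces the method on the class, so it has to run once before any bulk call is made; instances created afterwards pick it up automatically.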

May 28 '21 13:05 jelm-vw

@jelm-vw It works! Thanks a million!

May 31 '21 10:05 octopyth

Nice find! And nice workaround! I will create a PR for this, this weekend, and make sure to attempt to detect the data encoding before encoding it!

Jun 18 '21 18:06 felixlindstrom

How can I use your monkey-patch in my code?

from salesforce_api import Salesforce
client = Salesforce(...)
...
client.bulk.upsert('Account', accounts)
...

Jun 07 '23 16:06 abecquet77
