Source Marketo: handle chinese chars

Open marcosmarxm opened this issue 4 years ago • 5 comments

Tell us about the problem you're trying to solve

The Marketo connector needs to handle Chinese character encoding. This was requested by a user in a Slack conversation. It looks like the current Singer tap-marketo doesn't handle this encoding.

marcosmarxm avatar Jun 29 '21 16:06 marcosmarxm

The issue is with the Marketo API call https://.mktorest.com/bulk/v1//export/<job_id>/file.json, sent with these request headers:

  • Authorization: <Bearer token>
  • User-Agent: Singer.io/tap-marketo

response.encoding on the response from Marketo is set to ISO-8859-1. This causes Python's requests.models.Response.iter_content(decode_unicode=True) to use the incorrect decoder, so UTF-8 bytes (such as Chinese characters) come out garbled.
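To make the failure mode concrete, here is a minimal standard-library sketch of what happens when UTF-8 bytes are decoded as ISO-8859-1. The sample CSV text is made up for illustration and is not taken from a real Marketo export.

```python
# Hypothetical sample data, standing in for a Marketo bulk export body.
text = "姓名,城市\n张伟,上海\n"
raw = text.encode("utf-8")  # bytes as they would travel over the wire

# ISO-8859-1 maps every byte to exactly one character, so decoding never
# raises an error. It silently produces mojibake instead.
wrong = raw.decode("iso-8859-1")

# Decoding with the encoding the bytes were actually written in.
right = raw.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

Because the wrong decode raises no exception, the corruption only shows up later in the synced data, which is probably why it went unnoticed until a user reported it.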

Possible fixes:

  • Have the Marketo API return encoding=utf-8
  • In tap-marketo.sync.stream_rows: set response.encoding to utf-8 instead of ISO-8859-1 before reading the body

tap-marketo.sync.py

```python
def stream_rows(client, stream_type, export_id):
    with tempfile.NamedTemporaryFile(mode="w+", encoding="utf8", delete=False) as csv_file:
        singer.log_info("Download starting.")
        resp = client.stream_export(stream_type, export_id)
        resp.encoding = 'utf-8'
        for chunk in resp.iter_content(chunk_size=CHUNK_SIZE_BYTES, decode_unicode=True):
            if chunk:
                # Replace CR
                chunk = chunk.replace('\r', '')
                csv_file.write(chunk)

        singer.log_info("Download completed. Begin streaming rows to file: " + csv_file.name)
        csv_file.seek(0)

        reader = csv.reader((line.replace('\0', '') for line in csv_file), delimiter=',', quotechar='"')
        headers = next(reader)
        for line in reader:
            yield dict(zip(headers, line))
```
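One detail worth checking with a chunked download is that a multi-byte character can be split across two chunks. The sketch below is not the tap-marketo code: `decode_chunks` is a hypothetical helper and the sample row is made up. It uses the standard library's incremental decoder to show that decoding chunk by chunk as UTF-8 still yields intact characters. As far as I understand, requests takes a similar incremental approach when `decode_unicode=True` and `encoding` is set, but treat that as an assumption rather than a guarantee.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Yield text from byte chunks, tolerating characters split across chunks.

    Hypothetical helper for illustration only.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    for chunk in chunks:
        text = decoder.decode(chunk)  # buffers any incomplete trailing bytes
        if text:
            yield text
    tail = decoder.decode(b"", final=True)  # flush whatever is still buffered
    if tail:
        yield tail


# Made-up sample row. Each Chinese character here is 3 bytes in UTF-8,
# so cutting at byte 4 splits the second character in the middle.
raw = "张伟,上海\r\n".encode("utf-8")
chunks = [raw[:4], raw[4:]]

# Same CR stripping as in stream_rows, applied after decoding.
decoded = "".join(decode_chunks(chunks)).replace("\r", "")
print(repr(decoded))
```

Decoding each chunk independently with `chunk.decode("utf-8")` would raise `UnicodeDecodeError` on the first chunk here, which is why an incremental decoder is the safer choice for streamed bodies.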

abrittis avatar Jun 29 '21 21:06 abrittis

https://github.com/singer-io/tap-marketo/issues/74

The fix has been checked in to the main tap_marketo.

erameshbabu avatar Aug 16 '21 04:08 erameshbabu

hey @YowanR should we validate singer based connector issues against the CDK based ones? if so, should I treat this issue as a bug and include in the certification scope?

davydov-d avatar Aug 10 '22 15:08 davydov-d

This one is out of scope for the certification process. @davydov-d We will look at this issue again if there are more requests for it.

YowanR avatar Aug 10 '22 16:08 YowanR

Duplicate of https://github.com/airbytehq/airbyte/issues/20641

CyprienBarbault avatar Feb 01 '23 17:02 CyprienBarbault