data-attribute-recommendation-python-sdk

Allow simple text as input for upload_data_to_dataset

Open • mhaas opened this issue 5 years ago • 1 comment

Right now, we only allow binary input, which requires additional work compared to just opening a file in text mode or passing a string.

mhaas, Sep 18 '20 13:09

This is actually not so easy to implement. The requests library strongly prefers that a binary stream (or data) is passed: https://requests.readthedocs.io/en/latest/user/advanced/#streaming-uploads

The naive solution is to read the entire data into memory and just convert it there. This will however require a lot of memory for e.g. a 5 GiB file, so I would rather not do that.

If we allow file handles in text (non-binary) mode, we have to create a wrapper which encodes the characters to UTF-8 bytes on the fly while also handling multi-byte characters correctly. This SO post provides some insight: https://stackoverflow.com/questions/55889474/convert-io-stringio-to-io-bytesio
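A minimal sketch of such a wrapper, to make the idea concrete. The class name `TextToBytesReader` and the `chunk_chars` parameter are made up for illustration and are not part of the SDK. It reads whole characters from the text stream, so each encoded chunk contains only complete UTF-8 sequences, and it hands the bytes out in whatever sizes the caller asks for.

```python
import io


class TextToBytesReader(io.RawIOBase):
    """Hypothetical wrapper exposing a text-mode stream as UTF-8 bytes."""

    def __init__(self, text_stream, encoding="utf-8", chunk_chars=8192):
        self._text_stream = text_stream
        self._encoding = encoding
        self._chunk_chars = chunk_chars  # characters read per refill
        self._pending = b""              # encoded bytes not yet handed out

    def readable(self):
        return True

    def readinto(self, buffer):
        # Refill from the text stream only when nothing is pending.
        if not self._pending:
            chunk = self._text_stream.read(self._chunk_chars)
            if not chunk:
                return 0  # end of stream
            # The text stream returns whole characters, so this encodes
            # complete multi-byte sequences only.
            self._pending = chunk.encode(self._encoding)
        n = min(len(buffer), len(self._pending))
        buffer[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n
```

Only one chunk is held in memory at a time, which avoids the 5 GiB problem. The wrapper does not know its total length, though, so the size question remains open.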

We can implement this ourselves, but determining the total size of the encoded byte string is not straightforward without processing the entire input. This may even be OK, as the effort is linear in the input size. If we do not have the size of the stream, then requests will switch to chunked transfer encoding (`Transfer-Encoding: chunked`), and I am not sure whether the Data Attribute Recommendation service supports this.
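The linear pass to get the size could look roughly like this. The helper name `utf8_byte_length` is made up, and the sketch assumes the text stream is seekable so it can be rewound afterwards:

```python
import io


def utf8_byte_length(text_stream, chunk_chars=65536):
    """Return the UTF-8 encoded size of the rest of a seekable text stream."""
    start = text_stream.tell()  # remember where we started
    total = 0
    while True:
        chunk = text_stream.read(chunk_chars)
        if not chunk:
            break
        # Encode chunk by chunk so memory use stays bounded.
        total += len(chunk.encode("utf-8"))
    text_stream.seek(start)  # rewind for the actual upload
    return total
```

This reads the data twice (once to count, once to upload). Note that for a file opened in text mode the result can differ from the on-disk size, for example because of newline translation, so the on-disk size is not a safe shortcut.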

Another solution is to use the codecs.iterencode function, which incrementally encodes an iterable of text chunks to bytes. This returns an iterable rather than a file-like object, which will again cause requests to use chunked transfer encoding.
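For reference, a sketch of the iterator-based variant using `codecs.iterencode`, the text-to-bytes function in the `codecs` module. The helper name `iter_utf8_chunks` and the chunk size are assumptions for illustration:

```python
import codecs
import io


def iter_utf8_chunks(text_stream, chunk_chars=8192):
    """Lazily yield UTF-8 encoded chunks read from a text-mode stream."""

    def text_chunks():
        # Read a bounded number of characters at a time.
        while True:
            chunk = text_stream.read(chunk_chars)
            if not chunk:
                return
            yield chunk

    # codecs.iterencode uses an incremental encoder over the chunks.
    return codecs.iterencode(text_chunks(), "utf-8")
```

The result is a generator without a length. Per the requests documentation, passing such an iterator as `data=` makes requests send the body chunk-encoded, so this only helps if the service accepts that.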

mhaas, Sep 22 '20 13:09