data-attribute-recommendation-python-sdk
Allow simple text as input for upload_data_to_dataset
Right now, we only accept binary input, which means extra work for the caller compared to simply opening a file in text mode or passing a string.
This is harder to implement than it sounds: for uploads, the requests library strongly prefers to be given binary data or a binary stream: https://requests.readthedocs.io/en/latest/user/advanced/#streaming-uploads
The naive solution is to read all of the data into memory and convert it there. However, this requires a lot of memory for large inputs (e.g. a 5 GiB file), so I would rather not do that.
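For comparison, the naive approach could look roughly like this. It is a minimal sketch assuming UTF-8 input; `to_binary_in_memory` is a hypothetical helper, not part of the SDK.

```python
import io
from typing import BinaryIO, TextIO, Union


def to_binary_in_memory(data: Union[str, TextIO]) -> BinaryIO:
    """Naive approach (hypothetical helper): encode the whole input at once.

    Simple and correct, but the full text plus its encoded copy live in
    memory at the same time, which is impractical for e.g. a 5 GiB file.
    """
    # Read everything if we were given a text-mode handle.
    text = data if isinstance(data, str) else data.read()
    # Encode in one go and wrap the result in a binary stream.
    return io.BytesIO(text.encode("utf-8"))
```

Peak memory grows with the input size, which is exactly the cost described above.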
If we allow file handles in text (non-binary) mode, we have to create a wrapper that encodes the characters to UTF-8 bytes while correctly handling multi-byte characters: a single character can encode to up to four bytes, so a read of a fixed number of bytes may end in the middle of a character. This SO post provides some insight: https://stackoverflow.com/questions/55889474/convert-io-stringio-to-io-bytesio
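Such a wrapper could be sketched as an `io.RawIOBase` subclass, assuming UTF-8 and a text handle opened for reading. The class name `TextToBytesReader` is hypothetical; it only illustrates buffering the bytes of a character that do not fit into the current read.

```python
import io
from typing import TextIO


class TextToBytesReader(io.RawIOBase):
    """Hypothetical wrapper exposing a text-mode handle as a binary stream.

    Characters are read from the text handle and encoded to UTF-8. Because
    one character may become up to 4 bytes, bytes that do not fit into the
    caller's buffer are kept in self._pending for the next read.
    """

    def __init__(self, text_handle: TextIO) -> None:
        super().__init__()
        self._text = text_handle
        self._pending = b""

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        size = len(buffer)
        # Encode more text until the request can be served or the input ends.
        while len(self._pending) < size:
            chunk = self._text.read(size)  # reads up to `size` characters
            if not chunk:
                break
            self._pending += chunk.encode("utf-8")
        # Hand out at most `size` bytes and keep the remainder for later.
        n = min(size, len(self._pending))
        buffer[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n
```

Memory use stays bounded by the requested read size rather than the file size. Whether requests can infer a length for such an object is a separate question, covered next.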
We can implement this ourselves, but it will not be straightforward to determine the total size of the encoded byte stream without encoding the entire text first. This may even be acceptable, as the effort is linear. If we do not know the size of the stream, requests will switch to chunked transfer encoding, and I am not sure whether the Data Attribute Recommendation service supports this.
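Determining the size up front could be done with a single counting pass, sketched below under the assumption that the text handle is seekable and the target encoding is UTF-8. `utf8_byte_length` is a hypothetical helper; it keeps memory constant by encoding one chunk at a time and rewinds afterwards so the data can still be streamed.

```python
import io
from typing import TextIO


def utf8_byte_length(text_handle: TextIO, chunk_chars: int = 64 * 1024) -> int:
    """Hypothetical helper: count the UTF-8 bytes of a seekable text handle.

    One linear pass with constant memory; the handle is restored to its
    original position so the same data can be uploaded afterwards.
    """
    start = text_handle.tell()  # opaque cookie for text files, valid for seek()
    total = 0
    while True:
        chunk = text_handle.read(chunk_chars)
        if not chunk:
            break
        total += len(chunk.encode("utf-8"))
    text_handle.seek(start)
    return total


if __name__ == "__main__":
    handle = io.StringIO("h\u00e9llo \u20ac")
    print(utf8_byte_length(handle))  # 7 characters, 10 bytes
```

The price is reading the whole file twice (once to count, once to upload), which is the linear effort mentioned above.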
Another option is to use the codecs.iterencode function, which lazily encodes an iterable of strings. It returns an iterable of bytes, which will again cause requests to use chunked transfer encoding.
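A minimal sketch of that option, assuming UTF-8 and a text-mode handle. `iter_utf8_chunks` is a hypothetical helper; `codecs.iterencode` is the standard-library function. The result is a generator of bytes with no known length.

```python
import codecs
import io
from typing import Iterator, TextIO


def iter_utf8_chunks(text_handle: TextIO, chunk_chars: int = 64 * 1024) -> Iterator[bytes]:
    """Hypothetical helper: lazily yield UTF-8 bytes from a text handle."""
    # iter(callable, sentinel) keeps calling read() until it returns "".
    text_chunks = iter(lambda: text_handle.read(chunk_chars), "")
    # codecs.iterencode uses an incremental encoder, so encoder state is
    # carried across chunks (relevant for stateful codecs; UTF-8 has none).
    return codecs.iterencode(text_chunks, "utf-8")


if __name__ == "__main__":
    demo = io.StringIO("h\u00e9llo \u20ac")
    print(b"".join(iter_utf8_chunks(demo, chunk_chars=2)))
```

This is the shortest route, but since a generator has no length, the upload would rely on the service accepting chunked requests.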