[BUG] UnicodeEncodeError when Running Quickstart on Windows
Describe the bug
I followed the installation instructions in the latest documentation (https://distilabel.argilla.io/latest/sections/getting_started/installation/) and ran the code from the quickstart section, but it failed with a UnicodeEncodeError raised from the logging module.
My code and error output are listed below.
To Reproduce
I installed distilabel and set up a .env file, loaded with python-dotenv, to provide my OpenAI key. Then I ran the code from the quickstart section.
Code to reproduce
from distilabel.llms import OpenAILLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromHub
from distilabel.steps.tasks import TextGeneration

with Pipeline(
    name="simple-text-generation-pipeline",
    description="A simple text generation pipeline",
) as pipeline:
    load_dataset = LoadDataFromHub(
        name="load_dataset",
        output_mappings={"prompt": "instruction"},
    )

    text_generation = TextGeneration(
        name="text_generation",
        llm=OpenAILLM(model="gpt-3.5-turbo"),
    )

    load_dataset >> text_generation

if __name__ == "__main__":
    distiset = pipeline.run(
        parameters={
            load_dataset.name: {
                "repo_id": "distilabel-internal-testing/instruction-dataset-mini",
                "split": "test",
            },
            text_generation.name: {
                "llm": {
                    "generation_kwargs": {
                        "temperature": 0.7,
                        "max_new_tokens": 512,
                    }
                }
            },
        },
    )
    # distiset.push_to_hub(repo_id="distilabel-example")
Expected behaviour
The pipeline runs and produces output without errors.
Actual behaviour
The run emitted the following logging error:
--- Logging error ---
Traceback (most recent call last):
File "C:\Users\JJ\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
stream.write(msg + self.terminator)
File "C:\Users\JJ\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f4be' in position 31: character maps to <undefined>
Call stack:
File "C:\Users\JJ\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 995, in _bootstrap
self._bootstrap_inner()
File "C:\Users\JJ\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1038, in _bootstrap_inner
self.run()
File "c:\Users\JJ\OneDrive - SMU\Desktop\Temp Workspace\experiment-lab\Distilabel-Experiment\venv\Lib\site-packages\ipykernel\ipkernel.py", line 766, in run_closure
_threading_Thread_run(self)
File "C:\Users\JJ\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 975, in run
self._target(*self._args, **self._kwargs)
Message: "💾 Loading `_BatchManager` from cache: 'C:\\Users\\JJ\\.cache\\distilabel\\pipelines\\simple-text-generation-pipeline\\0e5461be2a14da48b8c1f6d7b018b4199649b7e7\\batch_manager.json'"
Arguments: None
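The traceback suggests the failure comes from the output stream's encoding rather than from the pipeline logic: the log message starts with U+1F4BE (💾), which cp1252 cannot represent. The sketch below reproduces that behaviour using only the standard library, without distilabel. The helper names `can_encode` and `write_with_encoding` are hypothetical and exist only for this illustration, and the message text is shortened from the one in the log above.

```python
import io

# Shortened version of the log message from the traceback; it begins with U+1F4BE.
MESSAGE = "\U0001f4be Loading `_BatchManager` from cache"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding` in strict mode."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def write_with_encoding(text: str, encoding: str, errors: str = "strict") -> bytes:
    """Write `text` through a text stream configured like a console stream.

    Returns the bytes that reached the underlying buffer. Raises
    UnicodeEncodeError if the encoding cannot represent the text and
    `errors` is "strict", which is the situation shown in the traceback.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # keep `raw` open after the wrapper goes away
    return data


if __name__ == "__main__":
    print("cp1252 can encode message:", can_encode(MESSAGE, "cp1252"))
    print("utf-8 can encode message:", can_encode(MESSAGE, "utf-8"))
    try:
        write_with_encoding(MESSAGE, "cp1252")
    except UnicodeEncodeError as exc:
        print("strict cp1252 stream failed:", exc.reason)
    # A lenient error handler substitutes "?" instead of raising.
    print(write_with_encoding(MESSAGE, "cp1252", errors="replace"))
```

If this is indeed the cause, possible workarounds (untested against distilabel 1.2.1) are setting the environment variable `PYTHONUTF8=1` or `PYTHONIOENCODING=utf-8` before starting Python, or calling `sys.stdout.reconfigure(encoding="utf-8")` and `sys.stderr.reconfigure(encoding="utf-8")` when those streams are regular `io.TextIOWrapper` objects. Under ipykernel the streams may be a different type, so the `reconfigure` call may not be available there.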
Desktop (please complete the following information):
- Package version: 1.2.1
- Python version: 3.11.4
Any support is greatly appreciated! Thank you!