distilabel
[BUG] opaque Pipeline error messages due to Python `multiprocessing.pool` error callback
Describe the bug
I had trouble figuring out why my pipeline was failing, and the error messages were not informative.
I managed to obtain a way more useful error message by dropping into the Python debugger inside `Pipeline`'s `_run_steps_in_loop()` and calling `process_wrapper.run()` from inside the debugger.
The fix proposed there in the comment, `step.pipeline=None`, is not working for me.
To Reproduce
Set up any buggy task that will cause your pipeline to fail silently / cryptically, e.g. specify a wrong file name during `load()` of your task.
class QueryFromDocBase(Task, ABC):
    constraints: List[str] = []
    _template: Optional["Template"] = PrivateAttr(default=...)

    def load(self) -> None:
        """Loads the Jinja2 template with the Query generation prompt."""
        super().load()
        _path = str(importlib_resources.files("ella") / "tasks" / "templates" / "THIS_FILE_DOES_NOT_EXIST.jinja2")
        self._template = Template(open(_path).read())
Then use the task in some `Pipeline` and run it.
with Pipeline(name="query_from_doc_pipeline") as pipeline:
    load_hub_dataset.connect(query_from_doc_step)

output = pipeline.run(
    parameters={
        "load_dataset": {"repo_id": dataset_name}
    },
    use_cache=use_cache,
)
Expected behaviour
This will fail with
[04/12/24 10:38:56] ERROR ['distilabel.pipeline.local'] ❌ Failed with an unhandled exception: local.py:461
Error sending result: '<multiprocessing.pool.ExceptionWithTraceback
object at 0x1505a4dc0>'. Reason: 'TypeError("cannot pickle
'_thread.RLock' object")'
Screenshots
To debug and get a way more informative error message, drop into `pdb` in here:
And call `process_wrapper.run()`:
Desktop (please complete the following information):
- Package version: `poetry run pip install git+https://github.com/argilla-io/distilabel.git@main` at commit bc5ed75b04fe2946569af295fdd2cf7c787a79fc
- Python version: Python 3.10.13
Additional context
I don't know if this can be solved within distilabel, as I don't get the correct exception even inside Python's `multiprocessing.pool.ApplyResult`.
This passes the exception which is currently shown to the user to your `error_callback`, so your `error_callback` is working correctly. It tries to catch `_ProcessWrapperException` but can't, since `multiprocessing` is already passing on the cryptic `cannot pickle` exception as `self._value` to your `error_callback`:
On a side note: I have to kill the terminal, because `_STOP_LOCK` somewhere catches the termination signal and waits for some batch job to finish up, which never happens.
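For illustration, the message above can be reproduced with the standard library alone. This is a minimal sketch, not distilabel code: `StepFailure` and `send_result` are hypothetical names, and the `RLock` attribute is only an assumed stand-in for whatever unpicklable object the real exception references. `ExceptionWithTraceback` and `MaybeEncodingError` are the CPython `multiprocessing.pool` classes that appear in the log; the first is an implementation detail and may change between Python versions.

```python
import pickle
import threading
from multiprocessing.pool import ExceptionWithTraceback, MaybeEncodingError


class StepFailure(Exception):
    """Hypothetical exception that keeps a reference to an unpicklable object."""

    def __init__(self, message: str) -> None:
        super().__init__(message)
        # Assumed stand-in for a step or pipeline object that owns a lock.
        self.lock = threading.RLock()


def send_result(exc: Exception) -> object:
    """Mimic a pool worker serialising a failed result for the parent process."""
    wrapped = ExceptionWithTraceback(exc, exc.__traceback__)
    try:
        pickle.dumps(wrapped)
        return wrapped
    except Exception as pickling_error:
        # The original exception cannot cross the process boundary, so the
        # parent only receives a description of the pickling failure.
        return MaybeEncodingError(pickling_error, wrapped)


received = send_result(StepFailure("template file not found"))
print(type(received).__name__)
print(received)
```

Under these assumptions, the object handed to an `error_callback` would be the `MaybeEncodingError`, and the original message ("template file not found") is no longer part of it.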
Hi @rasdani, I just tried with this pipeline:
import importlib_resources

from distilabel.pipeline import Pipeline
from distilabel.steps import LoadHubDataset, Step, StepInput


class ThisWillFail(Step):
    def load(self) -> None:
        super().load()
        _path = str(
            importlib_resources.files("distilabel")
            / "tasks"
            / "templates"
            / "THIS_FILE_DOES_NOT_EXIST.jinja2"
        )
        from jinja2 import Template

        Template(open(_path).read())

    def process(self, input: StepInput) -> None:  # type: ignore
        raise Exception


with Pipeline("pipe-name", description="My first pipe") as pipeline:
    load_dataset = LoadHubDataset(
        name="load_dataset",
        output_mappings={"prompt": "instruction"},
    )
    this_will_fail = ThisWillFail(name="this_will_fail")
    load_dataset.connect(this_will_fail)

if __name__ == "__main__":
    distiset = pipeline.run(
        parameters={
            "load_dataset": {
                "repo_id": "HuggingFaceH4/instruction-dataset",
                "split": "test",
            }
        },
    )
but I'm not able to reproduce your error, the original exception message is getting displayed for me:
We have seen some `cannot pickle '_thread.RLock' object` exceptions too, and this usually happened when `pipeline.run` was not executed within an `if __name__ == "__main__":` block.
That said, it's true that we can improve the traceback to provide more information and show the original point where the exception was raised. I will try to improve this before the 1.0.0 release.
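One generic way to keep the original traceback visible across a process boundary is to convert it to text inside the worker before the exception is sent back. The sketch below only illustrates that idea and is not the change made in distilabel; `WorkerError`, `picklable_errors` and `load_template` are hypothetical names.

```python
import functools
import pickle
import traceback


class WorkerError(Exception):
    """Hypothetical exception that carries only a string, so it always pickles."""


def picklable_errors(func):
    """Re-raise any exception from func as a WorkerError holding the traceback text."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Format the traceback while the original exception is still alive.
            details = traceback.format_exc()
            raise WorkerError(f"{type(exc).__name__}: {exc}\n{details}") from None

    return wrapper


@picklable_errors
def load_template(path: str) -> str:
    with open(path) as f:
        return f.read()


try:
    load_template("THIS_FILE_DOES_NOT_EXIST.jinja2")
except WorkerError as err:
    # Round-trip through pickle, as a pool would when sending the result.
    restored = pickle.loads(pickle.dumps(err))
    print(restored)
```

Because `WorkerError` holds nothing but a string, it survives pickling even when the original exception references locks or other unpicklable state.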
Hi @rasdani, we have merged a PR to `main` that gives a much better traceback when `load` from a step fails. Could you give it a try?