Jonas Mueller
I'd recommend the current fine-tuned sentence transformer
Related issue: https://github.com/cleanlab/cleanlab/issues/1111
This should be addressed once this PR is in: https://github.com/cleanlab/cleanlab/pull/1235
To unpin sklearn, we must first wait for the xgboost package to release a new version that fixes this issue: https://github.com/dmlc/xgboost/issues/11093
The `get_active_learning_scores()` method is currently designed only for classification tasks.
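To illustrate why scores like this are tied to classification, here is a minimal sketch of a generic entropy-based uncertainty score computed from per-class predicted probabilities. This is not cleanlab's implementation of `get_active_learning_scores()`; the function name `entropy_scores` and the toy `pred_probs` values are hypothetical, and the "lower score means label it next" convention is just a choice made for this sketch.

```python
import math


def entropy_scores(pred_probs):
    """Return one score per example in [0, 1]; lower means more uncertain.

    pred_probs is a list of per-class probability lists (each row sums to 1).
    Hypothetical helper for illustration only, not part of cleanlab.
    """
    scores = []
    for probs in pred_probs:
        num_classes = len(probs)
        # Shannon entropy of the predicted class distribution.
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        # Normalize by the maximum possible entropy, log(K), then invert
        # so that confident predictions score close to 1.
        scores.append(1.0 - entropy / math.log(num_classes))
    return scores


# Toy predicted probabilities for three examples over three classes.
pred_probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform, very uncertain
    [0.60, 0.30, 0.10],  # moderately uncertain
]

scores = entropy_scores(pred_probs)
# Indices sorted from most to least uncertain (candidates to label first).
ranking = sorted(range(len(scores)), key=scores.__getitem__)
print(ranking)
```

A score of this kind needs a probability distribution over a discrete set of classes, which a regression model that outputs a single number does not provide.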
Yes, you should be able to follow: https://docs.cleanlab.ai/stable/tutorials/regression.html See especially the final section: https://docs.cleanlab.ai/stable/tutorials/regression.html#5.-Other-ways-to-find-noisy-labels-in-regression-datasets
Thank you for the suggestion. Note that we do offer the [Trustworthy Language Model](https://cleanlab.ai/blog/trustworthy-language-model/), which is designed specifically for hallucination detection. Relevant tutorials: https://help.cleanlab.ai/tutorials/tlm/ https://help.cleanlab.ai/tutorials/tlm_custom_model/ https://help.cleanlab.ai/tutorials/tlm_rag/#alternate-low-latencystreaming-approach-use-tlm-to-assess-responses-from-an-existing-rag-system Hallucination-detection benchmarks in RAG: https://towardsdatascience.com/benchmarking-hallucination-detection-methods-in-rag-6a03c555f063
Note that this PR is failing on Windows: https://github.com/cleanlab/cleanlab/actions/runs/12554401936/job/35137672914?pr=1222
> The `Datalab` object relies heavily on the dataset, which must be loaded for full functionality. As a result, it is reasonable for `datalab.save()` to save the dataset by default...
> > Thanks for pointing this out. The ideal behavior would be for user to separately save/load their dataset themselves, and then Datalab.save() does not save the dataset at all....