fix: encode dataset name with urllib, removing default `safe='/'` config
This fixes an issue where `urllib.parse.quote` is used to encode a path component that contains a slash. For example, if you name a dataset `test/me`, the slash needs to be encoded.
```python
>>> import urllib.parse
>>> urllib.parse.quote('test/me')
'test/me'
>>> urllib.parse.quote('test/me', safe='')
'test%2Fme'
```
As shown, `urllib.parse.quote` does not encode the `/` by default, and that default behavior is what the `_url_encode` function was relying on.
Related: https://github.com/langfuse/langfuse/pull/7026. Part of the fix for #5962.
> [!IMPORTANT]
> Fix dataset name encoding to handle special characters using `urllib.parse.quote` with `safe=''` and add tests for special character handling.
>
> - Behavior:
>   - Fix encoding of dataset names with special characters in `get_dataset` in `client.py` and `get` in `datasets/client.py` using `urllib.parse.quote` with `safe=''`.
> - Tests:
>   - Add `test_dataset_special_characters` in `test_datasets.py` to verify handling of special and unicode characters in dataset names.
>
> This description was created for 73421d59226d44675a3371c24b38509a6fc5d481.
Greptile Summary
Disclaimer: Experimental PR review
This PR fixes URL encoding issues for dataset names in the langfuse-python client, particularly addressing the handling of forward slashes and special characters in URLs.
- Modified `_url_encode()` in `langfuse/_client/client.py` to properly encode forward slashes by removing the default `safe='/'` parameter
- Added test case `test_dataset_special_characters()` in `tests/test_datasets.py` to validate handling of special and Unicode characters
- Updated the `get` method in `datasets/client.py` to use `urllib.parse.quote(dataset_name, safe='')` for consistent encoding
- Potential issue: some methods still use `jsonable_encoder` for dataset names, which may cause inconsistent encoding behavior
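A test along these lines can exercise the encoding directly. This is an illustrative sketch, not the actual `test_dataset_special_characters` from the PR; it checks only the `urllib.parse.quote` behavior the fix depends on:

```python
import urllib.parse

# Names containing slashes and non-ASCII characters, like those the
# added test is described as covering.
names = ["test/me", "Special/Character", "数据集/テスト"]

for name in names:
    encoded = urllib.parse.quote(name, safe="")
    # With safe='', the slash is percent-encoded, so it cannot be
    # mistaken for a path separator in the request URL.
    assert "/" not in encoded
    # Decoding restores the original name, including unicode characters.
    assert urllib.parse.unquote(encoded) == name
```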
Feels strange conversing with a bot, but I figure someone will read this and make a determination/respond?
Thanks again for raising this; it has already been fixed on main 🙏🏾
@hassiebp I am still able to reproduce this error. There is some discussion here if you would like to see my comments: https://github.com/langfuse/langfuse/issues/5962#issuecomment-2962036494. The issue exists on both the client and the server side.
I am still able to reproduce:
```python
>>> special_name = 'Special/Character'
>>> l.create_dataset(name=special_name)
Dataset(REDACTED)
>>> l.get_dataset(name=special_name)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/langfuse/api/resources/datasets/client.py", line 162, in get
    pydantic_v1.parse_obj_as(typing.Any, _response.json())
                                         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_models.py", line 832, in json
    return jsonlib.loads(self.content, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/json/decoder.py", line 338, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/json/decoder.py", line 356, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
>>> import langfuse
>>> langfuse.__version__
'3.2.0'
```