langfuse-python icon indicating copy to clipboard operation
langfuse-python copied to clipboard

fix: encode dataset name with urllib, removing default `safe='/'` config

Open mattverantos opened this issue 9 months ago • 1 comments

This fixes an issue where the urllib.parse.quote fn is used to encode a path component that has a slash in it. Ex: if you name a dataset test/me, it needs to encode the slash.

>>> import urllib.parse
>>> urllib.parse.quote('test/me')
'test/me'
>>> urllib.parse.quote('test/me', safe='')
'test%2Fme'

You can see here that urllib.parse.quote does not encode the / by default, which is what is used in the _url_encode fn.

Related: https://github.com/langfuse/langfuse/pull/7026 Part of the fix for #5962


[!IMPORTANT] Fix dataset name encoding to handle special characters using urllib.parse.quote with safe='' and add tests for special character handling.

  • Behavior:
    • Fix encoding of dataset names with special characters in get_dataset in client.py and get in datasets/client.py using urllib.parse.quote with safe=''.
  • Tests:
    • Add test_dataset_special_characters in test_datasets.py to verify handling of special and unicode characters in dataset names.

This description was created by Ellipsis for 73421d59226d44675a3371c24b38509a6fc5d481. You can customize this summary. It will automatically update as commits are pushed.

Greptile Summary

Disclaimer: Experimental PR review

This PR fixes URL encoding issues for dataset names in the langfuse-python client, particularly addressing the handling of forward slashes and special characters in URLs.

  • Modified _url_encode() in langfuse/_client/client.py to properly encode forward slashes by removing default safe='/' parameter
  • Added test case test_dataset_special_characters() in tests/test_datasets.py to validate handling of special and Unicode characters
  • Updated get method in datasets/client.py to use urllib.parse.quote(dataset_name, safe='') for consistent encoding
  • Potential issue: Some methods still use jsonable_encoder for dataset names which may cause inconsistent encoding behavior

💡 (1/5) You can manually trigger the bot by mentioning @greptileai in a comment!

mattverantos avatar May 26 '25 14:05 mattverantos

Feels strange conversing with a bot, but I figure someone will read this and make a determination/respond?

mattverantos avatar May 26 '25 15:05 mattverantos

Thanks again for raising this, this has been already fixed on main 🙏🏾

hassiebp avatar Jul 15 '25 08:07 hassiebp

@hassiebp I am still able to reproduce this error. I have had some conversation here if you would like to see my comments: https://github.com/langfuse/langfuse/issues/5962#issuecomment-2962036494 where the issue is both on the client and the server side.

I am still able to reproduce:

>>> special_name = 'Special/Character'
>>> l.create_dataset(name=special_name)
Dataset(REDACTED)
>>> l.get_dataset(name=special_name)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/langfuse/api/resources/datasets/client.py", line 162, in get
    pydantic_v1.parse_obj_as(typing.Any, _response.json())
                                         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_models.py", line 832, in json
    return jsonlib.loads(self.content, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/json/decoder.py", line 338, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/json/decoder.py", line 356, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
>>> import langfuse
>>> langfuse.__version__
'3.2.0'

mattverantos avatar Jul 15 '25 14:07 mattverantos