Python SDK (lakefs_sdk) depends on outdated pydantic 1.X and prevents migration from lakefs_client
Hi,
All of our Python projects depend on Pydantic 2.x (which has been live since June 2023). We tried to migrate from lakefs_client to lakefs_sdk, but it seems lakefs_sdk requires Pydantic 1.
Would it be possible (given that lakefs_sdk is "new") to use Pydantic V2? https://pypi.org/project/pydantic/
This is our output from poetry:
```
Because lakefs-sdk (1.0.0) depends on pydantic (>=1.10.5,<2)
```
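For illustration, a hypothetical pyproject.toml fragment that reproduces the conflict: the project pins Pydantic 2.x while lakefs-sdk 1.0.0 requires `pydantic (>=1.10.5,<2)`, so no single version satisfies both constraints and poetry's solver fails.

```toml
# Hypothetical fragment, not taken from the reporter's actual project.
[tool.poetry.dependencies]
python = "^3.10"
pydantic = "^2.0"      # our projects require Pydantic 2.x
lakefs-sdk = "1.0.0"   # requires pydantic >=1.10.5,<2 -> unsolvable
```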
PS: If it helps, copied from Pydantic doc
If you're using Pydantic V1 you may want to look at the [pydantic V1.10 Documentation](https://docs.pydantic.dev/) or the [1.10.X-fixes git branch](https://github.com/pydantic/pydantic/tree/1.10.X-fixes). Pydantic V2 also ships with the latest version of Pydantic V1 built in so that you can incrementally upgrade your code base and projects: `from pydantic import v1 as pydantic_v1`.
If not possible to upgrade, is there any workaround for this?
We would rather not keep developing against lakefs_client, because the migration will only become harder.
This issue is now marked as stale after 90 days of inactivity, and will be closed soon. To keep it, mark it with the "no stale" label.
I don't think this issue should be stale
@TinoSM Hi, we are prioritizing this issue and will investigate what is required in order to update the pydantic version for our auto generated client or find a workaround for this.
The update introduces breaking changes, resulting from both the openapi-generator update and the pydantic version upgrade. Pydantic: changes in default and optional parameter locations and some changes in data types. Generator: `async_req` is no longer supported for HTTP requests. Instead, the code can be built with async functions that provide similar functionality, but that requires code modification. Although the chance that users actually used the `async_req` flag (hidden well inside the generated code) is small, this is still a breaking change we need to communicate. This could have been easily resolved by releasing a new SDK client with a major version bump and announcing a breaking change, if it weren't for the client versions being tightly coupled to the lakeFS version. Possible ways forward:
- Bump lakeFS major version - IMO makes no sense to create a "breaking change" in lakeFS just for changes in the Python SDK
- Break away from the lakeFS versioning - Create a separate release flow for the Python SDK (possibly all the clients) and provide a compatibility matrix with lakeFS versions
- Create yet another Python SDK - We already have too many Python clients IMO
- Not upgrade the generated code - Not an option IMO. Too many users and customers have already complained about this and I don't think we can afford leaving things as they are
- Insert your innovative solution here <--
For more information regarding the changes in pydantic: Migration Guide - Pydantic
- Shade pydantic v1 within your library (I've never done shading in Python, so I'm not sure it's possible; I'm thinking of something similar to maven-shade) so clients don't have versioning issues. I don't particularly recommend it, since you would lose new features, and new clients may want those too. But with this you could perhaps postpone the migration until the next lakeFS major version change (?).
I believe there's also a 6th solution which might be easier to implement: Pydantic 2.x actually ships with Pydantic 1.x! Pretty much for all the same reasons this issue exists.
We do need to change the import stanza to make sure we import v1 from v2 if v2 is installed, but this should allow supporting environments that have 1.x or 2.x with the same package.
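The import-stanza change described above could look roughly like this (a sketch, assuming Pydantic 2.x or a late 1.10.x release; `Repository` is a hypothetical model standing in for the generated code):

```python
# Prefer the bundled v1 API when running under Pydantic 2.x, and fall back to
# the top-level package on Pydantic 1.x, where it *is* the v1 API.
try:
    from pydantic.v1 import BaseModel  # Pydantic 2.x ships v1 under pydantic.v1
except ImportError:
    from pydantic import BaseModel     # plain Pydantic 1.x

class Repository(BaseModel):
    # Hypothetical model; generated models would follow the same pattern.
    id: str
```

With this pattern the same package can be installed into environments pinned to either major version.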
@TinoSM I'm not familiar with any shading mechanism in Python. AFAIK, for each virtual environment you can have a single version of a package.
@ozkatz that would have been the optimal solution if it weren't auto-generated code. Unfortunately we do not control how the package is used, and from my investigation there is no option of using the openapi-generator with pydantic 2.x's v1 module (there is an option to use the python generator with pydantic v1 for backward compatibility, but that doesn't help us).
@N-o-Z we can either override the templates or add a post processing step, no?
@N-o-Z , while a pain to maintain a compatibility matrix, I think the best long term solution is option 2:
- Break away from the lakeFS versioning - Create a separate release flow for the Python SDK (possibly all the clients) and provide a compatibility matrix with lakeFS versions
This will allow the API client to iterate separately, for example this issue but also "hot-fixing" critical bugs that don't require a server release.
> @N-o-Z we can either override the templates or add a post processing step, no?
Templates won't help us here IMHO, I can investigate the option of a post processing script (pending on no other package dependencies for pydantic < 2), though modifying packages and import statements in post processing is very risky and can potentially lead to package breaking upon code changes.
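To make the risk concrete, such a post-processing step might look roughly like this (a sketch, not the actual build tooling; the regexes and the `process_tree` helper are illustrative, and the requirements/setup.cfg dependency changes would still need separate handling):

```python
import re
from pathlib import Path

def rewrite_pydantic_imports(source: str) -> str:
    # Rewrite top-level pydantic imports to the v1 compatibility module
    # bundled with Pydantic 2.x, e.g.
    #   from pydantic import BaseModel  ->  from pydantic.v1 import BaseModel
    source = re.sub(r"\bfrom pydantic import\b", "from pydantic.v1 import", source)
    source = re.sub(r"\bimport pydantic\b(?!\.)", "import pydantic.v1 as pydantic", source)
    return source

def process_tree(root: Path) -> None:
    # Apply the rewrite to every generated module under `root`.
    for path in root.rglob("*.py"):
        path.write_text(rewrite_pydantic_imports(path.read_text()))
```

As noted above, regex rewrites like these are fragile: any generator template change that produces an import shape the patterns don't anticipate would silently break the package.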
> @N-o-Z, while a pain to maintain a compatibility matrix, I think the best long term solution is option 2:
> - Break away from the lakeFS versioning - Create a separate release flow for the Python SDK (possibly all the clients) and provide a compatibility matrix with lakeFS versions
>
> This will allow the API client to iterate separately, for example this issue but also "hot-fixing" critical bugs that don't require a server release.
@logan-hcg I agree, though this will increase the scope of this task substantially. I think we should try to exhaust quicker solutions first
Another option: Step back from using lakefs-sdk as the underlying API client implementation and perform the HTTP requests directly
Pros:
- Decoupling of lakeFS SDK and lakeFS releases from HL SDK which is already versioned and released separately
- Pydantic is no longer a package dependency 🎉
Cons:
- Requires implementing the entire API client logic in the HL SDK (significant time and effort)
- Today users can take advantage of the underlying client to access newly added or unexposed APIs
- Will require maintaining a compatibility matrix with lakeFS versions and creating tests to guarantee it
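Under this option, the client would issue plain HTTP calls instead of going through the generated models. A minimal sketch (stdlib only; assumes the documented lakeFS `/api/v1/repositories` listing endpoint and omits authentication for brevity):

```python
import json
import urllib.request

def repositories_url(endpoint: str) -> str:
    # lakeFS REST path for listing repositories (API v1).
    return endpoint.rstrip("/") + "/api/v1/repositories"

def list_repositories(endpoint: str) -> dict:
    # Direct HTTP call: no generated client code and no pydantic models;
    # the JSON response is handled as an ordinary dict.
    with urllib.request.urlopen(repositories_url(endpoint)) as resp:
        return json.load(resp)
```

This is where the "Pydantic is no longer a package dependency" upside comes from: responses are plain dicts, so the HL SDK imposes no model-library constraint on users.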
Below is a table summarizing all the suggestions so far:

| Solution | Effort | Risk | Friction |
|---|---|---|---|
| ~~Bump lakeFS major version~~ | Minimal; requires very little coding work | None | Major release friction, though no breaking change in lakeFS. Pydantic v1 migration. |
| Decouple client version from lakeFS | Minimal for the short run, but will require us to maintain a separate release process for lakeFS and the Python SDK. Also requires maintaining a compatibility matrix | Low, as long as we maintain proper compatibility testing | Pydantic v1 migration. |
| Create Python SDK V3 | Requires supporting yet another Python SDK | None | Low. Users with a pydantic v1 dependency can continue to use the previous SDK version (which will keep updating); users with pydantic v2 can move to the new package. Still, the multitude of packages can cause confusion for users. |
| Implement the API client in the Python SDK wrapper and break away from the auto-generated code | Medium. We will need to implement all the APIs currently used by the Python SDK, and bug fixes in the auto-generated SDK will require patching the HL SDK client as well. Will also support a more substantial future effort to cover additional APIs | Adding another implementation of the client, prone to bugs and code divergence. Solves the problem only for the HL Python SDK | Medium. Aside from possible bugs, the changes will be transparent to users. Also allows the HL Python SDK to drop any pydantic dependency, letting users choose whichever version they want. Does not solve the problem for users of the auto-generated SDK itself. |
| Use post-processing to replace all pydantic V1 imports with pydantic.v1 (requires modifying requirements and setup.cfg project dependencies) | Medium/Small. Requires adding templates and post-processing scripts to the code-generation process | High. Changing import statements and dependencies in automation scripts is prone to errors and bugs and is hard to maintain | Pydantic v1 migration. Will be hard to explain releasing a lakeFS version with a minor bump and a breaking change in the SDK for users relying on pydantic v1. |
| ~~Keep working with the current openapi-generator and Pydantic V1~~ | None | Users will not use the auto-generated SDK or the HL SDK due to the dependency conflict; decreased adoption of our clients over time | High. We're not providing users any solution for working with our Python clients. |
@N-o-Z @itaiad200 @arielshaqed so 2-3 possible "cheap" solutions:
- Unpin pydantic à la HuggingFace (assuming the generated SDK doesn't use any of the features that break between 1 and 2)
- Use Pydantic's automated code transformation tool as a step after openapi-generator at its current version.
- Perhaps a combination of the two above.
@ozkatz Naive approach didn't work:
```
../../../venv/test-pydantic/lib/python3.10/site-packages/_pytest/assertion/rewrite.py:186: in exec_module
    exec(co, module.__dict__)
tests/integration/conftest.py:9: in <module>
    from lakefs import client
lakefs/__init__.py:5: in <module>
    from lakefs.client import Client
lakefs/client.py:12: in <module>
    import lakefs_sdk
../../../venv/test-pydantic/lib/python3.10/site-packages/lakefs_sdk/__init__.py:21: in <module>
    from lakefs_sdk.api.actions_api import ActionsApi
../../../venv/test-pydantic/lib/python3.10/site-packages/lakefs_sdk/api/__init__.py:4: in <module>
    from lakefs_sdk.api.actions_api import ActionsApi
../../../venv/test-pydantic/lib/python3.10/site-packages/lakefs_sdk/api/actions_api.py:27: in <module>
    from lakefs_sdk.models.action_run import ActionRun
../../../venv/test-pydantic/lib/python3.10/site-packages/lakefs_sdk/models/__init__.py:67: in <module>
    from lakefs_sdk.models.meta_range_creation import MetaRangeCreation
../../../venv/test-pydantic/lib/python3.10/site-packages/lakefs_sdk/models/meta_range_creation.py:26: in <module>
    class MetaRangeCreation(BaseModel):
../../../venv/test-pydantic/lib/python3.10/site-packages/pydantic/_internal/_model_construction.py:197: in __new__
    set_model_fields(cls, bases, config_wrapper, types_namespace)
../../../venv/test-pydantic/lib/python3.10/site-packages/pydantic/_internal/_model_construction.py:474: in set_model_fields
    fields, class_vars = collect_model_fields(cls, bases, config_wrapper, types_namespace, typevars_map=typevars_map)
../../../venv/test-pydantic/lib/python3.10/site-packages/pydantic/_internal/_fields.py:131: in collect_model_fields
    type_hints = get_cls_type_hints_lenient(cls, types_namespace)
../../../venv/test-pydantic/lib/python3.10/site-packages/pydantic/_internal/_typing_extra.py:226: in get_cls_type_hints_lenient
    hints[name] = eval_type_lenient(value, globalns, localns)
../../../venv/test-pydantic/lib/python3.10/site-packages/pydantic/_internal/_typing_extra.py:238: in eval_type_lenient
    return eval_type_backport(value, globalns, localns)
../../../venv/test-pydantic/lib/python3.10/site-packages/pydantic/_internal/_typing_extra.py:254: in eval_type_backport
    return typing._eval_type(  # type: ignore
/usr/lib/python3.10/typing.py:327: in _eval_type
    return t._evaluate(globalns, localns, recursive_guard)
/usr/lib/python3.10/typing.py:694: in _evaluate
    eval(self.__forward_code__, globalns, localns),
<string>:1: in <module>
    ???
E   TypeError: conlist() got an unexpected keyword argument 'min_items'
```
Will try to combine this with post processing script and see what happens