[SPARK-51919][PYTHON] Allow overwriting statically registered Python Data Source
What changes were proposed in this pull request?
- Allow statically registered Python Data Sources to be overwritten by an explicit registration with the same name
- Update documentation to clarify Python Data Source behavior and registration options
Why are the changes needed?
Static registration is not obvious to users and does not always work as expected, for example when the module providing DefaultSource is installed after lookup_data_sources has already run.
In practice, users (and LLM agents) therefore often register the data source explicitly, even when it is already provided as a DefaultSource.
Raising an error in this case interrupts the workflow and forces LLM agents to spend extra tokens regenerating the same code without the registration call.
This change also makes the behavior consistent with user-registered data sources, which are already allowed to overwrite previous user registrations.
Does this PR introduce any user-facing change?
Yes. Previously, registering a Python Data Source with the same name as a statically registered one threw an error. With this change, the new registration overwrites the static one.
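To make the before/after semantics concrete, here is a minimal toy model in plain Python. ToyDataSourceRegistry and its allow_static_overwrite flag are hypothetical names used only for illustration; this is not Spark's actual registration code. It assumes, for the sake of the sketch, that explicit registrations take precedence over static ones on lookup and that a repeated user registration simply replaces the earlier one.

```python
class ToyDataSourceRegistry:
    """Hypothetical toy model of Python Data Source registration semantics.

    Not Spark's implementation; it only mirrors the behavior change:
    allow_static_overwrite=False models the old behavior (error on a
    name clash with a static source), True models the new behavior.
    """

    def __init__(self, static_sources, allow_static_overwrite=True):
        # Sources discovered automatically (stand-in for DefaultSource lookup).
        self._static = dict(static_sources)
        # Sources registered explicitly by the user.
        self._user = {}
        self._allow_static_overwrite = allow_static_overwrite

    def register(self, name, source):
        # Old behavior: clashing with a static source is an error.
        if name in self._static and not self._allow_static_overwrite:
            raise ValueError(f"data source '{name}' already exists")
        # User registrations always replace earlier user registrations.
        self._user[name] = source

    def lookup(self, name):
        # Explicit registrations shadow static ones (assumption of this sketch).
        if name in self._user:
            return self._user[name]
        return self._static.get(name)


if __name__ == "__main__":
    old = ToyDataSourceRegistry({"my_source": "StaticImpl"}, allow_static_overwrite=False)
    try:
        old.register("my_source", "UserImpl")
    except ValueError as err:
        print("before:", err)  # before: data source 'my_source' already exists

    new = ToyDataSourceRegistry({"my_source": "StaticImpl"})
    new.register("my_source", "UserImpl")
    print("after:", new.lookup("my_source"))  # after: UserImpl
```

Keeping the static and user registrations in separate maps makes it easy to see that the change only removes the clash check; lookup order is unchanged.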
How was this patch tested?
Added a test in PythonDataSourceSuite.scala to verify that static sources can be overwritten correctly.
Was this patch authored or co-authored using generative AI tooling?
No
@allisonwang-db @HyukjinKwon please take a look
cc @allisonwang-db
LGTM
LGTM!
thanks, merging to master