sparkmagic
Unable to override endpoint for pyspark [BUG]
Describe the bug I'm trying to override the config for a Livy endpoint in a PySpark notebook, but the override does not take effect.
To Reproduce
- Create a pyspark notebook
- Add the following to a cell at the beginning:
%%local
import sparkmagic.utils.configuration as conf
modified_python_creds = {**conf.kernel_python_credentials(), 'url':'http://my-new-endpoint:8998'}
conf.override(conf.kernel_python_credentials.__name__, modified_python_creds)
- Confirm creds are updated
%%local
print(conf.kernel_python_credentials())
{'username': '', 'password': '', 'url': 'http://my-new-endpoint:8998', 'auth': 'None'}
- Connect to Spark and it will use the previous kernel_python_credentials instead of the new ones
Expected behavior I'd expect to be able to override credentials as above, since the technique works for (at least some) other settings; e.g. the following works as expected:
conf.override('shutdown_session_on_spark_statement_errors', True)
If there is a reason not to allow this kind of override, there should ideally be an error or warning explaining why.
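For context, the symptom can be modeled with a minimal self-contained sketch (this is NOT sparkmagic's actual code; the config store and Session class below are hypothetical stand-ins): a component that snapshots the endpoint URL when it is constructed will not see a later override of the credentials dict.

```python
# Illustrative model (NOT sparkmagic's real code): a name-keyed override
# store, and a session object that captures the endpoint URL at creation.
_overrides = {}

def kernel_python_credentials():
    # Overrides take precedence over the default value.
    default = {'username': '', 'password': '',
               'url': 'http://old-endpoint:8998', 'auth': 'None'}
    return _overrides.get('kernel_python_credentials', default)

def override(name, value):
    _overrides[name] = value

class Session:
    def __init__(self, url):
        # The URL is captured here; later overrides never reach this object.
        self.url = url

# The kernel wires up its endpoint before the user's override cell runs.
session = Session(kernel_python_credentials()['url'])

# The user's override (as in the repro above) updates the config function...
override(kernel_python_credentials.__name__,
         {**kernel_python_credentials(), 'url': 'http://my-new-endpoint:8998'})

print(kernel_python_credentials()['url'])  # http://my-new-endpoint:8998
print(session.url)                         # http://old-endpoint:8998 (stale)
```

This is consistent with the behavior reported: the config function returns the new URL, but the already-wired endpoint keeps the old one, which is why a refresh step is needed.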
Versions:
- SparkMagic 0.15.0
- Livy (if you know it) 0.7.0
- Spark 3.0.1 (EMR 6.2)
Additional context
I'm now aware that as a workaround I can start an IPython notebook, run %load_ext sparkmagic.magics and %spark add -l python -u http://livy.example.com, and then prefix every cell with %%spark.
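Spelled out as notebook cells, the workaround looks roughly like this (a sketch; exact %spark add flags may vary by sparkmagic version):

```
# Cell 1: load the magics in a plain IPython kernel
%load_ext sparkmagic.magics

# Cell 2: register the Livy endpoint for a Python session
%spark add -l python -u http://livy.example.com

# Cell 3 onward: prefix each Spark cell with the cell magic
%%spark
print(sc.version)
```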
+1 This enhancement would be very helpful when the notebook runs in a cloud environment where the sparkmagic config file cannot be generated in advance. Hopefully the team can consider supporting the configuration override above, or provide endpoint configuration through a notebook cell for the PySpark or Spark kernel. Thanks a lot.
We talked with @aggFTW today and it looks like a minor change may fix this. In particular, it looks like we will need to call the refresh_configuration function:
https://github.com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/sparkmagic/kernels/kernelmagics.py#L429
On line 242:
https://github.com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/sparkmagic/kernels/kernelmagics.py#L242
(before self._do_not_call_start_session(u"")).
@edwardps do you want to test that fix and submit a PR?
Sure thing. I will do a quick test of the suggested fix on my end.
Hi Brian,
Some updates: lines 241 and 244 call _override_session_settings() to override session-related settings (please see the method definition below). The endpoint info is stored in the credentials element, e.g. kernel_python_credentials [1].
@staticmethod
def _override_session_settings(settings):
    conf.override(conf.session_configs.__name__, settings)
The configure magic seems to be designed to override only the Spark session parameters. But combined with the overriding code from Konrad (see below), the session can be created against the overridden endpoint. The refresh_configuration method should be added before lines 242 and 244 to handle the two cases, depending on whether an existing session has already been created.
%%local
import sparkmagic.utils.configuration as conf
modified_python_creds = {**conf.kernel_python_credentials(), 'url':'http://my-new-endpoint:8998', 'password':...}
So we can make this quick fix (simply adding refresh_configuration) to enable endpoint overriding. Any thoughts?
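To illustrate the proposed ordering, here is a simplified stand-in (NOT the real kernelmagics.py; the class and attribute names are hypothetical): calling a refresh step before the session starts means the overridden credentials are re-read, so the new endpoint is picked up.

```python
# Simplified model of the proposed fix (NOT sparkmagic's real code):
# refresh_configuration() rebuilds the endpoint from the current config
# just before the session starts.

CONFIG = {'kernel_python_credentials':
          {'url': 'http://old-endpoint:8998', 'auth': 'None'}}

def kernel_python_credentials():
    return CONFIG['kernel_python_credentials']

class KernelMagicsModel:
    def __init__(self):
        # Without the fix, the endpoint captured here would be used forever.
        self.endpoint_url = kernel_python_credentials()['url']

    def refresh_configuration(self):
        # Proposed step: re-read the (possibly overridden) credentials.
        self.endpoint_url = kernel_python_credentials()['url']

    def start_session(self):
        # With the fix, the refresh runs before the session is started.
        self.refresh_configuration()
        return self.endpoint_url

magics = KernelMagicsModel()
# A user override lands after kernel startup, as in the repro above.
CONFIG['kernel_python_credentials'] = {'url': 'http://my-new-endpoint:8998',
                                       'auth': 'None'}
print(magics.start_session())  # http://my-new-endpoint:8998
```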
[1] https://github.com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/example_config.json#L2