
Unable to override endpoint for pyspark [BUG]

Open konradwudkowski opened this issue 4 years ago • 4 comments

Describe the bug I'm trying to override the config for a Livy endpoint in a pyspark notebook, but it doesn't work.

To Reproduce

  1. Create a pyspark notebook
  2. Add the following to a cell at the beginning:
%%local
import sparkmagic.utils.configuration as conf

modified_python_creds = {**conf.kernel_python_credentials(), 'url':'http://my-new-endpoint:8998'}
conf.override(conf.kernel_python_credentials.__name__, modified_python_creds)

  3. Confirm the creds are updated:
%%local
print(conf.kernel_python_credentials())
{'username': '', 'password': '', 'url': 'http://my-new-endpoint:8998', 'auth': 'None'}
  4. Connect to Spark; it uses the previous kernel_python_credentials instead of the new ones.

Expected behavior I'd expect to be able to override the credentials as above, since the technique works for (at least some) other settings; e.g. the following works as expected:

conf.override('shutdown_session_on_spark_statement_errors', True)

If there is a reason not to allow such overriding, there should ideally be an error or warning explaining that.

Versions:

  • SparkMagic 0.15.0
  • Livy (if you know it) 0.7.0
  • Spark 3.0.1 (EMR 6.2)

Additional context I'm now aware that, as a workaround, I can start an IPython notebook, run %load_ext sparkmagic.magics and %spark add -l python -u http://livy.example.com, and add %%spark to every cell, as sketched below.
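
For reference, a minimal sketch of those workaround cells in a plain IPython notebook; the session name my_session and the example computation in the last cell are placeholders of mine, not from the original report:

# Cell 1: load the sparkmagic magics in a regular IPython kernel
%load_ext sparkmagic.magics

# Cell 2: register the Livy endpoint (the -s session name is illustrative)
%spark add -s my_session -l python -u http://livy.example.com

# Cell 3 onwards: prefix every cell that should run on the cluster
%%spark
spark.range(10).count()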

konradwudkowski avatar Apr 29 '21 14:04 konradwudkowski

+1 This enhancement would be very helpful when the notebook runs in a cloud environment where the sparkmagic config file cannot be generated in advance. Hopefully the team can consider supporting the configuration override above, or provide endpoint configuration through a notebook cell for the pyspark or spark kernel. Thanks a lot.

edwardps avatar May 09 '21 17:05 edwardps

We talked with @aggFTW today and it looks like a minor change may fix this. In particular, it looks like we will need to call the refresh_configuration function:

https://github.com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/sparkmagic/kernels/kernelmagics.py#L429

On line 242:

https://github.com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/sparkmagic/kernels/kernelmagics.py#L242

(before self._do_not_call_start_session(u"")).
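
For context, a rough sketch of where that call would go; the surrounding lines are paraphrased from the linked kernelmagics.py rather than copied, so treat this as a sketch and not an exact diff:

    # Inside the %%configure magic (paraphrased sketch, not an exact diff)
    self._do_not_call_delete_session(u"")
    self._override_session_settings(dictionary)
    self.refresh_configuration()              # proposed addition: re-resolve the endpoint/credentials from conf
    self._do_not_call_start_session(u"")      # would now connect to the overridden endpoint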

@edwardps do you want to test that fix and submit a PR?

ellisonbg avatar May 25 '21 21:05 ellisonbg

Sure thing. I will do a quick test of the suggested fix on my end.

edwardps avatar May 25 '21 23:05 edwardps

Hi Brian,

Some updates:

Lines 241 and 244 call _override_session_settings() to override session-related settings (see the method definition below). The endpoint info is stored in the credentials element, e.g. kernel_python_credentials [1].

    @staticmethod
    def _override_session_settings(settings):
        conf.override(conf.session_configs.__name__, settings)
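
In other words, a quick way to see the mismatch between what gets overridden and where the endpoint lives (just an illustrative check of mine, using the two config functions already mentioned in this thread):

%%local
import sparkmagic.utils.configuration as conf
print(conf.session_configs())             # what _override_session_settings / %%configure updates
print(conf.kernel_python_credentials())   # where the endpoint URL actually lives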

The configure magic seems to be designed to override only the Spark session parameters. But combined with the override code from Konrad (see below), the session can be created against the overridden endpoint. The refresh_configuration call should be added before lines 242 and 244 to handle both cases, depending on whether an existing session has already been created.

%%local
import sparkmagic.utils.configuration as conf
modified_python_creds = {**conf.kernel_python_credentials(), 'url':'http://my-new-endpoint:8998', 'password':...}
conf.override(conf.kernel_python_credentials.__name__, modified_python_creds)

So we can make this quick fix (simply adding refresh_configuration) to enable endpoint overriding. Any thoughts?
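
To make the intended flow concrete, here is a hedged sketch of how a user would drive it once the fix is in; the empty %%configure body is only a placeholder of mine to trigger the configure path, and -f forces a restart if a session already exists:

# Cell 1: override the endpoint locally before the session is (re)started
%%local
import sparkmagic.utils.configuration as conf
modified_python_creds = {**conf.kernel_python_credentials(),
                         'url': 'http://my-new-endpoint:8998'}
conf.override(conf.kernel_python_credentials.__name__, modified_python_creds)

# Cell 2: run the (patched) %%configure path so refresh_configuration picks up the new endpoint
%%configure -f
{}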

[1] https://github.com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/example_config.json#L2

edwardps avatar May 26 '21 03:05 edwardps