sparkmagic
Use variables in %%configure
Is your feature request related to a problem? Please describe.
Currently, in the PySpark kernel wrapper, we can define %%configure in a notebook cell as shown below:
%%configure
{
"executorCores": 2,
"conf": {
"spark.sql.some.key": "a_predefined_fixed_value"
}
}
However, it does not seem to allow passing a value dynamically at runtime. In the configuration above, I'd like to be able to set 'executorCores' and 'spark.sql.some.key' dynamically, e.g. to something returned by a function.
Describe the solution you'd like
Would it be possible to enable something like the example below, or perhaps something better?
import json
import os
executor_cores = 2  # assume this value would be derived from a function
custom_value = os.environ["CUSTOM_VALUE"]  # assume derived from the environment
spark_conf = {
"executorCores": int(executor_cores),
"conf": {
"spark.sql.some.key": str(custom_value)
}
}
spark_conf_json = json.dumps(spark_conf)
%%configure
"$spark_conf_json"
Describe alternatives you've considered
No alternative solution found.
Additional context
No additional context.
🤖
Thank you for your feature request. I understand that you want to enhance the %%configure functionality in sparkmagic to allow dynamic assignment of values, such as 'executorCores' and 'spark.sql.some.key', which could be derived from a function or environment variable at runtime.
Currently, the %%configure magic function takes a JSON string as input, which is then parsed into a dictionary. This dictionary is used to override the session settings. To allow dynamic assignment of values, we could modify the configure method to evaluate Python expressions in the input JSON string before parsing it. This can be done using the eval function.
Here is an example of how you could modify the configure method:
@magic_arguments()
@cell_magic
@wrap_unexpected_exceptions
@handle_expected_exceptions
@_event
def configure(self, line, cell="", local_ns=None):
try:
# Evaluate Python expressions in the input string
cell = eval(f'f"""{cell}"""')
dictionary = json.loads(cell)
    except (ValueError, SyntaxError, NameError):
self.ipython_display.send_error(
"Could not parse JSON object from input '{}'".format(cell)
)
return
args = parse_argstring_or_throw(self.configure, line)
if self.session_started:
if not args.force:
self.ipython_display.send_error(
"A session has already been started. If you intend to recreate the "
"session with new configurations, please include the -f argument."
)
return
else:
self._do_not_call_delete_session("")
self._override_session_settings(dictionary)
self._do_not_call_start_session("")
else:
self._override_session_settings(dictionary)
self.info("")
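The f-string evaluation step above can be illustrated in isolation (using a hypothetical `executor_cores` variable). One practical wrinkle with this approach: literal JSON braces in the cell must be doubled so they are not mistaken for f-string replacement fields.

```python
import json

executor_cores = 4  # hypothetical value, e.g. returned by a sizing function

# A %%configure cell body with an embedded Python expression.
# Literal JSON braces are doubled ({{ }}) so the f-string machinery
# does not treat them as replacement fields.
cell = '{{"executorCores": {executor_cores}}}'

# The same trick used in the modified configure() method above:
# wrap the cell in an f-string and evaluate it
rendered = eval(f'f"""{cell}"""')

config = json.loads(rendered)
print(config)  # {'executorCores': 4}
```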
With this modification, you can use Python expressions in the %%configure magic function, like this:
%%configure
{{
"executorCores": {os.environ['EXECUTOR_CORES']},
"spark.sql.some.key": "{get_some_key()}"
}}
In this example, os.environ['EXECUTOR_CORES'] is an environment variable and get_some_key() is a function that returns a value. These expressions will be evaluated at runtime and their results will be used as the values for 'executorCores' and 'spark.sql.some.key', respectively.
Please note that this modification could potentially introduce security risks, as it allows execution of arbitrary Python code. Make sure to only use it in a trusted environment.
I hope this helps! If you have any other questions or need further clarification, feel free to ask.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
@dosu-bot
I think we would also need to enhance %%configure to support variables defined in %%local to make this possible.
Otherwise, the PySpark kernel wrapper would just start initializing the Spark context. Or is there another way to do imports and define variables before %%configure?
🤖
Hello,
Thank you for your detailed feature request. I understand that you want to enhance the %%configure functionality to allow dynamic assignment of values, such as 'executorCores' and 'spark.sql.some.key', which could be derived from a function or environment variable at runtime.
To achieve this, you can modify the configure method in the KernelMagics class. Currently, the configure method reads the cell content as a JSON string and expects it to be a dictionary of configuration parameters. To allow dynamic assignment of values, you can parse the cell content as Python code instead of a JSON string. This way, you can use Python expressions, functions, or environment variables to compute the values of the configuration parameters.
Here is a modified version of the configure method that implements this idea:
@magic_arguments()
@cell_magic
@argument(
"-f",
"--force",
type=bool,
default=False,
nargs="?",
const=True,
help="If present, user understands.",
)
@wrap_unexpected_exceptions
@handle_expected_exceptions
@_event
def configure(self, line, cell="", local_ns=None):
try:
# Evaluate the cell content as Python code
dictionary = eval(cell, local_ns)
except Exception as e:
self.ipython_display.send_error(
"Could not evaluate cell content as Python code: {}".format(e)
)
return
args = parse_argstring_or_throw(self.configure, line)
if self.session_started:
if not args.force:
self.ipython_display.send_error(
"A session has already been started. If you intend to recreate the "
"session with new configurations, please include the -f argument."
)
return
else:
self._do_not_call_delete_session("")
self._override_session_settings(dictionary)
self._do_not_call_start_session("")
else:
self._override_session_settings(dictionary)
self.info("")
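The eval-with-namespace idea above can be sketched outside the magic as follows. The helper function and environment variable here are hypothetical; `namespace` plays the role of `local_ns` in the magic.

```python
import os

os.environ["SOME_ENV_VAR"] = "demo-value"  # hypothetical env var for this sketch

def get_executor_cores():
    # Hypothetical helper; a real one might inspect cluster metadata
    return 2

# Names the cell is allowed to reference, analogous to local_ns in the magic
namespace = {"get_executor_cores": get_executor_cores, "os": os}

# The %%configure cell body, evaluated as a Python dict literal
cell = """{
    "executorCores": get_executor_cores(),
    "spark.sql.some.key": os.environ["SOME_ENV_VAR"],
}"""

dictionary = eval(cell, namespace)
print(dictionary)  # {'executorCores': 2, 'spark.sql.some.key': 'demo-value'}
```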
With this modification, you can use Python expressions in the %%configure cell. For example:
%%configure
{
"executorCores": get_executor_cores(),
"spark.sql.some.key": os.environ["SOME_ENV_VAR"]
}
In this example, get_executor_cores() is a function that computes the number of executor cores, and os.environ["SOME_ENV_VAR"] is an environment variable. These expressions are evaluated when the %%configure cell is executed, and their values are used as the configuration parameters.
Please note that this modification could potentially introduce security risks, as it allows execution of arbitrary Python code. Make sure to use it in a trusted environment.
I hope this helps! If you have any further questions, feel free to ask.
Best, Dosu
@sunayansaikia %%configure customization is a common feature request. I like the angle of using a function to programmatically fetch variables. Dosu's approach seems reasonable at a glance as long as we maintain backwards compatibility.
FWIW, one workaround is to write a function that modifies the global sparkmagic config.json before starting the kernel.