pyiron_base
pyiron_base copied to clipboard
Config is confusing
In my .pyiron
config I have the following lines:
[DEFAULT]
RESOURCE_PATHS = ${CONDA_PREFIX}/share/pyiron, /Users/janssen/pyiron/resources
PROJECT_CHECK_ENABLED = False
DISABLE_DATABASE = True
Still pr.state.settings.configuration
returns:
{'user': 'pyiron',
'resource_paths': ['/Users/janssen/pyiron/projects/2021/2021-12-09-pyiron-no-db/${CONDA_PREFIX}/share/pyiron',
'/Users/janssen/pyiron/resources',
'/Users/janssen/mambaforge/share/pyiron'],
'project_paths': ['/Users/janssen/pyiron/projects/'],
'connection_timeout': 60,
'sql_connection_string': None,
'sql_table_name': 'jobs_pyiron',
'sql_view_connection_string': None,
'sql_view_table_name': None,
'sql_view_user': None,
'sql_view_user_key': None,
'sql_file': '/Users/janssen/pyiron.db',
'sql_host': None,
'sql_type': 'SQLite',
'sql_user_key': None,
'sql_database': None,
'project_check_enabled': 0,
'disable_database': 1}
The sqlfile
and project_paths
is not only confusing for me but also confuses pyiron https://github.com/pyiron/pyiron_base/pull/576
@niklassiemer and @liamhuber Can you take a look?
I don't think it's a problem that you say "please don't use these features" but your config file still has parameters available for utilizing those features in case you re-activate this feature. For instance, with the current implementation, putting disable_database = True
in your config and then later using pr.switch_to_local_database()
opens up a connection to ~/pyiron.db
. With the proposed removal of the default, every time you called pr.switch_to_local_database()
, you'd be creating a new database file. The existing behaviour seems better, IMO. To that end, I would just close #578.
In the same way, I don't mind having a default value for project_paths
-- but here, if project_check = False
then the behaviour of DatabaseManager.top_path
should indeed change. To this end I have no objection to #576, it definitely makes things more sensible. But if we're going to touch project path checking at all, can we please just get rid of it instead? My understanding from the last time we all talked about it was that absolutely nobody likes it or wants it, so if we're going to put effort in then let's use that effort to kill it entirely.
Since you do not define any project_paths in the config that should be an empty list...
Aha aha aha! My comment over on the issue is also a little off then -- I thought the user path was in
Settings.default_configuration
, but it's not! Indeed, then I am also confused why it's not an empty list. (The database default is still acting sensibly though.)
Copied from the PR. Indeed, it is confusing that project paths gets filled! All the better to just kill it with fire 😉
In my
.pyiron
config I have the following lines:[DEFAULT] RESOURCE_PATHS = ${CONDA_PREFIX}/share/pyiron, /Users/janssen/pyiron/resources PROJECT_CHECK_ENABLED = False DISABLE_DATABASE = True
Is this the full .pyiron
file? So far I do not get it - will have another look, tomorrow.
Is this the full
.pyiron
file? So far I do not get it - will have another look, tomorrow.
Yes, that is all.
I don't think it's a problem that you say "please don't use these features" but your config file still has parameters available for utilizing those features in case you re-activate this feature. For instance, with the current implementation, putting
disable_database = True
in your config and then later usingpr.switch_to_local_database()
opens up a connection to~/pyiron.db
. With the proposed removal of the default, every time you calledpr.switch_to_local_database()
, you'd be creating a new database file. The existing behaviour seems better, IMO. To that end, I would just close #578.
I can see the advantage of being able to switch from no database to a local database, but at least for the current case I just want to support HPC clusters which do not allow databases. So I would even be fine with raising an error when the user tries to switch.
I can see the advantage of being able to switch from no database to a local database, but at least for the current case I just want to support HPC clusters which do not allow databases. So I would even be fine with raising an error when the user tries to switch.
This is tangential. The SQLite default can be there without stopping database-free usage. I'm as surprised as Niklas that project_paths
gets populated since the default is an empty list. But I don't see why we need to change anything with the database defaults. If disable_database = True
, then the SQL settings will simply not be exploited -- that doesn't mean they need to get deleted. And in fact, deleting them under a special case would also require editing the spec with a "except if X=Y, then Y,W,V are Foo" clause.
I cannot reproduce this behavior. I tried with the pyiron_base/master and got an empty list for project_paths (see below)! @jan-janssen do you have an environment variable starting with 'PYIRON' defined?
import sys
import os
config = """[DEFAULT]
RESOURCE_PATHS = ${CONDA_PREFIX}/share/pyiron, /Users/janssen/pyiron/resources
PROJECT_CHECK_ENABLED = False
DISABLE_DATABASE = True
"""
test_config_file = os.path.expanduser('~/test_pyiron_config')
with open(test_config_file, 'w') as f:
f.write(config)
os.environ['PYIRONCONFIG'] = test_config_file
from pyiron_base import Settings
s = Settings()
print(s.configuration)
>>>
{'user': 'pyiron',
'resource_paths': ['/mnt/c/Users/Siemer/pyiron/projects/${CONDA_PREFIX}/share/pyiron',
'/Users/janssen/pyiron/resources',
'/home/nsiemer/anaconda3/envs/pyiron_git/share/pyiron'],
'project_paths': [],
'connection_timeout': 60,
'sql_connection_string': None,
'sql_table_name': 'jobs_pyiron',
'sql_view_connection_string': None,
'sql_view_table_name': None,
'sql_view_user': None,
'sql_view_user_key': None,
'sql_file': '/home/nsiemer/pyiron.db',
'sql_host': None,
'sql_type': 'SQLite',
'sql_user_key': None,
'sql_database': None,
'project_check_enabled': 0,
'disable_database': 1}
@niklassiemer Yes, you are right - most likely I still had the project path in there and assumed it would be overwritten with project_check_enabled
. But the issue should be solved with https://github.com/pyiron/pyiron_base/pull/576 which respects the order, even if the path is defined.
I am opposed to merge #576 since this would directly affect all users on cmti. With the default project_check_enabled=False and #576 and project_path defined in the config the old jobs are not found any more! Indeed, I do not see any reason why one previously would define project_path in the config if not project_check_enabled. This option was just ignored. I do not see any reason why changing project_check_enabled should change how a project is stored in the database.
But that is the whole purpose of project_check_enabled
enabled, is not it?
The old behavior was that when project_check_enabled
is false the project_path
variable is ignored and all I am asking for is to revert this change.
And the second part is that the sql_*
keys are confusing when I did not configure them. In particular when users try to run pyiron without database. That is why I removed them in https://github.com/pyiron/pyiron_base/pull/578
And the second part is that the
sql_*
keys are confusing when I did not configure them. In particular when users try to run pyiron without database. That is why I removed them in https://github.com/pyiron/pyiron_base/pull/578
I strongly disagree. They are there even though you didn't configure them because they're the default values-- that is not so confusing, or? The current behavior is less confusing because it starts with a clearly defined set of defaults and then layers on top of them a single set of new values, and where those values come from is clearly described. Your proposed change means that the description of the behavior requires and extra if clause as I explained above. I would measure confusing and complex by how long it takes to describe the behavior, in which case the new pr is definitely more confusing.-
Regarding project path, I'll let you and Niklas sort it out, but as a principle (a) I don't like changes that break existing user systems, and (b) if we're going to touch this part of the code again, could we please just get rid of this check entirely? Otherwise I'm not particularly married to one behavior or the other.
I strongly disagree. They are there even though you didn't configure them because they're the default values-- that is not so confusing, or? The current behavior is less confusing because it starts with a clearly defined set of defaults and then layers on top of them a single set of new values, and where those values come from is clearly described. Your proposed change means that the description of the behavior requires and extra if clause as I explained above. I would measure confusing and complex by how long it takes to describe the behavior, in which case the new pr is definitely more confusing.-
It is confusing for the user vs. confusing for the developer. A user who wants to run pyiron without a database is going to be confused by the sql options even if these are the defaults, because he intentionally overwrites this behavior by setting disable_database = True
.
Regarding project path, I'll let you and Niklas sort it out, but as a principle (a) I don't like changes that break existing user systems, and (b) if we're going to touch this part of the code again, could we please just get rid of this check entirely? Otherwise I'm not particularly married to one behavior or the other.
The current option project_check_enabled
has no effect which I presume is a mistake of the previous changes. (a) So by reverting the default to true
no behavior for existing users is affected and the old behavior of project_check_enabled=False
overwriting any project_paths
(because when we do not check them we do not need them) is restored. (b) when we modify the code in future to remove the project check completely then we can set this option to false
for debugging - before changing the default for everybody.
It is confusing for the user
I'm sorry Jan, but I really struggle to see what's confusing. The defaults are there even if you don't specify values for them -- that's just what defaults are. Setting disable_database = True
should stop the database from being used, but I don't see any reason that throwing that flag should go in and meddle with other config settings -- in fact, that seems like super unpredictable behaviour to me. I really can't get behind #578.
Regarding project_check_enabled
, you are spot on that (my) earlier changes introduced a bug. I'm happy with whatever solution you and @niklassiemer find mutually satisfactory. Your plan for patching over it now so we can later remove it (using false as a test) also sounds good.
See our minutes.
See our minutes.
Nice! Very detailed 😄
I would push back on one point. But since #578, which used to do exactly this point now does something different (and excellent), maybe this point is already outdated and the rant is moot!
even more conclusion: if database_disabled is set, all the other options should not be shown or deleted from the configuration dict.
Ok, rant/: we must not delete a chunk of a user's config based on a flag in another part of their config saying that those config values are not used. Abstract example:
Suppose I have a config file like:
FOO = 42
BAR = "forty-two"
USE_FOO = False
Because I want to be using the bar-like guy by default when I start up my environment. Now I do something like
pr.activate_foo()
> ValueError: None type cannot be added to int
And I'm confused, so I look at pr.state.settings.config.foo
and sure enough it's None
, but I can see plain as day that I set it to 42
in my config file! This is terrible and should not happen.
A very real example of this is the following: I want, by default, to run pyiron without a database, but when I activate the database I want it to reference database config values that I earlier set. If we delete those values this is impossible!
Storing unused config values is totally normal. In a gui setting these would typically be greyed-out, or maybe hidden behind a drop-down menu -- but they would certainly exist. So I don't object to hiding them, although for direct code users who are poking around in pr.state.settings.config
to start with I don't see a strong need. But we must not delete them.
Again, apologies if this is a response to the outdated minutes. #578 used to silently kill SQL settings (which is a hill I was willing to die on), but now addresses other points from the minutes about guaranteeing that disable_database
/project_check_enabled
/project_paths
values are all mutually compatible, which I really like.
We sort of addressed some of this in the discussion and I'm thinking in this direction of having multiple sections in the config file, like
[DEFAULT]
TYPE = Postgres
[Database.Postgres]
HOST = ...
NAME = ...
[Database.SQLite]
SQL_FILE = ...
[Database.FileTable]
# probably empty, maybe just allow TYPE=Disabled and don't have a separate entry for it
this could then be mapping in the settings dictionary accordingly as sub-groups and the type flag would just pick out one that is active.
Yeah, that solution sounds great. No overwriting separate sections involved, just ignoring them. We need to provide somewhere for users to view the inactive settings inside a notebook environment, but the default view could probably even be made to hide these unused sub-dicts.