pgcontents icon indicating copy to clipboard operation
pgcontents copied to clipboard

Create DB tables in advance?

Open nsshah1288 opened this issue 5 years ago • 7 comments

Hi,

I was trying to use pgcontents to store Jupyter notebook code in PostgreSQL.

I set up a blank PostgresSQL database like it said in instructions and specified it in my jupyter_notebook_config.py:

from pgcontents import PostgresContentsManager c = get_config() # noqa c.NotebookApp.contents_manager_class = PostgresContentsManager c.PostgresContentsManager.db_url = 'postgresql://postgres:[email protected]/pgcontents'

Then I brought up the notebook in EKS and tried to see if it would save my Jupyter notebook code.

I got an error like this:

$ kubectl logs jupyter-deployment-f4846c8db-b7t95 -n jupyter Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1283, in _execute_context self.dialect.do_execute( File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 590, in do_execute cursor.execute(statement, parameters) psycopg2.errors.UndefinedTable: relation "pgcontents.users" does not exist LINE 1: INSERT INTO pgcontents.users (id) VALUES ('root') ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/usr/local/bin/jupyter-notebook", line 8, in sys.exit(main()) File "/usr/local/lib/python3.8/dist-packages/jupyter_core/application.py", line 270, in launch_instance return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs) File "/usr/local/lib/python3.8/dist-packages/traitlets/config/application.py", line 663, in launch_instance app.initialize(argv) File "", line 2, in initialize File "/usr/local/lib/python3.8/dist-packages/traitlets/config/application.py", line 87, in catch_config_error return method(app, *args, **kwargs) File "/usr/local/lib/python3.8/dist-packages/notebook/notebookapp.py", line 1766, in initialize self.init_configurables() File "/usr/local/lib/python3.8/dist-packages/notebook/notebookapp.py", line 1383, in init_configurables self.contents_manager = self.contents_manager_class( File "/usr/local/lib/python3.8/dist-packages/pgcontents/pgmanager.py", line 99, in init super(PostgresContentsManager, self).init(*args, **kwargs) File "/usr/local/lib/python3.8/dist-packages/pgcontents/managerbase.py", line 66, in init self.ensure_user() File "/usr/local/lib/python3.8/dist-packages/pgcontents/managerbase.py", line 70, in ensure_user ensure_db_user(db, self.user_id) File "/usr/local/lib/python3.8/dist-packages/pgcontents/query.py", line 89, in ensure_db_user db.execute( File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1020, in execute return meth(self, multiparams, params) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1133, in _execute_clauseelement ret = self._execute_context( File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1323, in _execute_context self.handle_dbapi_exception( File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1517, in handle_dbapi_exception util.raise( File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/compat.py", line 178, in raise raise exception File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1283, in _execute_context self.dialect.do_execute( File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 590, in do_execute cursor.execute(statement, parameters) sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) relation "pgcontents.users" does not exist LINE 1: INSERT INTO pgcontents.users (id) VALUES ('root')

Do I need to create the users, directories, files tables beforehand? I am confused because the directions say:

Prerequisites: Write access to an empty PostgreSQL database.

Also I was trying to look at the code and see where you actually create the tables, and I couldn't find where you actually do a 'CREATE TABLE' command. @ssanderson Could you please give guidance on this?

Thanks!

nsshah1288 avatar May 26 '20 18:05 nsshah1288

Hi @nsshah1288,

The tables used by pgcontents are created by the pgcontents init command noted in the README's installation steps. That command uses alembic to run a migration that should set up the database as necessary.

ssanderson avatar May 26 '20 19:05 ssanderson

Thanks @ssanderson for the quick reply!

Right now I have a Dockerfile like this:

FROM ubuntu:latest
WORKDIR /code
RUN apt-get update && apt-get -y upgrade
RUN apt-get install -y build-essential python-dev libpq-dev
RUN apt-get install -y python3.6 python-distribute python3-pip
RUN pip3 -q install pip --upgrade
ADD jupyter_notebook_2_config.py /code
RUN pip3 install jupyter
RUN pip install pgcontents
RUN pgcontents init --db-url postgresql://postgres:postgres@academic-datalake.cluster-cprl6nrsccmr.us-east-1.rds.amazonaws.com/pgcontents --no-prompt
RUN mkdir /notebooks
CMD jupyter notebook --allow-root --no-browser --ip 0.0.0.0 --config=jupyter_notebook_2_config.py --port 8080 /notebooks

When I try to run docker build -t postgres .

I can't connect to the PostgreSQL DB that I have setup in AWS.

Step 10/12 : RUN pgcontents init --db-url postgresql://postgres:postgres@academic-datalake.cluster-cprl6nrsccmr.us-east-1.rds.amazonaws.com/pgcontents --no-prompt ---> Running in abffc35920e4 Initializing pgcontents... Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 2345, in _wrap_pool_connect return fn() File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 304, in unique_connection return _ConnectionFairy._checkout(self) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 778, in _checkout fairy = _ConnectionRecord.checkout(pool) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 495, in checkout rec = pool._do_get() File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/impl.py", line 239, in _do_get return self._create_connection() File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 309, in _create_connection return _ConnectionRecord(self) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 440, in init self.__connect(first_connect_check=True) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 661, in connect pool.logger.debug("Error on connect(): %s", e) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/langhelpers.py", line 68, in exit compat.raise( File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/compat.py", line 178, in raise raise exception File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 656, in __connect connection = pool._invoke_creator(self) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/strategies.py", line 114, in connect return dialect.connect(*cargs, **cparams) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 490, in connect return self.dbapi.connect(*cargs, **cparams) File "/usr/local/lib/python3.8/dist-packages/psycopg2/init.py", line 127, in connect conn = _connect(dsn, connection_factory=connection_factory, **kwasync) psycopg2.OperationalError: could not connect to server: Connection refused Is the server running on host "academic-datalake.cluster-cprl6nrsccmr.us-east-1.rds.amazonaws.com" (10.179.158.188) and accepting TCP/IP connections on port 5432? could not connect to server: Connection refused Is the server running on host "academic-datalake.cluster-cprl6nrsccmr.us-east-1.rds.amazonaws.com" (10.179.159.214) and accepting TCP/IP connections on port 5432?

This error makes sense, but I have not yet figured out how to supply temporary credentials in my jupyter_notebook_2_config.py file yet.

When I used s3contents I was able to specify access key, secret key, and session token, and then it connected to s3. Is there a way to do this using pgcontents?

Something like below?

#c.PostgresContentsManager.access_key_id="blah" #c.PostgresContentsManager.secret_access_key="blah" #c.PostgresContentsManager.session_token="blah"

I am guessing I should instead put pgcontents init inside the jupyter_notebook_2_config.py, but I don't see how that will work if I can't get the temporary credentials to work. Any ideas?

nsshah1288 avatar May 27 '20 18:05 nsshah1288

@ssanderson do you have any suggestions?

nsshah1288 avatar May 29 '20 14:05 nsshah1288

hey @nsshah1288 . You only need to run pgcontents init once. It will create all the tables that will be used. It's usually done manually from the command line.

For specifying your database credentials, you can put those in the jupyter notebook config. There's an example of what that looks like here: https://github.com/quantopian/pgcontents/blob/master/examples/example_jupyter_notebook_config.py.

ssanderson avatar May 29 '20 21:05 ssanderson

Hi @ssanderson thanks for the reply.

I ranpgcontents initwith a file database URL of postgresql://{username}:{password}@bd-serverless.cluster-cv3fz240lfke.us-east-1.rds.amazonaws.com:5432/postgres

I provided the real username and password obviously.

I am able to log into the query editor in AWS for this Aurora PostgreSQL serverless DB with the credentials. However, when I run pgcontents init with the same credentials, I get this error:

sh-4.2$ pgcontents init File Database URL: postgresql://master:postgres@bd-serverless.cluster-cv3fz240lfke.us-east-1.rds.amazonaws.com:5432/postgres Repeat for confirmation: postgresql://master:postgres@bd-serverless.cluster-cv3fz240lfke.us-east-1.rds.amazonaws.com:5432/postgres Initializing pgcontents...

About to run schema migrations against supplied database URL. If you have stored data from a previous pgcontents installation, it may not be correctly preserved.

It is HIGHLY recommended that you back up stored data before proceeding.

Proceed? [y/N]: y Traceback (most recent call last): File "/usr/local/bin/alembic", line 11, in load_entry_point('alembic==1.4.2', 'console_scripts', 'alembic')() File "/usr/local/lib/python2.7/site-packages/alembic/config.py", line 577, in main CommandLine(prog=prog).main(argv=argv) File "/usr/local/lib/python2.7/site-packages/alembic/config.py", line 571, in main self.run_cmd(cfg, options) File "/usr/local/lib/python2.7/site-packages/alembic/config.py", line 551, in run_cmd **dict((k, getattr(options, k, None)) for k in kwarg) File "/usr/local/lib/python2.7/site-packages/alembic/command.py", line 298, in upgrade script.run_env() File "/usr/local/lib/python2.7/site-packages/alembic/script/base.py", line 489, in run_env util.load_python_file(self.dir, "env.py") File "/usr/local/lib/python2.7/site-packages/alembic/util/pyfiles.py", line 98, in load_python_file module = load_module_py(module_id, path) File "/usr/local/lib/python2.7/site-packages/alembic/util/compat.py", line 250, in load_module_py mod = imp.load_source(module_id, path, fp) File "/usr/local/lib/python2.7/site-packages/pgcontents/alembic/env.py", line 83, in run_migrations_online() File "/usr/local/lib/python2.7/site-packages/pgcontents/alembic/env.py", line 65, in run_migrations_online connection = engine.connect() File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2272, in connect return self._connection_cls(self, **kwargs) File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 104, in init else engine.raw_connection() File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2379, in raw_connection self.pool.unique_connection, _connection File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2349, in _wrap_pool_connect e, dialect, self File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1591, in handle_dbapi_exception_noconnection sqlalchemy_exception, with_traceback=exc_info[2], from=e File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2345, in _wrap_pool_connect return fn() File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 304, in unique_connection return _ConnectionFairy._checkout(self) File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout fairy = _ConnectionRecord.checkout(pool) File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 495, in checkout rec = pool._do_get() File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/impl.py", line 239, in _do_get return self._create_connection() File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection return _ConnectionRecord(self) File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 440, in init self.__connect(first_connect_check=True) File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 661, in __connect pool.logger.debug("Error on connect(): %s", e) File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 69, in exit exc_value, with_traceback=exc_tb, File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 656, in __connect connection = pool._invoke_creator(self) File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect return dialect.connect(*cargs, **cparams) File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 490, in connect return self.dbapi.connect(*cargs, **cparams) File "/usr/local/lib64/python2.7/site-packages/psycopg2/init.py", line 127, in connect conn = _connect(dsn, connection_factory=connection_factory, **kwasync) sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: password authentication failed for user "master" FATAL: password authentication failed for user "master"

(Background on this error at: http://sqlalche.me/e/e3q8) Traceback (most recent call last): File "/usr/local/bin/pgcontents", line 201, in main() File "/usr/local/lib/python2.7/site-packages/click/core.py", line 829, in call return self.main(*args, **kwargs) File "/usr/local/lib/python2.7/site-packages/click/core.py", line 782, in main rv = self.invoke(ctx) File "/usr/local/lib/python2.7/site-packages/click/core.py", line 1259, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/usr/local/lib/python2.7/site-packages/click/core.py", line 1066, in invoke return ctx.invoke(self.callback, **ctx.params) File "/usr/local/lib/python2.7/site-packages/click/core.py", line 610, in invoke return callback(*args, **kwargs) File "/usr/local/bin/pgcontents", line 98, in init upgrade(db_url, revision) File "/usr/local/lib/python2.7/site-packages/pgcontents/utils/migrate.py", line 40, in upgrade ['alembic', '-c', alembic_ini, 'upgrade', revision] File "/usr/lib64/python2.7/subprocess.py", line 190, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['alembic', '-c', '/tmp/tmpf740BC/temp_alembic.ini', 'upgrade', u'head']' returned non-zero exit status 1

Do you know if pgcontents works with Aurora PostgreSQL Serverless? Thanks

nsshah1288 avatar Jun 02 '20 15:06 nsshah1288

Hi @ssanderson I tried using a normal PostgreSQL DB and pgcontents init worked. Do you know if pgcontents can work with PostgreSQL serverless?

Thanks!

nsshah1288 avatar Jun 05 '20 19:06 nsshah1288

I'm having a similar issue as pl31 created (https://github.com/pl31/heroku-jupyter/issues/38).

  • Using pgcontents 0.6.0.
  • Created a database (notebooks) in PostgreSQL 11 (PostgreSQL 14 did not work at all)
  • Ran pgcontents init with the database url pointing to the newly created database: postgresql://<username>:<password>@localhost/notebooks
  • Initialization completes successfully
  • When I log into the database, there are no tables. I do find the pgcontents schema, and the alembic_version table.

jorisgillis avatar Oct 13 '21 09:10 jorisgillis