pgcontents
Create DB tables in advance?
Hi,
I was trying to use pgcontents to store Jupyter notebook code in PostgreSQL.
I set up a blank PostgreSQL database as described in the instructions and specified it in my jupyter_notebook_config.py:
from pgcontents import PostgresContentsManager

c = get_config()  # noqa
c.NotebookApp.contents_manager_class = PostgresContentsManager
c.PostgresContentsManager.db_url = 'postgresql://postgres:[email protected]/pgcontents'
Then I brought up the notebook in EKS and tried to see if it would save my Jupyter notebook code.
I got an error like this:
$ kubectl logs jupyter-deployment-f4846c8db-b7t95 -n jupyter
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1283, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 590, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UndefinedTable: relation "pgcontents.users" does not exist
LINE 1: INSERT INTO pgcontents.users (id) VALUES ('root')
                    ^
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/jupyter-notebook", line 8, in
Do I need to create the users, directories, files tables beforehand? I am confused because the directions say:
Prerequisites: Write access to an empty PostgreSQL database.
Also I was trying to look at the code and see where you actually create the tables, and I couldn't find where you actually do a 'CREATE TABLE' command. @ssanderson Could you please give guidance on this?
Thanks!
Hi @nsshah1288,
The tables used by pgcontents are created by the pgcontents init command noted in the README's installation steps. That command uses alembic to run a migration that should set up the database as necessary.
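If typing the command by hand is inconvenient, here is a minimal sketch of driving the same step from Python. It only uses the `--db-url` and `--no-prompt` flags that appear elsewhere in this thread; the helper names `build_init_command` and `run_init` are made up for illustration, and it assumes the `pgcontents` executable is on PATH and the database is reachable from wherever the script runs.

```python
import subprocess


def build_init_command(db_url):
    """Return the argv for a non-interactive `pgcontents init` run."""
    return ["pgcontents", "init", "--db-url", db_url, "--no-prompt"]


def run_init(db_url):
    """Run the schema migration; raises CalledProcessError if the command fails."""
    subprocess.run(build_init_command(db_url), check=True)


# Example (hypothetical URL, run once against a reachable database):
# run_init("postgresql://user:password@localhost/pgcontents")
```

Keeping the command construction separate from the call makes it easy to log or inspect what will be run before touching the database.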
Thanks @ssanderson for the quick reply!
Right now I have a Dockerfile like this:
FROM ubuntu:latest
WORKDIR /code
RUN apt-get update && apt-get -y upgrade
RUN apt-get install -y build-essential python-dev libpq-dev
RUN apt-get install -y python3.6 python-distribute python3-pip
RUN pip3 -q install pip --upgrade
ADD jupyter_notebook_2_config.py /code
RUN pip3 install jupyter
RUN pip install pgcontents
RUN pgcontents init --db-url postgresql://postgres:postgres@academic-datalake.cluster-cprl6nrsccmr.us-east-1.rds.amazonaws.com/pgcontents --no-prompt
RUN mkdir /notebooks
CMD jupyter notebook --allow-root --no-browser --ip 0.0.0.0 --config=jupyter_notebook_2_config.py --port 8080 /notebooks
When I try to run
docker build -t postgres .
I can't connect to the PostgreSQL DB that I have set up in AWS.
Step 10/12 : RUN pgcontents init --db-url postgresql://postgres:postgres@academic-datalake.cluster-cprl6nrsccmr.us-east-1.rds.amazonaws.com/pgcontents --no-prompt
 ---> Running in abffc35920e4
Initializing pgcontents...
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 2345, in _wrap_pool_connect
    return fn()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 304, in unique_connection
    return _ConnectionFairy._checkout(self)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 778, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 495, in checkout
    rec = pool._do_get()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/impl.py", line 239, in _do_get
    return self._create_connection()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 440, in __init__
    self.__connect(first_connect_check=True)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 661, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 656, in __connect
    connection = pool._invoke_creator(self)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 490, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/local/lib/python3.8/dist-packages/psycopg2/__init__.py", line 127, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: could not connect to server: Connection refused
    Is the server running on host "academic-datalake.cluster-cprl6nrsccmr.us-east-1.rds.amazonaws.com" (10.179.158.188) and accepting
    TCP/IP connections on port 5432?
could not connect to server: Connection refused
    Is the server running on host "academic-datalake.cluster-cprl6nrsccmr.us-east-1.rds.amazonaws.com" (10.179.159.214) and accepting
    TCP/IP connections on port 5432?
This error makes sense, but I have not yet figured out how to supply temporary credentials in my jupyter_notebook_2_config.py file.
When I used s3contents I was able to specify access key, secret key, and session token, and then it connected to s3. Is there a way to do this using pgcontents?
Something like below?
#c.PostgresContentsManager.access_key_id="blah"
#c.PostgresContentsManager.secret_access_key="blah"
#c.PostgresContentsManager.session_token="blah"
I am guessing I should instead put pgcontents init inside the jupyter_notebook_2_config.py, but I don't see how that will work if I can't get the temporary credentials to work. Any ideas?
@ssanderson do you have any suggestions?
Hey @nsshah1288. You only need to run pgcontents init once. It will create all the tables that will be used. It's usually done manually from the command line.
For specifying your database credentials, you can put those in the jupyter notebook config. There's an example of what that looks like here: https://github.com/quantopian/pgcontents/blob/master/examples/example_jupyter_notebook_config.py.
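As far as this thread shows, the credentials live inside the db_url itself, so one way to avoid hard-coding them is to assemble the URL at startup. A rough sketch, not taken from the pgcontents docs: `build_db_url` is a hypothetical helper, and the environment variable names in the commented usage are assumptions, not settings pgcontents reads on its own.

```python
from urllib.parse import quote_plus


def build_db_url(user, password, host, database, port=5432):
    """Assemble a PostgreSQL URL, percent-escaping the user and password."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )


# Possible use inside jupyter_notebook_config.py (variable names are assumptions):
# import os
# from pgcontents import PostgresContentsManager
# c = get_config()  # noqa
# c.NotebookApp.contents_manager_class = PostgresContentsManager
# c.PostgresContentsManager.db_url = build_db_url(
#     os.environ["PGUSER"], os.environ["PGPASSWORD"],
#     os.environ["PGHOST"], os.environ.get("PGDATABASE", "pgcontents"),
# )
```

Escaping matters when a password contains characters such as `@` or `/`, which would otherwise be parsed as part of the URL structure.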
Hi @ssanderson thanks for the reply.
I ran pgcontents init with a file database URL of
postgresql://{username}:{password}@bd-serverless.cluster-cv3fz240lfke.us-east-1.rds.amazonaws.com:5432/postgres
I provided the real username and password obviously.
I am able to log into the query editor in AWS for this Aurora PostgreSQL serverless DB with the credentials. However, when I run pgcontents init with the same credentials, I get this error:
sh-4.2$ pgcontents init
File Database URL: postgresql://master:postgres@bd-serverless.cluster-cv3fz240lfke.us-east-1.rds.amazonaws.com:5432/postgres
Repeat for confirmation: postgresql://master:postgres@bd-serverless.cluster-cv3fz240lfke.us-east-1.rds.amazonaws.com:5432/postgres
Initializing pgcontents...
About to run schema migrations against supplied database URL. If you have stored data from a previous pgcontents installation, it may not be correctly preserved.
It is HIGHLY recommended that you back up stored data before proceeding.
Proceed? [y/N]: y
Traceback (most recent call last):
File "/usr/local/bin/alembic", line 11, in
(Background on this error at: http://sqlalche.me/e/e3q8)
Traceback (most recent call last):
File "/usr/local/bin/pgcontents", line 201, in
Do you know if pgcontents works with Aurora PostgreSQL Serverless? Thanks
Hi @ssanderson I tried using a normal PostgreSQL DB and pgcontents init worked. Do you know if pgcontents can work with PostgreSQL serverless?
Thanks!
I'm having an issue similar to the one pl31 reported (https://github.com/pl31/heroku-jupyter/issues/38).
- Using pgcontents 0.6.0.
- Created a database (notebooks) in PostgreSQL 11 (PostgreSQL 14 did not work at all)
- Ran pgcontents init with the database URL pointing to the newly created database: postgresql://<username>:<password>@localhost/notebooks
- Initialization completes successfully
- When I log into the database, there are no tables. I do find the pgcontents schema and the alembic_version table.
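To make a report like this easier to reproduce, here is a small sketch that lists what SQLAlchemy sees in the pgcontents schema and reports which tables are absent. The expected names (users, directories, files) are only the ones mentioned earlier in this thread, so the real schema may contain more; `list_tables` and `missing_tables` are hypothetical helper names, and `list_tables` assumes SQLAlchemy and a PostgreSQL driver are installed.

```python
# Table names mentioned earlier in this thread; the actual schema may include others.
EXPECTED_TABLES = {"users", "directories", "files"}


def list_tables(db_url, schema="pgcontents"):
    """Return the set of table names visible in the given schema."""
    # Imported lazily so the pure helper below works without SQLAlchemy installed.
    from sqlalchemy import create_engine, inspect

    engine = create_engine(db_url)
    return set(inspect(engine).get_table_names(schema=schema))


def missing_tables(found, expected=EXPECTED_TABLES):
    """Return the expected tables that were not found, sorted for stable output."""
    return sorted(set(expected) - set(found))


# Example (hypothetical URL):
# print(missing_tables(list_tables("postgresql://user:password@localhost/notebooks")))
```

An empty list would suggest the migration created the tables in that schema; a non-empty one points at the migration not having run to completion against that database.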