logica icon indicating copy to clipboard operation
logica copied to clipboard

Connecting to a SQLite/PostgreSQL database in a script

Open EricWay1024 opened this issue 3 years ago • 14 comments

Logica is a wonderful alternative to SQL, and I would like to connect to SQLite/PostgreSQL database in a l script for data query. However currently there seems to be limited support for this.

Connection to a SQLite database file seems to have been already implemented but has gone undocumented. We could use the @AttachDatabase annotation:

@Engine("sqlite");
@AttachDatabase("db", "databaseFile.db");
Table(..r) :- `(db.tableName)`(..r);

and then run this script in local command line. But this usage does not appear anywhere in the tutorial and examples.

We have an example of connecting to a PosgreSQL database in ipynb files with the help of sqlalchemy library:

# Connect to the database.
from logica import colab_logica
from sqlalchemy import create_engine
import pandas
engine = create_engine('postgresql+psycopg2://logica:[email protected]', pool_recycle=3600);
connection = engine.connect();
colab_logica.SetDbConnection(connection)

which is great, but I'm not sure if this is possible in a script program.

EricWay1024 avatar Apr 16 '21 13:04 EricWay1024

Hi, I have a similar use case. I would like to run logica over sqlite3(specifying a path to the db file) in a simple Python script outside of IPython.

RAbraham avatar Oct 19 '21 11:10 RAbraham

Hi, disregard if you are busy, but if you could give some pointers on how to go about doing this, that would be awesome. I'd like to use logica outside the notebook context in my regular python projects but wasn't sure about how to go about doing this. if it's not cleanly cut out, you can even say go to this py file and cut this out and use it with this line of code :). I'll figure something out.

RAbraham avatar Feb 06 '22 12:02 RAbraham

Hi @RAbraham ! I believe the functionality is there, we just need to test and document it better.

I've added an example of running a predicate on SQLite engine: https://github.com/EvgSkv/logica/blob/main/examples/more/call_sqlite.py

Let me know if it covers your usecase.

If you do have time, adding a similar (tested 😄) example for Postgres would be great.

Thank you!

EvgSkv avatar Feb 07 '22 19:02 EvgSkv

Thanks!

  • Is it possible to connect to an existing sqlite database? Is there an annotation I can use, or dynamically pass the connection string by environment variables or even pass the connection string or object? e.g. below.
db=sqla.create_engine('sqlite:////home/stephen/db1')
  • Same question for postgres? how do I connect my script to a postgres database with creds without using colab_logica.SetDbConnection(connection).

RAbraham avatar Feb 08 '22 11:02 RAbraham

@AttachDatabase annotation lets us connect to a database. We do need better docs, meanwhile here is an example of script reading tables Parent and Person from heritage.db file and writing table Sibling there.

@Engine("sqlite");
@AttachDatabase("db", "heritage.db");
@Dataset("db");

@Ground(Sibling);
Sibling(x, y) distinct :- db.Parent(z, x), db.Parent(z, y);

PersonInfo(person:,
           children_count:,
           sibling_count:) :-
  db.Person(person),
  sibling_count += (1 :- Sibling(x)),
  children_count += (1 :- Parent(person, x));

To pass a custom database file name use flags. For instance the script above would look like:

@Engine("sqlite");

@DefineFlag("database_file");

@AttachDatabase("db", "${database_file}");
@Dataset("db");

@Ground(Sibling);
Sibling(x, y) distinct :- db.Parent(z, x), db.Parent(z, y);

PersonInfo(person:,
           children_count:,
           sibling_count:) :-
  db.Person(person),
  sibling_count += (1 :- Sibling(x)),
  children_count += (1 :- Parent(person, x));

And when calling use user_flags parameter:

logica_lib.RunPredicateToPandas(
    'sibling_script.l',
    'PersonInfo',
    user_flags={'database_file': 'my_custom_file.db')

At the moment it's not possible to pass a custom connection. If you could extend the RunPredicateToPandas with that ability it would be great! You can see that here we call a function that does accept connection argument. So if you would add a connection argument to RunPredicateToPandas and pass it there it should work. For now testing it on your machine would be sufficient.

Let me know if you have further questions.

EvgSkv avatar Feb 09 '22 07:02 EvgSkv

Great, I'll attempt it now and let you know. Thank you very much!

RAbraham avatar Feb 09 '22 11:02 RAbraham

I'm trying to test this. How do I populate rows in the database from logica? According to the tutorial, I hacked this but I'm missing a concept or two

  • Does @Dataset mean a table or a schema?
  • I'm unsure why the same variabledb is used both in AttachDataset and DataSet. Are they connected, as in, do they have to have the same value db?
@Engine("sqlite");

@DefineFlag("database_file");

@AttachDatabase("db", "${database_file}");
@Dataset("db");

db.Parent("A", "B");
db.Parent("B", "C");
db.Parent("C", "D");
db.Parent("B", "E");
db.Parent("A", "F");
db.Parent("A", "G");
db.Parent("G", "H");

@Ground(Sibling);
Sibling(x, y) distinct :- db.Parent(z, x), db.Parent(z, y);

PersonInfo(person:,
           children_count:,
           sibling_count:) :-
  db.Person(person),
  sibling_count += (1 :- Sibling(x)),
  children_count += (1 :- Parent(person, x));

Error:

Traceback (most recent call last):
  File "/home/rajiv/Documents/dev/python/mercylog/venv/lib/python3.8/site-packages/pandas/io/sql.py", line 1681, in execute
    cur.execute(*args, **kwargs)
sqlite3.OperationalError: no such table: db.Person

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/rajiv/Documents/dev/python/mercylog/logica_trial.py", line 28, in <module>
    result = logica_lib.RunPredicateToPandas(
  File "/home/rajiv/Documents/dev/python/mercylog/venv/lib/python3.8/site-packages/logica/common/logica_lib.py", line 149, in RunPredicateToPandas
    return RunQueryPandas(sql, engine)
  File "/home/rajiv/Documents/dev/python/mercylog/venv/lib/python3.8/site-packages/logica/common/logica_lib.py", line 137, in RunQueryPandas
    return pandas.read_sql(statements[-1], connection)
  File "/home/rajiv/Documents/dev/python/mercylog/venv/lib/python3.8/site-packages/pandas/io/sql.py", line 483, in read_sql
    return pandas_sql.read_query(
  File "/home/rajiv/Documents/dev/python/mercylog/venv/lib/python3.8/site-packages/pandas/io/sql.py", line 1727, in read_query
    cursor = self.execute(*args)
  File "/home/rajiv/Documents/dev/python/mercylog/venv/lib/python3.8/site-packages/pandas/io/sql.py", line 1693, in execute
    raise ex from exc
pandas.io.sql.DatabaseError: Execution failed on sql '

-- Interacting with table db.Sibling

SELECT
  db_Person.col0 AS person,
  (SELECT
  SUM(MagicalEntangle(1, x_9.value)) AS logica_value
FROM
  Parent, JSON_EACH(JSON_ARRAY(0)) as x_9
WHERE
  (Parent.col0 = db_Person.col0)) AS children_count,
  (SELECT
  SUM(MagicalEntangle(1, x_12.value)) AS logica_value
FROM
  db.Sibling AS Sibling, JSON_EACH(JSON_ARRAY(0)) as x_12) AS sibling_count
FROM
  db.Person AS db_Person': no such table: db.Person

RAbraham avatar Feb 10 '22 10:02 RAbraham

I got this to work(removed the db dataset annotation for now and any reference to db). I also pre populated the database tables person and parent. But still curious about the above question. I'll try postgres next.

@Engine("sqlite");

@DefineFlag("database_file");

@AttachDatabase("db", "${database_file}");


@Ground(Sibling);
Sibling(x, y) distinct :- Parent(z, x), Parent(z, y);

PersonInfo(person:,
           children_count:,
           sibling_count:) :-
  Person(person),
  sibling_count += (1 :- Sibling(x)),
  children_count += (1 :- Parent(person, x));

RAbraham avatar Feb 10 '22 11:02 RAbraham

Great questions!

  • Does @Dataset mean a table or a schema? @Dataset specifies schema to be used as a default for @Ground, which connects predicates to concrete tables. "Dataset" is essentially BigQuery's term for schema. I'm sure there are nuanced differences which made them call it different.

If you write @Ground(P, "mydb.my_table") predicate P is to be written to "mydb.my_table". And if you write @Ground(P) then P is to be written to <default dataset>.P. And if you don't specify dataset then it's assumed to be equal to logica_test.

  • I'm unsure why the same variabledb is used both in AttachDataset and DataSet. Are they connected, as in, do they have to have the same value db?

You can attach more then one database with @AttachDatabase command. And there can be only one default dataset. Here we pass the same value because we are attaching this database as db and specifying that this is where the tables are to be written by default. If we didn't specify Dataset then the tables would be written to a temporary in-ram database attached to logica_test alias.

Your original program complains that db.Person is undefined, which is true.

Let me know if you have further questions!

EvgSkv avatar Feb 11 '22 18:02 EvgSkv

re: postgres, I did what you said and it works :+1:

from logica.common import logica_lib
from sqlalchemy import create_engine

engine = create_engine('postgresql+psycopg2://postgres:somepwd@localhost', pool_recycle=3600)
connection = engine.connect()
result = logica_lib.RunPredicateToPandas(
    '/home/rajiv/Documents/dev/python/mercylog/logica/primes_postgresql.l',
    'Prime',
    connection=connection)

print(result)

It was a simple change of just adding connection to RunPredicateToPandas

def RunPredicateToPandas(filename, predicate,
                         user_flags=None, import_root=None, connection=None):
  p = GetProgramOrExit(filename, user_flags=user_flags,
                       import_root=import_root)
  sql = p.FormattedPredicateSql(predicate)
  engine = p.annotations.Engine()
  return RunQueryPandas(sql, engine, connection)

The one thing I found odd is that I had to install psycopg2 with pip in my virtual environment. I just skimmed through my notebook of logica on postgres and I don't see me installing it. Small thing but thought I'd share.

RAbraham avatar Feb 12 '22 11:02 RAbraham

  • @Dataset specifies schema to be used as a default for @Ground, which connects predicates to concrete tables. ..

wow, Ground is elegant :). I think I may have a different mental map of databases so just wanted to clarify that first.

In postgres for e.g., I think of referring to a table as db.schema.table where db is another database( https://chartio.com/resources/tutorials/how-to-list-databases-and-tables-in-postgresql-using-psql/#listing-databases) within the same database server. I see a schema as something like a namespace.

In the e.g. above from chartio, the database is sales, the schema is public and the table is leads. If you don't specify a table, the default schema is public but I could have another schema with the CREATE SCHEMA command.

I think SQLite has no schema concept and maybe what you are referring to maps more to SQLite(and BigQuery?)

RAbraham avatar Feb 12 '22 11:02 RAbraham

@RAbraham , oh I see. Then I think I didn't use the word "schema" correctly in my last reply.

Yes, it appears that in SQLite schema and database are synonyms while in PostgreSQL they are not. But I am not 100% sure :-)

Logica compiler builds SQL as a string. So the @Dataset just specifies the prefix that through dot gets appended to the table name corresponding to the predicate. So it could be called differently depending on the terminology of the backend engine used.

Hope this helps and at least it's clear how to use it :-) Let me know!

Great to hear it works with PostgreSQL. I think CoLab has psycopg2, so we didn't have to install it there. I think it's OK for the user to install it if they want to use psql connection. Logica on it's own has no dependencies, which helps with maintenance.

If you happen to have a chance to send a pull request with your change enabling PSQL and your example of using it (which could have a comment on what needs to be installed) in the examples/more folder, I'd appreciate it!

Thank you!

EvgSkv avatar Feb 12 '22 19:02 EvgSkv

re: prefix append, nice. reminds me of the Law of Demeter but for network paths :)

PR: https://github.com/EvgSkv/logica/pull/204 lmk

RAbraham avatar Feb 13 '22 13:02 RAbraham

Law of Demeter is an interesting analogy. :-) Merged the PR. Thank you!

EvgSkv avatar Feb 14 '22 07:02 EvgSkv