logica
logica copied to clipboard
Connecting to a SQLite/PostgreSQL database in a script
Logica is a wonderful alternative to SQL, and I would like to connect to SQLite/PostgreSQL database in a l
script for data query. However currently there seems to be limited support for this.
Connection to a SQLite database file seems to have been already implemented but has gone undocumented. We could use the @AttachDatabase
annotation:
@Engine("sqlite");
@AttachDatabase("db", "databaseFile.db");
Table(..r) :- `(db.tableName)`(..r);
and then run this script in local command line. But this usage does not appear anywhere in the tutorial and examples.
We have an example of connecting to a PosgreSQL database in ipynb
files with the help of sqlalchemy
library:
# Connect to the database.
from logica import colab_logica
from sqlalchemy import create_engine
import pandas
engine = create_engine('postgresql+psycopg2://logica:[email protected]', pool_recycle=3600);
connection = engine.connect();
colab_logica.SetDbConnection(connection)
which is great, but I'm not sure if this is possible in a script program.
Hi, I have a similar use case. I would like to run logica over sqlite3(specifying a path to the db file) in a simple Python script outside of IPython.
Hi, disregard if you are busy, but if you could give some pointers on how to go about doing this, that would be awesome. I'd like to use logica outside the notebook context in my regular python projects but wasn't sure about how to go about doing this. if it's not cleanly cut out, you can even say go to this py file and cut this out and use it with this line of code :). I'll figure something out.
Hi @RAbraham ! I believe the functionality is there, we just need to test and document it better.
I've added an example of running a predicate on SQLite engine: https://github.com/EvgSkv/logica/blob/main/examples/more/call_sqlite.py
Let me know if it covers your usecase.
If you do have time, adding a similar (tested 😄) example for Postgres would be great.
Thank you!
Thanks!
- Is it possible to connect to an existing sqlite database? Is there an annotation I can use, or dynamically pass the connection string by environment variables or even pass the connection string or object? e.g. below.
db=sqla.create_engine('sqlite:////home/stephen/db1')
- Same question for postgres? how do I connect my script to a postgres database with creds without using
colab_logica.SetDbConnection(connection)
.
@AttachDatabase
annotation lets us connect to a database. We do need better docs, meanwhile here is an example of script reading tables Parent
and Person
from heritage.db
file and writing table Sibling
there.
@Engine("sqlite");
@AttachDatabase("db", "heritage.db");
@Dataset("db");
@Ground(Sibling);
Sibling(x, y) distinct :- db.Parent(z, x), db.Parent(z, y);
PersonInfo(person:,
children_count:,
sibling_count:) :-
db.Person(person),
sibling_count += (1 :- Sibling(x)),
children_count += (1 :- Parent(person, x));
To pass a custom database file name use flags. For instance the script above would look like:
@Engine("sqlite");
@DefineFlag("database_file");
@AttachDatabase("db", "${database_file}");
@Dataset("db");
@Ground(Sibling);
Sibling(x, y) distinct :- db.Parent(z, x), db.Parent(z, y);
PersonInfo(person:,
children_count:,
sibling_count:) :-
db.Person(person),
sibling_count += (1 :- Sibling(x)),
children_count += (1 :- Parent(person, x));
And when calling use user_flags
parameter:
logica_lib.RunPredicateToPandas(
'sibling_script.l',
'PersonInfo',
user_flags={'database_file': 'my_custom_file.db')
At the moment it's not possible to pass a custom connection.
If you could extend the RunPredicateToPandas
with that ability it would be great!
You can see that here we call a function that does accept connection argument. So if you would add a connection argument to RunPredicateToPandas
and pass it there it should work. For now testing it on your machine would be sufficient.
Let me know if you have further questions.
Great, I'll attempt it now and let you know. Thank you very much!
I'm trying to test this. How do I populate rows in the database from logica? According to the tutorial, I hacked this but I'm missing a concept or two
- Does
@Dataset
mean a table or a schema? - I'm unsure why the same variable
db
is used both inAttachDataset
andDataSet
. Are they connected, as in, do they have to have the same valuedb
?
@Engine("sqlite");
@DefineFlag("database_file");
@AttachDatabase("db", "${database_file}");
@Dataset("db");
db.Parent("A", "B");
db.Parent("B", "C");
db.Parent("C", "D");
db.Parent("B", "E");
db.Parent("A", "F");
db.Parent("A", "G");
db.Parent("G", "H");
@Ground(Sibling);
Sibling(x, y) distinct :- db.Parent(z, x), db.Parent(z, y);
PersonInfo(person:,
children_count:,
sibling_count:) :-
db.Person(person),
sibling_count += (1 :- Sibling(x)),
children_count += (1 :- Parent(person, x));
Error:
Traceback (most recent call last):
File "/home/rajiv/Documents/dev/python/mercylog/venv/lib/python3.8/site-packages/pandas/io/sql.py", line 1681, in execute
cur.execute(*args, **kwargs)
sqlite3.OperationalError: no such table: db.Person
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/rajiv/Documents/dev/python/mercylog/logica_trial.py", line 28, in <module>
result = logica_lib.RunPredicateToPandas(
File "/home/rajiv/Documents/dev/python/mercylog/venv/lib/python3.8/site-packages/logica/common/logica_lib.py", line 149, in RunPredicateToPandas
return RunQueryPandas(sql, engine)
File "/home/rajiv/Documents/dev/python/mercylog/venv/lib/python3.8/site-packages/logica/common/logica_lib.py", line 137, in RunQueryPandas
return pandas.read_sql(statements[-1], connection)
File "/home/rajiv/Documents/dev/python/mercylog/venv/lib/python3.8/site-packages/pandas/io/sql.py", line 483, in read_sql
return pandas_sql.read_query(
File "/home/rajiv/Documents/dev/python/mercylog/venv/lib/python3.8/site-packages/pandas/io/sql.py", line 1727, in read_query
cursor = self.execute(*args)
File "/home/rajiv/Documents/dev/python/mercylog/venv/lib/python3.8/site-packages/pandas/io/sql.py", line 1693, in execute
raise ex from exc
pandas.io.sql.DatabaseError: Execution failed on sql '
-- Interacting with table db.Sibling
SELECT
db_Person.col0 AS person,
(SELECT
SUM(MagicalEntangle(1, x_9.value)) AS logica_value
FROM
Parent, JSON_EACH(JSON_ARRAY(0)) as x_9
WHERE
(Parent.col0 = db_Person.col0)) AS children_count,
(SELECT
SUM(MagicalEntangle(1, x_12.value)) AS logica_value
FROM
db.Sibling AS Sibling, JSON_EACH(JSON_ARRAY(0)) as x_12) AS sibling_count
FROM
db.Person AS db_Person': no such table: db.Person
I got this to work(removed the db
dataset annotation for now and any reference to db
). I also pre populated the database tables person
and parent
. But still curious about the above question. I'll try postgres next.
@Engine("sqlite");
@DefineFlag("database_file");
@AttachDatabase("db", "${database_file}");
@Ground(Sibling);
Sibling(x, y) distinct :- Parent(z, x), Parent(z, y);
PersonInfo(person:,
children_count:,
sibling_count:) :-
Person(person),
sibling_count += (1 :- Sibling(x)),
children_count += (1 :- Parent(person, x));
Great questions!
- Does @Dataset mean a table or a schema?
@Dataset
specifies schema to be used as a default for@Ground
, which connects predicates to concrete tables. "Dataset" is essentially BigQuery's term for schema. I'm sure there are nuanced differences which made them call it different.
If you write @Ground(P, "mydb.my_table")
predicate P is to be written to "mydb.my_table".
And if you write @Ground(P)
then P is to be written to <default dataset>.P
. And if you don't specify dataset then it's assumed to be equal to logica_test
.
- I'm unsure why the same variabledb is used both in AttachDataset and DataSet. Are they connected, as in, do they have to have the same value db?
You can attach more then one database with @AttachDatabase command. And there can be only one default dataset. Here we pass the same value because we are attaching this database as db
and specifying that this is where the tables are to be written by default.
If we didn't specify Dataset then the tables would be written to a temporary in-ram database attached to logica_test
alias.
Your original program complains that db.Person is undefined, which is true.
Let me know if you have further questions!
re: postgres, I did what you said and it works :+1:
from logica.common import logica_lib
from sqlalchemy import create_engine
engine = create_engine('postgresql+psycopg2://postgres:somepwd@localhost', pool_recycle=3600)
connection = engine.connect()
result = logica_lib.RunPredicateToPandas(
'/home/rajiv/Documents/dev/python/mercylog/logica/primes_postgresql.l',
'Prime',
connection=connection)
print(result)
It was a simple change of just adding connection
to RunPredicateToPandas
def RunPredicateToPandas(filename, predicate,
user_flags=None, import_root=None, connection=None):
p = GetProgramOrExit(filename, user_flags=user_flags,
import_root=import_root)
sql = p.FormattedPredicateSql(predicate)
engine = p.annotations.Engine()
return RunQueryPandas(sql, engine, connection)
The one thing I found odd is that I had to install psycopg2
with pip in my virtual environment. I just skimmed through my notebook of logica on postgres and I don't see me installing it. Small thing but thought I'd share.
@Dataset
specifies schema to be used as a default for@Ground
, which connects predicates to concrete tables. ..
wow, Ground
is elegant :).
I think I may have a different mental map of databases so just wanted to clarify that first.
In postgres
for e.g., I think of referring to a table as db.schema.table
where db
is another database(
https://chartio.com/resources/tutorials/how-to-list-databases-and-tables-in-postgresql-using-psql/#listing-databases) within the same database server. I see a schema as something like a namespace.
In the e.g. above from chartio, the database is sales
, the schema is public
and the table is leads
. If you don't specify a table, the default schema is public
but I could have another schema with the CREATE SCHEMA
command.
I think SQLite has no schema concept and maybe what you are referring to maps more to SQLite(and BigQuery?)
@RAbraham , oh I see. Then I think I didn't use the word "schema" correctly in my last reply.
Yes, it appears that in SQLite schema and database are synonyms while in PostgreSQL they are not. But I am not 100% sure :-)
Logica compiler builds SQL as a string. So the @Dataset
just specifies the prefix that through dot gets appended to the table name corresponding to the predicate. So it could be called differently depending on the terminology of the backend engine used.
Hope this helps and at least it's clear how to use it :-) Let me know!
Great to hear it works with PostgreSQL. I think CoLab has psycopg2, so we didn't have to install it there. I think it's OK for the user to install it if they want to use psql connection. Logica on it's own has no dependencies, which helps with maintenance.
If you happen to have a chance to send a pull request with your change enabling PSQL and your example of using it (which could have a comment on what needs to be installed) in the examples/more folder, I'd appreciate it!
Thank you!
re: prefix append, nice. reminds me of the Law of Demeter but for network paths :)
PR: https://github.com/EvgSkv/logica/pull/204 lmk
Law of Demeter is an interesting analogy. :-) Merged the PR. Thank you!