falcon icon indicating copy to clipboard operation
falcon copied to clipboard

doc: SQLAlchemy integration guide

Open kgriffs opened this issue 6 years ago • 11 comments

kgriffs avatar Jan 05 '19 15:01 kgriffs

Example code copied from Gitter and posted originally by Nateyo:

def __init__(self, db_engine):
        self.engine = db_engine
        session_factory = sessionmaker(bind=db_engine)
        self.session_registry = scoped_session(session_factory)

    def process_resource(self, req, resp, resource, params):
        req.context["session"] = self.session_registry()

    def process_response(self, req, resp, resource, req_succeeded):
        if req.context.get("session"):
            self.session_registry.remove()

kgriffs avatar Jan 05 '19 15:01 kgriffs

Note 1: scoped_session is supposed to be a magical proxy abstracting away concurrent access troubles. If that level of magic is desired, there is probably no need to dereference this way: self.session_registry(), instances of scoped_session already proxy everything (here, I had prepared hello.db in advance):

>>> from sqlalchemy import Column, Integer, String
>>> from sqlalchemy import create_engine
>>> from sqlalchemy.ext.declarative import declarative_base
>>> from sqlalchemy.orm import scoped_session, sessionmaker
>>> 
>>> Base = declarative_base()
>>> 
>>> class Item(Base):
...     __tablename__ = 'items'
...     id = Column(Integer, primary_key=True)
...     message = Column(String(64), nullable=False)
... 
>>> engine = create_engine('sqlite:////tmp/hello.db')
>>> factory = sessionmaker(bind=engine)
>>> session = scoped_session(factory)
>>> items = session.query(Item).all()
>>> [(item.id, item.message) for item in items]
[(1, u'Hello, world!'), (2, u'Hello again.'), (3, u'Hello, world!'), (4, u'Hello again.'), (5, u'Hello, world!'), (6, u'Hello again.'), (7, u'Hello, world!'), (8, u'Hello again.'), (9, u'Hello, world!'), (10, u'Hello again.')]
>>> session.remove()

vytas7 avatar Jan 05 '19 16:01 vytas7

Note 2: using scoped_session is not mandatory, if one has middleware to set up and dispose sessions anyway, an alternative way can be something in style of this snippet from the SQLAlchemy session lifecycle docs:

class ThingOne(object):
    def go(self, session):
        session.query(FooBar).update({"x": 5})

class ThingTwo(object):
    def go(self, session):
        session.query(Widget).update({"q": 18})

def run_my_program():
    # This part could go into process_resource or process_request  // Vytas
    session = Session()
    try:
        ThingOne().go(session)
        ThingTwo().go(session)

        # The part below can be part of process_response  // Vytas
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()

The same could be achieved more explicitly with a dedicated data manager class and/or context managers, as demonstrated later in the linked document above.

https://github.com/wileykestner/falcon-sqlalchemy-demo seems to use the context manager approach as well.

vytas7 avatar Jan 05 '19 16:01 vytas7

Note 3: not sure how common this pattern is, but streaming responses from the database might require special care in case database pools/connections/sessions are managed by middleware. Middleware is applied after responders, but before returning response to the client. If the response is streamed off a generator employing database code, premature cleanup might interfere.

Consider talking about, and providing recipes for, implementing an iterable with close() method, and handling cleanup there, if needed. CloseableStreamIterator might be helpful as a reference. try... finally and/or context manager should do the job too, although it might [theoretically] depend on the mercy of the WSGI server.

vytas7 avatar Jan 05 '19 16:01 vytas7

Note that there is nothing wrong with the bare metal example from the FAQ either, provided that level of abstraction suffices for the given use case. Maybe that could be converted to an example as well?

vytas7 avatar Jan 05 '19 17:01 vytas7

Unless I'm mistaken, the bare metal example is doing a similar thing to the middleware originally posted:

In the db manager init:

self.DBSession = scoping.scoped_session(
            orm.sessionmaker(
                bind=self.engine,
                autocommit=True
            )
        )

Then whenever the session property is accessed:

@property
    def session(self):
        return self.DBSession()

From this sqlalchemy docs page it notes something very specific that the example project doesn't seem to do:

Ensure that scoped_session.remove() is called when the web request ends, usually by integrating with the web framework’s event system to establish an “on request end” event.

Though note that these sessions would still likely be implicitly cleaned up. The downside to the middleware session handling is that it makes the scope muddy for "controllers" to have to handle the injection into anything you want to use to access the database.

Nateyo avatar Jan 05 '19 17:01 Nateyo

@Nateyo sorry if I was ambiguous, that example is not bare metal at all wrt SQLAlchemy usage.

I was referring to the code snippets therein:

engine = create_engine('sqlite:///:memory:')
resource = SomeResource(engine)

# <...>

# Read from the DB
result = self._engine.execute(some_table.select())
for row in result:
    # ....
result.close()

# ...

# Write to the DB within a transaction
with self._engine.begin() as connection:
    r1 = connection.execute(some_table.select())
    # ...
    connection.execute(
        some_table.insert(),
        col1=7,
        col2='this is some data'
    )

Regarding @jmvrbanac 's example you are discussing, I don't think it needs any cleanup/removal since it uses autocommit=True mode (one may question whether that is adequate for real-world APIs, but it does its job as an introductory example).

vytas7 avatar Jan 05 '19 17:01 vytas7

I could be wrong but I thought autocommit will only call commit() and not close() on the Session though, yea? Agreed it's fine for introductory, but the docs are pretty clear .remove() should be called when using scoped_session when the request or need for the session has ended.

Perhaps we could modify the docs to support examples for explicit session management, much like the link you posted in gitter and a more global style scoped_session approach.

I could easily also see middleware looking like this

def __init__(self, db_engine):
        self.engine = db_engine
        self.session_factory = sessionmaker(bind=db_engine)

    def process_resource(self, req, resp, resource, params):
        req.context["session"] = self.session_factory()

    def process_response(self, req, resp, resource, req_succeeded):
        sess = req.context.get("session")
        if sess:
            try:
                    sess.commit()
            except...rollback, etc. manage the response
            finally:
                    sess.close()

Where you consider anything that happens within the falcon request as part of a transaction that's only committed and closed when the resource is done being processed. Apologies for typos and succinct code I'm just spitballing in the editor here. I think this example is closer to what you've been trying to bash into my head @vytas7. No need to use scoped_session if you can just create a session directly in the context it's going to be used?

I don't know, it's tough. There's many different ways to accomplish managing a session at different layers of the application. All with different pros and cons. I think what I'm going to do is try and catalog what the official falcon docs cover, discuss with my team early next week how we'd like to structure the approach for our application, and then I think I could better discuss the topic and help out with the official docs.

Nateyo avatar Jan 05 '19 17:01 Nateyo

what you've been trying to bash into my head @vytas7

Well, I wouldn't put it that way :smiling_imp: You were asking for help choosing between approaches -- I would probably go that way if were starting a new, clean (until the real development begins :slightly_smiling_face: ) Falcon+SQLAlchemy API.

But I'm no expert of SQLAlchemy, and scoped_session works fine for many users. I just advised exploring the path of not using it, since, I got the impression that you would like to have greater control of session lifecycle, and also slightly based on my experience with Flask-SQLAlchemy (you are well experienced in it too, as I understand), which is like scoped_session on steroids. The thing works really magically well until you want to tune something more advanced, like advanced logic for load-balancing queries with different binds.

Getting back to this issue, I think it is quite open-ended and we may just go with whatever is the most popular among users, since Falcon is not very opinionated, after all. Documenting many different paths is also more time demanding.

But ideally I would to see that we provide help for users choosing different paths. "Choosing your Guarantees" says Rust language about memory management, it is something similar here, choosing your level of magic:

  • The automagical scoped_session -- as I have commented above -- its instance can be a resource attribute, or even a global variable (proxying thread-locals, where have I seen that? :thinking: )
  • Getting Sessionss for each request/operation, be it middleware (like your last snippet), helper class like our FAQ suggests, or just a context manager, or a combination of these
  • Working with low-level connections like depicted in the FAQ snippets

vytas7 avatar Jan 05 '19 18:01 vytas7

Given that there are different approaches with different tradeoffs, we should probably include 2-3 in a new recipes section in the docs, but also include a disclaimer that there are other ways and maybe point them to certain areas of the SQLAlchemy docs or other resources for further reading.

kgriffs avatar May 05 '19 23:05 kgriffs

hi guys, just my $0.02 .

I come from a standpoint of considerable pain around configuring sqlalchemy within flask. Here's my opinion on the perfect recipe you can build.

how does one configure sqlalchemy in falcon so that:

  1. each endpoint does not have to open session, scope session, etc. This is a people+team issue - i want to reduce possibility of errors made by multiple engineers working on different parts of the codebase. can i have a central place where i set the per-request per-thread scoping, etc

in flask, we end up doing something like


def get_session(db='main'):
    if has_request_context():
        if 'db_sessions' in g and g.db_sessions.get(db):
            return g.db_sessions.get(db)
        session = current_app.databases[db].session_factory()
        g.setdefault('db_sessions', {}).update({db: session})
    elif has_worker_context():
        databases = get_database_conn_from_worker()
        session = databases[db].session_factory()
    else:
        if not DATABASES:
            DATABASES.update(init_databases(get_config('DB_URIS')))
        session = DATABASES[db].session_factory()
    return session
  1. the sqlalchemy models that i have written, should be usable from RQ/Celery. Is the code so intermingled that celery is having to instantiate a Falcon instance ? This is a frequent mess in flask - a lot of sqlalchemy is done within "requestcontext" and causes a lot of problems when i load the models in celery/rq. Which is why you use "has_request_context" above. I can use the same code to load models into rq/celery.

  2. no hard-enforced session level commits. there's a lot of logic where i may choose to not commit data because of some validations, etc inside a request. So there must be flexibility to skip committing data, return a valid error response (which means im not throwing an exception or something) and then finish the request.

  3. Gunicorn/uwsgi. there is all the configuration around gunicorn/uwsgi workers and threads. If you dont configure sqlalchemy properly, then you will run into "session was already closed" errors. Everyone runs into this sooner or later with flask+gunicorn. Also want to point out your documentation for uwsgi. Its really good with how should falcon be written to get uwsgi to work (including the wsgi.py). essentially, the question is - when you bring in sqlalchemy into the mix, how should we configure wsgi.py

  4. No separate "flask-sqlalchemy" package. Show us how to do this with native sqlalchemy. Because your package will always lag behind sqlalchemy. most people are more than happy writing some wrapper code if it means they can use the latest sqlalchemy package right now. Especially with the pace of changes in databases like Postgres, Aurora, etc - this is very important.

  5. connection pooling - this problem is because of points 4 and 5 aboive. mysql guys generally use the built in pooler. postgresql guys generally use pgbouncer. do note the problem is not around configuration in sqlalchemy itself. if you dont setup the configuration of a pooler the correct way (by respecting the threads, workers, etc) you again run into tons of "connection was closed problem".

If you could give ONE recipe that covers these six major pain points, that would be super killer awesome. And for the rest, you can point them towards sqlalchemy documentation!

sandys avatar Jun 21 '19 09:06 sandys