SQLAlchemy is an object-relational mapper widely used in the Python world, making it easier (usually!) for developers to interface with their database of choice. I have always fudged my way around with the various database methods for syncing state between the in-memory object instances and the database, without fully understanding the differences. It's time to settle once and for all when to use session commit() vs flush() vs expire() vs refresh()!
Note: Altough my experience is with SQLAlchemy's Flask variant, most (all?) of these principles should apply to all flavours of SQLAlchemy.
Key to the rest of this article is the concept of the Session in the SQLAlchemy world. Sessions are an in-memory "limbo" state for objects associated with database records.
Let's break down what this means:
The various methods mentioned in this article (commit / flush / expire / refresh / merge) are all slightly different ways to accomplish that last step in the lifecycle of persisting changes back to the database.
You may be wondering at this stage what determines when the Session lifecycle begins and ends. SQLAlchemy has taken an opinionated stance where it is usually up to the developer to decide when this begins and ends, with the availability of methods such as db.session.begin() and db.session.commit(). However in most web applications, the established pattern is to begin and end the Session with each http request.
Let's start with the most straightforward of the methods we are investigating.
The methods for db.session.expire(some_object) and db.session.expire_all() expires one or all objects in the current Session respectively. This means that:
A key behaviour of expiring is that all un-flushed changes to the object is discarded and not persisted to the database. For example:
user.name = 'user2'
db.session.expire(user)
user.name # ==> 'user1'
Note: Objects are automatically expired already whenever the Session ends. This means if there is a call to db.session.commit() or db.session.rollback() (automatically at the end of a HTTP request in the case of a web application), all objects are expired.
So when do you actually need to explicitly expire objects? You do so when you want to force an object to reload its data, because you know its current state is possibly stale. This is commonly when:
Now that we understand expiring objects, the methods for db.session.refresh(some_object) becomes much easier to understand.
Basically, refreshing means to expire and then immediately get the latest data for the object from the database. It involves an immediate database query, which you may consider unnecessarily expensive.
Here's how I decide when to use expire vs refresh?
Remember earlier in this article we mentioned that expiring objects will discard all un-flushed changes? Flushing means to push all object changes to the database. Note that this does not necessarily mean that changes have been made to the database records - you must still call db.session.commit() to update the database or db.session.rollback() to discard your changes.
Pushing object changes to the database means your database now holds the changes in its transaction buffer. This means there are 2 common gotchas with using flush():
# With autocommit: False
user.name # ==> 'user1'
user.name = 'user2'
db.session.flush()
user.name # ==> 'user2', returns the in-memory representation. If you view your db with another application, it will still show 'user1'
db.session.rollback()
user.name # ==> 'user1'
# With autocommit: True
user.name # ==> 'user1'
user.name = 'user2'
db.session.flush() # ==> db.session.commit() is automatically called
user.name # ==> 'user2'. If you view your db with another application, it will already show 'user2'
db.session.rollback() # ==> too late!
user.name # ==> 'user2'
user.name # ==> 'user1'
user.name = 'user2'
db.session.flush()
user.name # ==> 'user2', returns the in-memory representation. If you view your db with another application, it will still show 'user1'
db.session.refresh(user)
user.name # ==> 'user2' # ==> SQLAlchemy assumes the database has been changed, even if it hasn't committed! If you view your db with another application it, it will still show 'user1'
db.session.rollback()
user.name # ==> 'user1' # ==> Rollback discards the database transaction buffer
With this understanding of flush, it's now easier to understand committing. Conceptually db.session.commit() is basically 2 steps in one:
Note if your Session is configured with autocommit: True, calling flush() will automatically call commit() if outside a transaction.
Merging is a less common scenario, where you may have more than one in-memory representation what is essentially the same object. Being the "same object" is usually based on the database's primary key.
Here's an example:
user1 = User.query.get(1)
user1.name # ==> 'user1'
new_user = User(user_id=1) # ==> a second in-memory object with the same key!
new_user.name = 'user2'
user1.name # ==> 'user1'. Without merging, user1 doesn't know it is the same as new_user
db.session.merge(new_user)
user1.name # ==> 'user2'. Now updated in memory. Note not yet updated in db, needs flush() and commit()
Here's how I decide what to use: