*args, **kwargs, and custom class wrappers in Python
At Thyme Care, we’re growing quickly, so we are always looking for ways to improve the onboarding process for new developers and to make application changes easier. One way we do this is by abstracting out the cookie-cutter code written for similar features so there is less cognitive load in making changes. Recently, we saw an opportunity for this in our data models and API interactions.
The Backstory: Adding custom attributes to SQLAlchemy columns
One of the places where we started to see a lot of repeated logic in our codebase was around adding a new data component to the RESTful portion of our API. The common pattern when adding a new data table was:
Update the ORM: Create a new SQLAlchemy data model using the declarative approach
Add a migration: Auto-generate a new alembic migration to add the model to the database
Add data validation: Create a mostly auto-generated marshmallow schema for the model to power our validation and Swagger API docs
Add endpoints: Create endpoints for getting, updating, and deleting the new data
Add mock data: Add a new factory to generate mock data for the data model using factory_boy and faker
Write tests: Create tests to guarantee the API layer business logic
From a business logic side, there are some columns that can be set only on create, some columns that can be set on create and update, and some columns that are required/optional for one/both of those operations. SQLAlchemy doesn’t provide us a way to specify this on the column, so we were redefining this information at our API layer:
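To make the duplication concrete, here is a minimal sketch of what restating those rules at the API layer can look like. It uses plain dictionaries rather than marshmallow so that it runs on its own, and the field names (first_name, date_of_birth, phone) and the validate_payload helper are hypothetical, chosen only for illustration.

```python
# Hypothetical per-endpoint rules: field name -> whether it is required.
CREATE_RULES = {"first_name": True, "date_of_birth": True, "phone": False}
# date_of_birth is create-only, so it is absent from the update rules.
UPDATE_RULES = {"first_name": False, "phone": False}


def validate_payload(payload, rules):
    """Return a list of error strings; an empty list means the payload is valid."""
    # Reject any field the endpoint does not accept.
    errors = [f"unknown field: {name}" for name in payload if name not in rules]
    # Then report every required field that was left out.
    errors += [
        f"missing required field: {name}"
        for name, required in rules.items()
        if required and name not in payload
    ]
    return errors


print(validate_payload({"first_name": "Ada"}, CREATE_RULES))
print(validate_payload({"date_of_birth": "1990-01-01"}, UPDATE_RULES))
```

Every rule in these dictionaries restates something the data model already implies, which is the repetition we wanted to remove.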
We were able to pull much of this out into a function that reads from the SQLAlchemy model, and eventually arrived at this abstraction to build our marshmallow schema:
This was better but still felt like we were duplicating business logic in multiple places. For instance, if a field is nullable in the SQLAlchemy model, it should be optional at the API layer in most cases. This meant when adding a new field you had to add it not only to the SQLAlchemy model but to a bunch of scattered schemas to get it to be accepted at the API layer.
Ideally, we wanted to put all of the information about a column inline with the column declaration on the SQLAlchemy model. It’s not a foolproof solution for more complicated models or endpoints but for your basic CRUD endpoints it’s a nice thing to opt in to. What we wanted was this:
With that code, our create and update endpoints can know to accept a required first_name field.
A basic multiple inheritance pattern doesn’t work here, since the SQLAlchemy column constructor doesn’t allow extra arguments.
To make this work, we wrote a wrapper around the base SQLAlchemy column class that we use instead, which injects our own metadata:
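A minimal sketch of that wrapper idea follows. To keep it runnable without any dependencies it uses a stand-in Column class with a strict constructor instead of importing SQLAlchemy. The names ApiColumn, creatable, and updatable are hypothetical, and deriving required from nullable is an assumption based on the rule of thumb described earlier.

```python
class Column:
    """Stand-in for a third-party column class with a strict constructor."""

    def __init__(self, type_, nullable=True):
        self.type_ = type_
        self.nullable = nullable


class ApiColumn(Column):
    """Wrapper that keeps our metadata away from the underlying constructor."""

    def __init__(self, *args, creatable=False, updatable=False, **kwargs):
        # Everything left in args/kwargs is forwarded untouched.
        super().__init__(*args, **kwargs)
        self.creatable = creatable
        self.updatable = updatable
        # Assumption: a non-nullable column is required at the API layer.
        self.required = not self.nullable


first_name = ApiColumn(str, nullable=False, creatable=True, updatable=True)
print(first_name.creatable, first_name.updatable, first_name.required)

# The unwrapped class rejects the extra keyword outright.
try:
    Column(str, creatable=True)
except TypeError as exc:
    print("plain Column rejects it:", type(exc).__name__)
```

Because the custom keywords are named explicitly in the wrapper's signature, they never reach the parent constructor, and every other positional or keyword argument passes through unchanged.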
Finally, we can put them inline:
Then, in generating our schemas for our endpoints, we can use some simple methods we’ve added to our tables (full code for BaseModel in the appendix):
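As a rough illustration of what such helper methods might look like, the sketch below collects the metadata from columns declared on a model class. It is not the appendix code: the ApiColumn stand-in, the Member model, and the method names create_fields and update_fields are all hypothetical, and treating updates as partial (no required fields) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ApiColumn:
    """Stand-in for a wrapped column that carries API metadata."""
    creatable: bool = False
    updatable: bool = False
    nullable: bool = True


class BaseModel:
    @classmethod
    def _columns(cls):
        # Collect every ApiColumn declared on the class or its parents.
        found = {}
        for klass in reversed(cls.__mro__):
            for name, value in vars(klass).items():
                if isinstance(value, ApiColumn):
                    found[name] = value
        return found

    @classmethod
    def create_fields(cls):
        """Map each creatable field name to whether it is required."""
        return {n: not c.nullable for n, c in cls._columns().items() if c.creatable}

    @classmethod
    def update_fields(cls):
        """Updatable field names; updates are assumed partial, so none are required."""
        return sorted(n for n, c in cls._columns().items() if c.updatable)


class Member(BaseModel):
    first_name = ApiColumn(creatable=True, updatable=True, nullable=False)
    date_of_birth = ApiColumn(creatable=True, nullable=False)
    phone = ApiColumn(creatable=True, updatable=True)


print(Member.create_fields())
print(Member.update_fields())
```

A schema generator can then loop over these results instead of keeping a hand-written field list per endpoint.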
One downside of this approach is IDE documentation: we lose the docstring and argument hints for the underlying class because our wrapper’s signature overrides them. This isn’t a huge problem if the underlying class is well known and extensively documented (like the SQLAlchemy Column), but it is a consideration.
All in all, we’ve had luck using this wrapper pattern to make our code more maintainable and thought it would be helpful to share. We’d love to hear any thoughts on more advantages, downsides, or other ideas in the comments, and we’re looking forward to sharing more learnings in the future.
tl;dr: Generic Example
Sometimes we want to add some custom data to a third-party library class in Python but obviously can’t change the library itself. Using a simple inheritance pattern along with Python’s *args and **kwargs syntax, we can insert our own metadata into a wrapper class without affecting the underlying implementation. Here’s the basic pattern using a pretend Square class library:
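A minimal sketch of the pattern, assuming a pretend Square class with a side length and a color. The TaggedSquare wrapper and its label keyword are hypothetical names used only for illustration.

```python
class Square:
    """Pretend third-party class; imagine we cannot edit it."""

    def __init__(self, side, color="black"):
        self.side = side
        self.color = color

    def area(self):
        return self.side ** 2


class TaggedSquare(Square):
    """Our wrapper: same behavior as Square, plus a custom label."""

    def __init__(self, *args, **kwargs):
        # Pull our own keyword out before the parent constructor sees it.
        self.label = kwargs.pop("label", None)
        # Everything else goes to Square unchanged.
        super().__init__(*args, **kwargs)


sq = TaggedSquare(3, color="red", label="kitchen tile")
print(sq.area(), sq.color, sq.label)
print(isinstance(sq, Square))
```

Code that expects a Square keeps working with the wrapper, since TaggedSquare is still a Square and only adds one attribute on top.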