DynamoDB & Your Sessions

As you may have read in the previous entry in this series, our Session Management API shall use DynamoDB for data storage, and, specifically, will not be using Redis. The reasons for this are many, but the most pertinent one is cost. For more on this, consult your nearest Cloud Architect.

(Yes, this is Step One. The previous post is The Introduction. Should you wish to follow along with the code as a whole, it's over on GitHub.)

Let the Dynamo spin!

Create a new Table

My SST Stack configuration already has a DynamoDB table called MarukiTable; let's add another one, called MarukiSessionTable:

https://snappify.com/view/12b4e15b-0c00-47c7-acfe-be1ba9a5b258

This table has a standard partition key and sort key, a Global Secondary Index (with its requisite keys), and an expiry column. That column is marked as the timeToLiveAttribute: DynamoDB allows you to configure a column whose value represents the expiry time of that item, as Unix epoch time in seconds. DynamoDB then runs background processes to identify and remove expired items at no extra cost, though not necessarily at the exact moment of expiry.
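
Since TTL removal happens in the background rather than at the instant of expiry, an expired item can still be returned by a read for a while. A minimal sketch of guarding against that, assuming the TTL attribute is called `expires` (a hypothetical name; substitute whatever your stack marks as its timeToLiveAttribute):

```typescript
// Hypothetical item shape; `expires` stands in for whichever attribute
// the table marks as its timeToLiveAttribute.
interface ExpiringItem {
  expires: number; // Unix epoch time, in seconds (not milliseconds)
}

// Current time in the unit DynamoDB TTL expects: whole seconds since the epoch.
function nowInEpochSeconds(nowMs: number = Date.now()): number {
  return Math.floor(nowMs / 1000);
}

// Treat an item as expired once its TTL has passed, even if the
// background process has not physically removed it yet.
function isExpired(item: ExpiringItem, nowMs: number = Date.now()): boolean {
  return item.expires <= nowInEpochSeconds(nowMs);
}
```

The `nowMs` parameter exists only so the check can be exercised with a fixed clock; in ordinary use it can be left out.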

(Because TTL deletions are written to DynamoDB Streams when a stream is enabled on the table, this feature is useful for a myriad of things, including removing subscriptions, censoring data after specific times, and billing.)

Provide table access to functions & site

I'll also need to give my functions access to my new table, as well as make its name available to my Static Site (which I'm doing with sst-env, which is not germane to our discussion):

https://snappify.com/view/156ea541-9f6f-450f-b8b2-328072ecf4ee

Why am I creating a separate table for sessions, instead of keeping things in the erstwhile Single Table realm? Paranoia, Dear Reader, paranoia. Whilst DynamoDB won't erase anything whose TTL value lies more than 5 years in the past (and thus my data is safe even should blank entries evaluate to 0), I simply don't like the risk. It would be all too easy for me, in a moment's inattention, to write erroneous code whose execution would add contemporary expiry dates to my entire table. I shan't risk it.

Define Session Data Layer

I am using the delightful DynamoDBToolbox to work with my session data, along with DayJS for time management and ulid for ... ulids. Let's install those now, by adding them to our package.json:

https://snappify.com/view/5c4aec74-1e9b-4139-99ec-130e659ea265

In my functions directory, I have a data subdirectory, where we shall now scribe our table definition:

https://snappify.com/view/64aeaf98-43e0-492d-b04f-7331bcf8fac8

The only notable thing here is the table name; I like to provide an obviously mistaken fallback option. That can help short-circuit debugging when you accidentally omit environment variables; it also stops TypeScript complaining that undefined is not string.
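
The fallback idea can be sketched on its own, without any DynamoDB library. This is only an illustration: the variable name `SESSION_TABLE_NAME` and the fallback string are hypothetical, not the names used in the repository.

```typescript
// Assumption: the table name arrives via an environment variable.
// Both names below are hypothetical placeholders.
type Env = Record<string, string | undefined>;

const FALLBACK_TABLE_NAME = "SESSION_TABLE_NAME_NOT_SET";

function resolveTableName(env: Env): string {
  const name = env["SESSION_TABLE_NAME"];
  // An empty string is as useless as a missing variable, so treat both alike.
  // Returning a plain string (never undefined) is what keeps TypeScript quiet.
  return name !== undefined && name !== "" ? name : FALLBACK_TABLE_NAME;
}
```

In a Node function you would pass `process.env`. Should the variable be missing, the resulting "table not found" error names `SESSION_TABLE_NAME_NOT_SET`, which points straight at the cause.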

Next, we need to create a session Entity. We'll need a unique partition key, along with a way to store the authorized party (the principal) and what mechanism authenticated them. We additionally need a timeToLive value so that automatic session expiry takes place, and it might be worthwhile storing the originating IP of the authentication event. Finally, everyone loves a created/modified timestamp, so we'll add those as well:

https://snappify.com/view/093588a6-dcfb-4965-86a8-509058c04824

DynamoDBToolbox provides us with automatic created and modified timestamps, as well as helping us fill in some of our data. We're prefixing the session & principal IDs so the data makes a bit more sense when we look at it "raw", lines 21-29. On line 32, we set a default value for expiry, using dayjs to set a value one hour in the future, in the required Unix Epoch Time Format. Finally, we make the authorization mechanism and origin required values.
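
To make the prefixing and the default expiry concrete, here is a library-free sketch of the transformation being described. It is not the DynamoDBToolbox entity itself: it uses plain `Date` arithmetic where the post uses dayjs, and the attribute names (`pk`, `principal`, `authorizationMechanism`, `expires`) and the `Session|` prefix are assumptions for illustration.

```typescript
// Illustrative input: the values our application supplies.
interface NewSession {
  sessionId: string;              // e.g. a ulid
  principalId: string;
  authorizationMechanism: string; // required
  authorizationOrigin: string;    // required, e.g. the caller's IP address
}

// Illustrative output: roughly what the "raw" item might look like.
interface RawSessionItem {
  pk: string;
  principal: string;
  authorizationMechanism: string;
  authorizationOrigin: string;
  expires: number; // Unix epoch time, in seconds
}

const ONE_HOUR_IN_SECONDS = 60 * 60;

function toRawSessionItem(session: NewSession, nowMs: number = Date.now()): RawSessionItem {
  return {
    // Prefixes make the raw data self-describing when browsing the table.
    pk: "Session|" + session.sessionId,
    principal: "Principal|" + session.principalId,
    authorizationMechanism: session.authorizationMechanism,
    authorizationOrigin: session.authorizationOrigin,
    // Default expiry: one hour from now, in seconds rather than milliseconds.
    expires: Math.floor(nowMs / 1000) + ONE_HOUR_IN_SECONDS,
  };
}
```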

Add a Global Secondary Index

One thing we're missing is the ability to look up all sessions for a specific principal. DynamoDB stores data such that data of a specific nature is stored in a specific partition, inside of which individual rows are sorted (hence the key names). Ideologically, DynamoDB expects that users (in this case, our application) know what nature of data (and thus which partition and thus, partition ID) they wish to retrieve data from. As such, almost every operation expects you to provide a partition key, with the exception of Scan. The latter reads every item in the table sequentially, optionally filtering items after retrieval but before responding. Scan is not the fastest operation. Generally speaking, don't use Scan.

Currently, we can retrieve a session if we know the ID (and we should, since validating IDs is this code's entire bailiwick), but finding all sessions for a user would require a Scan, as we don't know the session ID (and thus partition key) on which to query.
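
The difference can be felt with a toy, in-memory stand-in for the table. This is not DynamoDB code, and the row shape and sample data are invented for illustration: a lookup by session ID touches one row, whilst finding a principal's sessions must examine every row.

```typescript
interface ToySessionRow {
  pk: string;        // "Session|<id>"
  principal: string; // "Principal|<id>"
}

// A toy table in which rows are addressable by partition key only.
const toyTable: Record<string, ToySessionRow> = {
  "Session|s1": { pk: "Session|s1", principal: "Principal|alice" },
  "Session|s2": { pk: "Session|s2", principal: "Principal|bob" },
  "Session|s3": { pk: "Session|s3", principal: "Principal|alice" },
};

// Key lookup: we know the partition key, so this is one direct access.
function getToySession(sessionId: string): ToySessionRow | undefined {
  return toyTable["Session|" + sessionId];
}

// Scan-like lookup: visit every row, then filter. The work grows with
// the size of the table, not with the number of matching rows.
function scanToyTableForPrincipal(principalId: string): { rows: ToySessionRow[]; examined: number } {
  const rows: ToySessionRow[] = [];
  let examined = 0;
  for (const key of Object.keys(toyTable)) {
    examined += 1;
    const row = toyTable[key];
    if (row !== undefined && row.principal === "Principal|" + principalId) {
      rows.push(row);
    }
  }
  return { rows, examined };
}
```

With three rows the cost is trivial; with every session for every user in one table, examining them all to find one principal's handful is precisely the expense we wish to avoid.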

We can solve this by adding data into our Global Secondary Index (GSI) fields. GSIs contain a subset of attributes from the table and support the Query operation. They have their own partition key and sort key, which we can instruct DynamoDBToolbox to populate:

https://snappify.com/view/a4f38912-968a-4489-945f-0aa449607cfc

We're setting these attributes to hidden because they're not unique; they're simply duplicating data on the table. Each has a prefix, again, to make the raw data clearer, and then we're setting their value with a function.

By querying the table's GSI for Principal|some_principal_id, we can retrieve all sessions for a principal at once, allowing us to (for instance) show them to the user, or bulk invalidate them.
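
As a rough sketch of what such a query looks like underneath, the function below builds the input for DynamoDB's Query operation in its document-client form (plain values rather than typed attribute maps). The parameter names are those of the Query API; the index name `gsi1` and key attribute `gsi1pk` are hypothetical and should match whatever your table definition declares.

```typescript
// Subset of the Query operation's input, in document-client form.
interface PrincipalSessionsQuery {
  TableName: string;
  IndexName: string;
  KeyConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, string>;
}

function buildPrincipalSessionsQuery(tableName: string, principalId: string): PrincipalSessionsQuery {
  return {
    TableName: tableName,
    IndexName: "gsi1", // hypothetical index name
    // Query requires an equality condition on the index's partition key.
    KeyConditionExpression: "#pk = :pk",
    ExpressionAttributeNames: { "#pk": "gsi1pk" }, // hypothetical attribute name
    ExpressionAttributeValues: { ":pk": "Principal|" + principalId },
  };
}
```

DynamoDBToolbox builds an equivalent request for you when you query with an index option; the sketch merely shows that the lookup is keyed on the prefixed principal, so no Scan is involved.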

(We could also consider making an index that uses authorizationOrigin as the partition key. That way, we could easily see every authorization request from a specific IP address.)

And that's that.

We're ready to write the more finicky bits: those actively managing sessions. But first, dear reader, let us take a break. We shall resume in Step Two!

(Should you enjoy this content, or wish to be notified of the publication of updates, I humbly request you provide me with a Follow and a Like. Of such actions are Viral Content made, and I wish only to help as broadly as I might.)

Step One: Configuring our Data Layer

Using DynamoDB & API Gateway to build a combined Web & API Authentication Serverless Session Store