Giorgio Delgado

Zero dependancy Pub / Sub system with PostgreSQL

May 27, 2018

At Setter we have a 2nd generation API server that handles:

This is all quite typical stuff.

Communicating With Systems Outside Of Our Control

In the process of handling some of these requests, we have to communicate with 3rd party services.

One example being customer order approvals, in which we have to send the customer an email to confirm the order and provide a sort of feedback loop to the user.

So in this case, the flow looks like:

  1. Receive API request from mobile app
  2. Process API request (which will involve make some DB inserts / updates)
  3. Dispatch API request to 3rd party email provider (in our case we use Postmark and we highly recommend it)

By making API requests directly from our system, we've now reduced the certainty of success, and introduced incomplete states. For example, Postmark (the email service provider we use) could be down for routine maintenance, and hence a request to their service could fail at unpredictable times. This introduces an incomplete state in that the customer will never receive an email to let them know that their order was indeed processed and acknowledged.

This sort of thing has happened a few times at our company.

Eliminating The Dependancy on 3rd Party Services

Currently we're undergoing an internal RFC processes to decide how we're going to decouple 3rd party services from the core of our system.

I took the lead on this particular RFC (although I've had lots of guidance from my colleagues while writing it), and in this post I discuss the bulk of it.

What I'm proposing at our company is that we leverage the technologies we already have (PostgreSQL & NodeJS) in order to not increase system complexity - as opposed to using a tool such as RabbitMQ (not to say that RabbitMQ is bad).

By using PostgreSQL's LISTEN / NOTIFY features, you have everything you need in order to have a high-performance, fault-taulerant pub / sub system.

I went ahead and created an example app that implements this system - feedback welcome!

https://github.com/gDelgado14/pg-pubsub

Here are the relevant parts of the example project (as found in the README.md):

Implementing the "Pub" in Pub / Sub

The migrations folder contains the schemas / triggers / and SQL functions necessary to implement the publishing aspect of the system.

More specifically,a db_events table is created which stores messages sent into the pub sub system. Further, there is a trigger made that executes a sql function on any insertion into the db_events.

Implementing the "Sub" in Pub / Sub

Inside src/services/client.ts, I use the pg module to:

  1. Connect to the db
  2. Listen to "pub_sub" events being invoked from within postgres (which I've defined in the migrations)
  3. Invoke any asynchronous functions associated with the various events that can occur in the system.

Now you can subscribe to any event you want. You can define the events in your code. It really helps if you use a statically-typed language (which is why I implemented the example in TypeScript) so that your message payload is always consistent to the message the payload is associated to.

You can see some example channel / payload combinations inside src/services/client.ts. For example, if you publish an sms message, the payload going in and out of the pub / sub system will always be { msg: 'some string' }

Another awesome aspect of this system is that you can choose to run your subscription logic on a different machine / runtime / language. This is because it's postgres that's sending messages into the runtime. In my example I kept it simple and had both the publishing and subscribing happening in the same app, but it doesn't have to be that way if you don't want it to be!

Handling failure gracefully

One thing I haven't thought enough about is how to handle message processing failures.

Example: Say I publish an email message into the pub / sub system and a subscriber tries to process the message by, say, sending an API request to Postmark and Postmark is down. How should I best manage this?

I think implementing an exponential back-off retry might be the right approach.

Would love to hear your thoughts on this!