At Setter we have a 2nd generation API server that handles:
- API requests coming in from internal software
- API requests from our customer-facing iOS and Android applications
- Webhook requests from 3rd party services
This is all quite typical stuff.
Communicating With Systems Outside Of Our Control
In the process of handling some of these requests, we have to communicate with 3rd party services.
One example being customer order approvals, in which we have to send the customer an email to confirm the order and provide a sort of feedback loop to the user.
So in this case, the flow looks like:
- Receive API request from mobile app
- Process API request (which will involve make some DB inserts / updates)
- Dispatch API request to 3rd party email provider (in our case we use Postmark and we highly recommend it)
By making API requests directly from our system, we've now reduced the certainty of success, and introduced incomplete states. For example, Postmark (the email service provider we use) could be down for routine maintenance, and hence a request to their service could fail at unpredictable times. This introduces an incomplete state in that the customer will never receive an email to let them know that their order was indeed processed and acknowledged.
This sort of thing has happened a few times at our company.
Eliminating The Dependancy on 3rd Party Services
Currently we're undergoing an internal RFC processes to decide how we're going to decouple 3rd party services from the core of our system.
I took the lead on this particular RFC (although I've had lots of guidance from my colleagues while writing it), and in this post I discuss the bulk of it.
What I'm proposing at our company is that we leverage the technologies we already have (PostgreSQL & NodeJS) in order to not increase system complexity - as opposed to using a tool such as RabbitMQ (not to say that RabbitMQ is bad).
By using PostgreSQL's LISTEN
/ NOTIFY
features, you have everything you need in order to have a high-performance, fault-taulerant pub / sub system.
I went ahead and created an example app that implements this system - feedback welcome!
https://github.com/gDelgado14/pg-pubsub
Here are the relevant parts of the example project (as found in the README.md):
Implementing the "Pub" in Pub / Sub
The migrations
folder contains the schemas / triggers / and SQL functions necessary to implement the publishing aspect of the system.
More specifically,a db_events
table is created which stores messages sent into the pub sub system. Further, there is a trigger made that executes a sql function on any insertion into the db_events
.
Implementing the "Sub" in Pub / Sub
Inside src/services/client.ts
, I use the pg
module to:
- Connect to the db
- Listen to "pub_sub" events being invoked from within postgres (which I've defined in the migrations)
- Invoke any asynchronous functions associated with the various events that can occur in the system.
Now you can subscribe to any event you want. You can define the events in your code. It really helps if you use a statically-typed language (which is why I implemented the example in TypeScript) so that your message payload is always consistent to the message the payload is associated to.
You can see some example channel / payload combinations inside src/services/client.ts
. For example, if you publish an sms
message, the payload going in and out of the pub / sub system will always be { msg: 'some string' }
Another awesome aspect of this system is that you can choose to run your subscription logic on a different machine / runtime / language. This is because it's postgres that's sending messages into the runtime. In my example I kept it simple and had both the publishing and subscribing happening in the same app, but it doesn't have to be that way if you don't want it to be!
Handling failure gracefully
One thing I haven't thought enough about is how to handle message processing failures.
Example: Say I publish an email
message into the pub / sub system and a subscriber tries to process the message by, say, sending an API request to Postmark and Postmark is down. How should I best manage this?
I think implementing an exponential back-off retry might be the right approach.
Would love to hear your thoughts on this!
Liked this post? Consider buying me a coffee :)