Hosted by: Dallas/Fort Worth Postgres
Presenter – Matthew Mattoon – Founder and CTO of Entasis
Everyone knows what a database is, right!? It’s a bunch of data entries that are pulled in a particular fashion from an interface to do things in an organized order…or something. All I know is that you request it and it comes up somehow. And for most people, that’s okay, and frankly, if set up the right way, that’s how it should be.
Most people can say that they know what databases are and how to use them. You see skills like SQL, SAP (although this is really an interface), Access, and Oracle listed on resumes. But HOW they work, and what the advantages of each flavor are, are the questions to ask when architecting. And those are the questions I’ve found myself unsure of in my pursuit of the cloud.
Being a SysAdmin, I deal with connecting to databases and getting them set up on end-user computers, but I never have to manage the databases themselves; in a company as large as PepsiCo, many of these activities are siloed across various teams. So when studying for my architect exam, I became aware of the gaps in my knowledge of this particular subject.
So I went looking for answers.
I stumbled across a Postgres group hosting a talk on Aurora, a proprietary relational database offered by Amazon, and immediately registered in hopes of furthering my understanding of not only Aurora, but databases as a whole.
Matthew Mattoon, the founder of Entasis, an AWS consulting and managed services provider, highlighted a few of the key features of the service…
A Database tailored for the Cloud
Amazon’s Aurora has been engineered to perform as a managed service in the cloud, using multiple availability zones (AZs), read replicas, and auto-scaling for high availability and fault tolerance. With the introduction of protection groups, data is stored in 10 GB segments replicated across 6 storage nodes spanning 3 AZs. These protection groups are provisioned automatically as your database grows, up to 64 TB per Aurora instance. (A rough provisioning sketch follows the list below.)
- Ability to sustain the loss of an entire AZ or 2 storage nodes without impacting availability
- Even in the instance of the Master node going down, a read replica will take its place with no data loss, since every instance sits on the same shared, distributed storage volume (the asynchronous replication only affects the replicas’ caches)
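To make the moving parts a little more tangible, here is a minimal sketch of provisioning such a cluster with boto3. The region, identifiers, credentials, instance class, and the aurora-mysql engine choice are all placeholder assumptions of mine, not anything from the talk.

```python
# Minimal boto3 sketch: provision an Aurora cluster with a writer and a
# read replica. Identifiers, credentials, and the engine choice are
# placeholders, not values from the talk.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# The cluster owns the shared, multi-AZ storage volume.
rds.create_db_cluster(
    DBClusterIdentifier="demo-aurora-cluster",
    Engine="aurora-mysql",
    MasterUsername="admin",
    MasterUserPassword="change-me-please",
)

# Instances are just compute attached to that volume; the first becomes
# the writer, additional ones act as read replicas and failover targets.
for name in ("demo-aurora-writer", "demo-aurora-reader"):
    rds.create_db_instance(
        DBInstanceIdentifier=name,
        DBClusterIdentifier="demo-aurora-cluster",
        DBInstanceClass="db.r5.large",
        Engine="aurora-mysql",
    )
```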
With 6 nodes in a protection group, writes are considered successful and acknowledged ONLY after reaching a quorum of 4 out of 6 completed writes across the protection group.
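Here is a toy illustration of what that 4-out-of-6 acknowledgement rule means. This is purely my own sketch of the rule as described, not Aurora’s actual storage engine code:

```python
# Toy illustration of the 4-of-6 write quorum described in the talk.
import concurrent.futures
import random

STORAGE_NODES = 6   # one protection group spans 6 nodes across 3 AZs
WRITE_QUORUM = 4    # a write is acknowledged once 4 nodes confirm it

def write_to_node(node_id: int, payload: bytes) -> bool:
    """Stand-in for shipping a log record to a single storage node."""
    return random.random() > 0.1  # pretend ~10% of node writes fail or stall

def quorum_write(payload: bytes) -> bool:
    acks = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=STORAGE_NODES) as pool:
        futures = [pool.submit(write_to_node, n, payload) for n in range(STORAGE_NODES)]
        for done in concurrent.futures.as_completed(futures):
            if done.result():
                acks += 1
            if acks >= WRITE_QUORUM:
                return True  # acknowledge without waiting for the stragglers
    return False  # fewer than 4 nodes confirmed, so the write is not durable

print("acknowledged" if quorum_write(b"log record") else "retry")
```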
One could suspect latency with this type of process; however, with writes being done in parallel…and some magical Amazon engineering (stuff I don’t know), Aurora is able to complete the same workload with roughly 1/8th the IOPS of a traditional MySQL database.
- It is stated that Aurora provides 5x the performance of MySQL and 3x that of PostgreSQL
All Aurora backups are automated and stream continuously to S3, and there are options to create snapshots and restore to a point in time with no impact on performance. This makes Aurora more than suitable for workloads that require high availability and redundancy.
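For a feel of what that looks like from the management API, here is a hedged boto3 sketch; the cluster names and the restore timestamp are placeholders I made up:

```python
# Sketch of a manual snapshot and a point-in-time restore with boto3.
# Cluster identifiers and the restore time are placeholders.
import datetime
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Manual snapshot on top of the continuous automated backups to S3.
rds.create_db_cluster_snapshot(
    DBClusterSnapshotIdentifier="demo-aurora-before-migration",
    DBClusterIdentifier="demo-aurora-cluster",
)

# Restore the cluster as it looked at a specific moment. The restore
# lands in a NEW cluster rather than overwriting the original, and you
# would still attach DB instances to it afterwards.
rds.restore_db_cluster_to_point_in_time(
    DBClusterIdentifier="demo-aurora-restored",
    SourceDBClusterIdentifier="demo-aurora-cluster",
    RestoreToTime=datetime.datetime(2019, 6, 1, 12, 0, 0),
)
```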
This information provided me with a few more good blocks to build my understanding upon and cleared the fog on the advantages of using this particular service. We spent the next half hour on Q&A and ended the evening arguing over the semantics of Amazon’s “serverless” offerings. What exactly is serverless… and is it REALLY serverless? Well, I will have to cover that at a later time. Other than that, I gained new understanding, exposed myself to various perspectives, and built new relationships in the tech community. All of which makes me want to learn more.
“Excellence is not an occasional act but a persistent habit” – A. Wall
Resources
https://www.meetup.com/Dallas-Fort-Worth-Postgres/
https://www.allthingsdistributed.com/files/p1041-verbitski.pdf
https://aws.amazon.com/blogs/database/introducing-the-aurora-storage-engine/