I am not going to say we were caught off guard at the last minute, but during preparation for soak testing, one of our applications became unresponsive for no apparent reason. After a short investigation, we found that a single AWS RDS instance had run out of space. Luckily, the impacted environment was only used for testing and had limited access from the outside world.

When the environment went down, no load tests were being run, and we were pretty confident that no significant amount of data had been processed and stored that day. One might be tempted to throw more money at it by increasing the storage space and move on. But what used up all the disk space? No exciting cliff-hanger here: it was a database log file! While investigating the root cause, we were taught a lesson by AWS about its product internals. For others to quickly benefit from our learnings, I've outlined a TLDR version below (disclaimer: for brevity, we will be discussing a single-AZ setup).

How Amazon RDS works?

Amazon RDS is basically an EC2 instance that AWS manages for you (updating it when needed, applying security patches, etc.). It runs a database engine (in our case a PostgreSQL server) and a utility tool (that we will call the AWS Agent) for connecting it to the AWS console. Under the hood, your managed EC2 instance mounts an EBS volume for data storage (volume mounting makes features like snapshots possible). Not everyone knows that this drive is also used for storing all database log files (in alignment with your data retention policies). In our case, the log files took up all the free space and were the reason for our service going down.

We were able to pin down the root cause of the service outage: a single SQL script file being used to bulk insert test data. In development, we usually prepare fixtures using SQL scripts that get executed on the database. Most of the time, they are short (inserting one or two records at a time), so they don't take long to complete and are an excellent way to set up the environment for testing against known data inputs/outputs. The bulk insert of test data, in this case, was slightly larger than usual but still within a reasonable size range, something that even lower-capacity instances should be able to handle easily.

Unfortunately, our system was still impacted by running large SQL script files, because of the PGAudit extension: "The PostgreSQL Audit Extension (or pgaudit) provides detailed session and/or object audit logging via the standard logging facility provided by PostgreSQL. The goal of PostgreSQL Audit is to provide the tools needed to produce audit logs required to pass certain government, financial, or ISO certification audits." For every SQL statement that modifies our underlying data, PGAudit creates a corresponding entry in the database log file.
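To make the failure mode concrete, here is a minimal sketch of session-level pgaudit logging and the kind of entry it emits. The `accounts` table is a hypothetical stand-in for our fixture data, and on RDS the library and its settings are configured through the DB parameter group rather than the `SET` command shown here.

```sql
-- Minimal sketch of session-level audit logging with pgaudit.
-- Assumes the library is already loaded via shared_preload_libraries
-- (on RDS this is done in the DB parameter group, not in SQL).
CREATE EXTENSION IF NOT EXISTS pgaudit;

-- Audit every statement class that writes data (INSERT, UPDATE, DELETE, ...).
-- This is a superuser-level setting; on RDS it comes from the parameter group.
SET pgaudit.log = 'write';

-- Hypothetical fixture table and a two-row insert like the ones in our scripts.
CREATE TABLE IF NOT EXISTS accounts (id int PRIMARY KEY, name text);
INSERT INTO accounts (id, name) VALUES (1, 'alice'), (2, 'bob');

-- The INSERT is now copied, in full, into the database log as something like:
-- LOG:  AUDIT: SESSION,1,1,WRITE,INSERT,,,
--       "INSERT INTO accounts (id, name) VALUES (1, 'alice'), (2, 'bob')",...
```

Because the full statement text is written out again, a fixture script made up of thousands of such inserts pushes roughly the same number of bytes into the log directory as into the table itself, and both live on the same EBS volume. That is how a "reasonable" script can eat all the remaining free space.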
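Had we been watching log-file growth, the problem would have surfaced sooner. A sketch of one way to spot it from inside the database is below; `pg_ls_logdir()` exists since PostgreSQL 10, and your admin role may need to be granted EXECUTE on it.

```sql
-- List server log files by size to catch runaway log growth early.
SELECT name,
       pg_size_pretty(size) AS size,
       modification
FROM   pg_ls_logdir()
ORDER  BY size DESC;
```

On RDS, the same signal is also visible from the outside via the FreeStorageSpace CloudWatch metric, which is worth alerting on.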