Estimating AWS Infrastructure Cost from Terraform Templates

Author
Bjorn Stange
Publication Date
4 May 2017


I had this idea while prototyping some infrastructure management with Terraform. Wouldn’t it be useful to know how expensive your infrastructure is going to be before you launch it? With tools like Terraform, CloudFormation, and others, you can “codify” your infrastructure, and infrastructure that can be parsed by code should make exactly that kind of price estimation possible. When I set out on this task I assumed AWS had a queryable pricing API, only to discover that I basically had to write my own.

AWS does not make price estimation easy for you. They do have something they call a Price List API, but it’s not something you can query directly. You can get all the pricing data for a single service, such as EC2, but that’s a relatively large data set. If you want the pricing data for a particular instance size in a particular region, for example, you can’t ask for just that. You have to download all the data, even if you’re only interested in a small part of it.
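To make the “download everything, then filter” point concrete, here is a minimal Python sketch of narrowing a price list CSV down to one instance size in one region. It is only an illustration, not code from any of my projects. It assumes the file starts with a few metadata lines before the real header row, and the column names and prices in the sample are made up to resemble the real file.

```python
import csv
import io


def iter_price_rows(lines):
    """Yield each data row as a dict, skipping metadata lines before the header.

    Assumption: the header row is the first row that contains a "SKU" column.
    """
    header = None
    for row in csv.reader(lines):
        if header is None:
            if "SKU" in row:
                header = row
            continue
        yield dict(zip(header, row))


def filter_prices(lines, wanted):
    """Yield only the rows whose columns match every key/value in `wanted`."""
    for row in iter_price_rows(lines):
        if all(row.get(col) == val for col, val in wanted.items()):
            yield row


# Made-up sample in the rough shape of the real file (values are illustrative).
SAMPLE = '''"FormatVersion","v1.0"
"OfferCode","AmazonEC2"
"SKU","Location","Instance Type","PricePerUnit"
"AAA","US East (N. Virginia)","t2.micro","0.0120"
"BBB","US West (Oregon)","t2.micro","0.0120"
"CCC","US East (N. Virginia)","m4.large","0.1000"
'''

matches = list(
    filter_prices(
        io.StringIO(SAMPLE),
        {"Location": "US East (N. Virginia)", "Instance Type": "t2.micro"},
    )
)
```

Even in this toy form, every row has to be read before the one you care about can be picked out, which is why I wanted the data in a database.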

I ended up writing three projects to accomplish this. First, I needed a way to ingest the pricing data into a database, and this simple Python script does that for me: https://github.com/Bjorn248/aws_pricing_data_ingestor. I used MariaDB instead of Postgres because the way AWS quotes its CSV made it awkward to bulk-load into Postgres (whose COPY command is the counterpart to LOAD DATA LOCAL INFILE), whereas MariaDB’s LOAD DATA LOCAL INFILE does the job just fine. One thing I noticed while writing this was that AWS updates its pricing data quite frequently. They are constantly adding new services, and new columns to existing tables, so I had to implement schema generation inside the ingestion script to keep up. The benefit is that I won’t have to change the script or maintain a static schema going forward, and if AWS adds new services, my script should work just fine. The script runs daily in Lambda and ingests the latest pricing data into an RDS MariaDB instance.
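As a rough illustration of what schema generation means here, the sketch below builds a CREATE TABLE statement from whatever header row a CSV happens to have. It is not the ingestor’s actual code. The helper names are hypothetical, and typing every column as TEXT is a simplifying assumption.

```python
import re


def column_name(raw):
    """Turn a CSV header such as 'Instance Type' into a safe SQL identifier."""
    name = re.sub(r"\W+", "_", raw.strip()).strip("_")
    return name or "col"


def create_table_sql(table, header):
    """Build a CREATE TABLE statement with one column per CSV header.

    Assumption: every column is stored as TEXT; a real ingestor may choose
    more specific types and should handle duplicate header names.
    """
    cols = ",\n".join(f"  `{column_name(h)}` TEXT" for h in header)
    return f"CREATE TABLE IF NOT EXISTS `{table}` (\n{cols}\n);"


sql = create_table_sql("AmazonEC2", ["SKU", "Instance Type", "PricePerUnit"])
```

Because the statement is derived from the header on every run, a new column in the CSV simply becomes a new column in the table.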

The second project I wrote was an API to expose the data in MariaDB. The code can be found here: https://github.com/Bjorn248/graphql_aws_pricing_api and the publicly accessible endpoint can be found here: https://fvaexi95f8.execute-api.us-east-1.amazonaws.com/Dev/graphql/. You can send POST requests to it with valid GraphQL request bodies and get real pricing data! It takes about 10 seconds to warm up Lambda, so be patient for that first response. I decided to use GraphQL because it lets the client tell the API exactly which data it needs and in what shape. I also had to do GraphQL schema generation inside the API, which happens every time it is launched. I know that the code here is not very well optimized, and I would appreciate any feedback, but it does the job of getting data for Terraform Cashier, the tool that started this whole journey. Another thing that came to mind as I finished the API was that anyone can use it for their own project. They could write a similar tool for CloudFormation, or use it in ways that I had never dreamed of. The API is probably the best and most generally usable thing that I made during this process, despite its disappointing lack of performance optimization.
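For anyone who wants to try the endpoint, a request could be put together roughly like this in Python. Treat it as a sketch: the field and argument names in the query are placeholders I have not checked against the generated schema, and sending the query as a JSON body with a "query" key is the usual GraphQL-over-HTTP convention rather than something this API is guaranteed to require.

```python
import json
import urllib.request

ENDPOINT = "https://fvaexi95f8.execute-api.us-east-1.amazonaws.com/Dev/graphql/"

# Hypothetical query: the field and argument names are placeholders, so check
# the API's generated schema for the real ones before relying on this.
QUERY = """
{
  AmazonEC2(Location: "US East (N. Virginia)", InstanceType: "t2.micro") {
    PricePerUnit
  }
}
"""


def build_request(query, endpoint=ENDPOINT):
    """Wrap a GraphQL query in a JSON POST request (nothing is sent yet)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, timeout=30):
    """Send the query; the generous timeout allows for a cold Lambda start."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as resp:
        return json.load(resp)
```

Calling run_query(QUERY) performs the actual network request, so the first call may take a while if Lambda is cold.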

The third and final leg of the journey was Terraform Cashier, the tool that I wanted to write in the first place. The code for the cashier app can be found here: https://github.com/Bjorn248/terraform_cashier. It parses the HCL in Terraform files and tries to figure out how many EC2 and RDS instances you are trying to bring up. One problem is that for more advanced users of Terraform, it will likely not be able to count your instances because of the abstraction layer provided by modules. At this stage, I don’t know if using the HCL library was the correct choice, and the fact that I can’t parse modules correctly tells me that I likely made a design mistake here. It works for the simple types of Terraform templates I am working with right now, so for now I likely won’t change it, but I wanted to make the design flaw apparent and ask for suggestions and feedback. My goal was to estimate cost before launch, not after. Many existing tools, including AWS’s own, tell you how much things cost after you launch them, so I figured the real value would come from the ability to estimate cost before launching anything. Perhaps I will have to work more closely with the Terraform libraries instead of the HCL ones in order to properly parse the Terraform files. Any suggestions on that front would also be appreciated.
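To show the counting idea and the module blind spot in a few lines, here is a deliberately crude Python sketch. It is not how Cashier works: Cashier uses the HCL library, while this uses regular expressions and naive brace matching, and it only understands a literal count. The prices passed to monthly_cost and the 730 hours per month figure are illustrative assumptions.

```python
import re
from collections import Counter

RESOURCE_RE = re.compile(
    r'resource\s+"(aws_instance|aws_db_instance)"\s+"[^"]+"\s*\{'
)


def resource_blocks(text):
    """Yield (resource type, block body) using naive brace matching."""
    for m in RESOURCE_RE.finditer(text):
        depth, i = 1, m.end()
        while i < len(text) and depth:
            depth += {"{": 1, "}": -1}.get(text[i], 0)
            i += 1
        yield m.group(1), text[m.end():i - 1]


def count_instances(text):
    """Count instances per (resource type, size); only literal counts work."""
    totals = Counter()
    for kind, body in resource_blocks(text):
        count = re.search(r'^\s*count\s*=\s*"?(\d+)"?', body, re.M)
        size = re.search(r'^\s*instance_(?:type|class)\s*=\s*"([^"]+)"', body, re.M)
        key = (kind, size.group(1) if size else "unknown")
        totals[key] += int(count.group(1)) if count else 1
    return totals


def monthly_cost(counts, hourly_prices, hours=730):
    """Sum count * hourly price * hours; the caller supplies the prices."""
    return sum(n * hourly_prices[key] * hours for key, n in counts.items())


TEMPLATE = '''
resource "aws_instance" "web" {
  count         = 3
  instance_type = "t2.micro"
}

resource "aws_db_instance" "db" {
  instance_class = "db.t2.small"
}

module "workers" {
  source = "./workers"
}
'''

counts = count_instances(TEMPLATE)
```

The module block contributes nothing to counts, because whatever instances it creates live in another directory behind the module abstraction. That is the same limitation described above.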

For now, my journey into AWS price estimation from Terraform templates has ended. I’ve decided that the three projects that came out of it are stable enough for my personal use, but I thought I’d share the journey and use this opportunity to gauge interest in the pricing API and in pre-launch price estimation from codified infrastructure.

Originally posted on Medium.
