We, data people, can finally control our own infrastructure without having to bother our always-busy colleagues from the IT department or, even harder to reach, third-party infrastructure providers. If you don't work this way yet, you definitely need to start, and this article will help you understand the essentials of IaC.
One of the steps you need to follow to implement DataOps in your organization is to use a version control system (and to use it properly, with a sound branching and merging strategy). But to make that possible, you first need to adopt the Everything as Code principle. So, all your scripts are code, obviously. All your ELT and data analysis processes should be code as well - that’s doable. Your CI/CD pipelines have probably been moved to code too (thank you, YAML!). But what about your infrastructure? You are in the cloud now, so the decision to move your infrastructure definition to code should be a no-brainer!
What does Infrastructure as Code (IaC) actually mean? The idea is to use a descriptive coding language to automate the provisioning of cloud infrastructure. It’s like source code, but instead of coding an application you code your infrastructure resources. This approach has a lot of advantages, which can be grouped into the following areas:
One very important and powerful word to remember - IDEMPOTENCE. By using declarative code you specify exactly WHAT you need; you don’t care about how it is going to be achieved. Your cloud resource manager knows that better, that’s for sure. Being idempotent means your code produces exactly the same result each time it’s run, no matter how many times you run it. Since it is code, you can easily verify its correctness. You can check for errors in the code itself, and you can also run a what-if analysis (run your code without performing the deployment) so you know which changes are going to be applied during the actual deployment. Also, you can automate deployments with CI/CD pipelines and make them less susceptible to human error. You can increase security by allowing only service accounts to change your resources. And last but not least - you have the infrastructure in source control, so you can peer review all the changes.
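To make idempotence and what-if analysis a bit more tangible, here is a minimal Python sketch. It is a toy model, not how any real cloud resource manager works: the resource names, the `plan` and `apply` functions and the in-memory dictionaries are all made up for illustration, and the sketch assumes that resources missing from the declaration get deleted, which real tools may or may not do depending on the deployment mode.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the cloud: resource name -> properties.
Resources = Dict[str, dict]


def plan(desired: Resources, actual: Resources) -> Dict[str, List[str]]:
    """What-if analysis: list the changes without applying any of them."""
    return {
        "create": sorted(n for n in desired if n not in actual),
        "update": sorted(
            n for n in desired if n in actual and desired[n] != actual[n]
        ),
        "delete": sorted(n for n in actual if n not in desired),
    }


def apply(desired: Resources, actual: Resources) -> Resources:
    """Deployment: return a new state that matches the desired state."""
    changes = plan(desired, actual)
    result = {name: dict(props) for name, props in actual.items()}
    for name in changes["create"] + changes["update"]:
        result[name] = dict(desired[name])
    for name in changes["delete"]:
        del result[name]
    return result


# The declaration says WHAT we want, not how to get there.
desired = {
    "sql-db": {"tier": "Basic"},
    "storage": {"redundancy": "LRS"},
}
# What happens to exist right now (made-up example state).
actual = {
    "storage": {"redundancy": "GRS"},
    "old-vm": {"size": "B1s"},
}

print(plan(desired, actual))    # preview only, nothing is changed
first = apply(desired, actual)  # first deployment
second = apply(desired, first)  # second deployment with the same code
print(first == second)          # idempotent: running again changes nothing
```

The point of the design is that `apply` starts from a comparison between the desired and the current state instead of a list of steps, which is why running it a second time has nothing left to do.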
Each project needs to maintain multiple environments. In general, the setup of these environments is very similar. Of course, VM sizes and database tiers will vary between DEV and PROD but hey, that’s configurable. Also, with IaC you can easily provision new environments on demand. Need a new DEV sandbox for a few days? Run your IaC template and have it ready in minutes.
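Here is a rough Python sketch of the "one template, many environments" idea. Everything in it is an assumption made for illustration: the `BASE_TEMPLATE` and `ENV_OVERRIDES` dictionaries, the `render` function and the size and tier values are hypothetical and do not come from any real IaC tool.

```python
from copy import deepcopy
from typing import Dict, Optional

# Hypothetical shared template: the part that is the same everywhere.
BASE_TEMPLATE: Dict[str, dict] = {
    "vm": {"size": "small", "count": 1},
    "database": {"tier": "basic"},
}

# Hypothetical per-environment configuration: only the differences.
ENV_OVERRIDES: Dict[str, Dict[str, dict]] = {
    "dev": {},
    "prod": {
        "vm": {"size": "large", "count": 3},
        "database": {"tier": "premium"},
    },
}


def render(env: str, overrides: Optional[Dict[str, dict]] = None) -> Dict[str, dict]:
    """Build the resource definitions for one environment."""
    if overrides is None:
        overrides = ENV_OVERRIDES[env]
    resources = deepcopy(BASE_TEMPLATE)  # never mutate the shared template
    for name, props in overrides.items():
        # Raises KeyError if an override targets a resource that is not
        # in the template, so typos surface early.
        resources[name].update(props)
    # Prefix resource names with the environment to keep them apart.
    return {f"{env}-{name}": props for name, props in resources.items()}


print(render("dev"))
print(render("prod"))
# A short-lived sandbox reuses the same template with no overrides.
print(render("sandbox", {}))
```

Keeping the differences in a small overrides table means DEV and PROD cannot drift apart by accident: anything not listed as an override is identical by construction.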
You don’t have a full picture of the resources you create when using graphical interfaces like Azure Portal. Yes, it’s easy and fun, but you have limited control. Also, having IaC helps you track and audit your changes (remember, source control) and makes them easy to document.
So, stop scrolling the Internet and start coding your infrastructure now!
Thanks for reading!
I hope you enjoyed it. If you'd like to talk more about it, please reach out to me via email: mariusz@faro.team
Sources: