
Terraforming Nye Health

Andrew Carmichael, DevOps


When I stop to think of the word terraforming, it immediately takes me to other worlds: barren and inhospitable worlds orbiting a star in a faraway galaxy… but most of all, an opportunity for life. In some ways, this is the story of Nye’s infrastructure.

 

Would you like to join us on this intergalactic quest through the universe of tech? If so, you're in luck, we're currently hiring developers in Edinburgh!

Check out our open roles here!

 

Firstly, what do I mean by infrastructure?


Infrastructure, in this context, is the cloud computing that forms the foundation for our products. A bad foundation for a house can cause subsidence and, with it, endless cracks and doors that never quite close. The same is true of IT infrastructure: a bad cloud infrastructure will result in poor reliability and service outages, but it’s the door that doesn’t quite shut that is the serious security threat. A good infrastructure, on the other hand, enables you to build resilient, secure and high-performing applications efficiently.


The story I am going to tell you here is how Nye Health have taken challenging foundations, the result of early prototypes, and turned them into foundations of confidence and opportunity through rigorous best practice and drive.


The Inhospitable Planet



In the early days, Nye had a very typical infrastructure. We had AWS services that were manually configured through the AWS Console, and our product ran on a single stateful EC2 instance. This was complemented by a separate AWS account for staging, where we could test our server and client apps prior to releasing the latest improvements.


However, the devil is in the detail. These two environments were subtly different and both susceptible to configuration creep, which inevitably resulted in service issues. For instance, we had (to name but a few):

  • A failed deployment of a key new feature in one release, because the firewalls in the production and staging environments were subtly different. The lack of parity between staging and production had the added side effects of extensive post-release smoke tests and an inertia towards releases.

  • An embarrassing outage caused by an application called Tripwire. Tripwire was installed on the EC2 instance in the very early stages of Nye to help protect against threats and vulnerabilities, but was no longer in use - as far as we knew! Little did we know that the once dormant Tripwire had started filling up disk space at an alarming rate, culminating in our server ceasing to operate.

Worst of all, we had started to accept the bad habit of patching production manually, further risking service and furthering the configuration creep.


It needed to change. We were deploying our software to the inhospitable planet and had to spend a lot of time looking after it - time that could be spent building things users love.


Terraforming



As has been alluded to so far, one way we started tackling some of these issues was by leveraging HashiCorp’s Terraform. Pretty quickly, we went from an ill-configured single EC2 instance to a high-availability Auto Scaling Group, configured through a software development lifecycle. We were also able to unify our development and staging environments with production.


Before I detail aspects of this transformation, I think it’s important to labour the change in paradigm that takes place when moving from traditional configuration through a web UI, such as the AWS Console, to Infrastructure as Code (IaC).

By developing our infrastructure using Terraform we changed explicitly to a software development lifecycle for our cloud computing.


Our infrastructure is now configured as code in our git repositories. This means it can be reviewed in the same way as changes to our product source code, through pull requests. That may not sound that significant, but think what it means: Nye Health have an auditable log of our infrastructure’s entire history. The setup and provisioning of our infrastructure (load balancers, security groups, firewalls, user policies and so on) are all safely stored. Moreover, we can isolate changes and roll backwards or forwards where required, with more control.
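
To make the idea concrete, here is a minimal sketch of what a firewall rule looks like once it lives in a repository rather than in the AWS Console. It is an illustration only, not Nye's actual configuration: the resource name, the variable, the port and the CIDR ranges are all assumptions.

```hcl
# Hypothetical example: every name, port and CIDR range here is illustrative.
variable "vpc_id" {
  type        = string
  description = "ID of the VPC the security group belongs to"
}

resource "aws_security_group" "web" {
  name        = "web-https"
  description = "Allow inbound HTTPS to the web tier"
  vpc_id      = var.vpc_id

  # Inbound: HTTPS only.
  ingress {
    description = "HTTPS from anywhere"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Outbound: allow everything ("-1" means all protocols).
  egress {
    description = "Allow all outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

A change to a rule like this shows up as a diff in a pull request, which is what gives the review and audit trail described above.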


Below are some highlights from our endeavours that you may find useful:

  • Just like the source code for our product, we introduced Terraform iteratively and through the same lifecycle of development, staging and then production, until we had a production environment fully provisioned by Terraform.

  • We configured our state in AWS S3 with DynamoDB providing locking - we are looking to transition to Terraform Cloud in future.

  • The automation has improved the speed and safety of our deployments through consistency and the removal of vectors of manual error.

  • Documentation is as code, not locked away in an engineer’s head - the state of the infrastructure is there for all to read and understand (hopefully).

  • Happiness. I like coding more than I do repetitively clicking through the AWS Console to provision changes for all environments, so it has improved my day-to-day work.

  • It is important to avoid the temptation of a quick AWS Console fix and to keep all work as IaC in order to avoid conflicts.

  • Actual re-use!!!! Nye Health adhered to Terraform best practice and developed modules in a separate repository, building up the infrastructure piece by piece until we could, with a few commands, create and destroy entire working environments for our Web Services as required.


  • To complete the picture, we leveraged another HashiCorp product called Packer. We used it to control the configuration of the EC2 servers that host our web applications, further unifying the staging and production environments at the operating system and application level.
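
As a rough sketch of two of the points above (remote state in S3 with DynamoDB locking, and modules kept in a separate repository), a per-environment root configuration might look something like this. The bucket, table, region, repository URL, module path and input names are all hypothetical, and the `dynamodb_table` setting reflects the S3 backend as it worked in Terraform releases of this era.

```hcl
# Hypothetical example: bucket, table, region, URL and inputs are illustrative.
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"   # S3 bucket holding the state file
    key            = "staging/terraform.tfstate" # one state file per environment
    region         = "eu-west-2"
    dynamodb_table = "example-terraform-locks"   # table needs a string hash key named "LockID"
    encrypt        = true
  }
}

# A module pulled from a separate git repository and pinned to a tag,
# so every environment builds from the same reviewed code.
module "web_service" {
  source = "git::https://example.com/example-org/terraform-modules.git//web-service?ref=v1.0.0"

  # Inputs defined by the (hypothetical) module.
  environment = "staging"
}
```

With a layout like this, `terraform init` followed by `terraform apply` creates the environment and `terraform destroy` tears it down again.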

Undoubtedly there were challenges whilst adopting this workflow, and many of them were beyond my initial experience. However, the beauty of using a mature tool like Terraform is that you’re not going it alone. There’s a wide community and plenty of help available, so issues can be resolved quickly. For me, the pain points were the versioning and dependency management of modules (v0.13 improved this with the ability to use count with modules, and the soon-to-be-released v0.14 looks to improve it further with a lock file). I also struggled with the lack of best practice for separating environments such as production and staging (Terraform workspaces do not fill this gap, but Terragrunt appears to help).
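
For readers who have not seen it, the v0.13 feature mentioned above looks roughly like the sketch below. The variable, module path and repository URL are hypothetical. Note that the `version` argument only applies to modules from a registry, so a module sourced from git is pinned with `?ref=` instead.

```hcl
# Hypothetical example: the variable, URL and module path are illustrative.
terraform {
  required_version = ">= 0.13" # count on module blocks needs 0.13 or later
}

variable "create_worker" {
  type        = bool
  default     = false
  description = "Whether this environment should include the worker module"
}

module "worker" {
  # Pinned to a git tag; bump the ref to roll a module change forwards.
  source = "git::https://example.com/example-org/terraform-modules.git//worker?ref=v1.2.0"

  # Create the whole module once, or not at all, depending on the flag.
  count = var.create_worker ? 1 : 0
}
```

Before v0.13, the same effect meant threading a flag through every resource inside the module.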


Opportunity



There have been countless lessons so far on our journey and we still have a long way to go. I know I’ve talked a lot about the tech (because that’s just what I love) but if one thing stands out from this process, it’s how the team, and in fact, the whole organisation, dealt with this challenge. We identified an issue, decided how to fix it and committed to getting it done. Not with a patch but with best practice.


We’ve now built the foundations so that, when required, we can make the next leap forward in infrastructure. We could move more services into an EC2 auto scaling environment or instead transition to Kubernetes (k8s) with serverless Docker containers. When the time comes to decide, one thing is clear: wherever we are going, we will be going there with code. Everything as Code.


 

