Up to 2019
- We released to each customer once a month, with a minimum downtime of 2 hours (this pushed us toward night releases to reduce the business impact).
- Releasing roughly a month of development at once meant we had to check a huge amount of work at every release. Many people were involved in manual testing without proper context, and some specific test cases were occasionally not checked properly.
- The process was also largely manual: we updated SQL databases using SQL Compare by Redgate, taking great care when updating certain tables or indexes because of their size. We updated all code (frontend, backend and back-office) using a mix of PowerShell/cmd scripts, sometimes with a bit of manual intervention.
We wanted to improve our delivery framework by moving to hot deployments. We understood that giving our customers a continuous deployment process would be a big step forward for them and for us.
For this reason, in the first months of 2020, we planned a radical change in how we approach releases.
We started analyzing our release process to understand which critical pieces to fix first.
We ended up defining 3 main issues:
- The way we managed our SQL code
- A single critical backend server
- Release and configuration processes mixed together
How we resolved these in detail:
- We moved from database projects (in Visual Studio) to migration scripts (executed with badgie-migrator) for updating our databases. This gave us a defined version for each database at each release, which improved database deployment speed (from hours to seconds) and reduced post-release issues. With incremental deployments, we can deploy a feature and switch it on at different moments, because new features are implemented behind Feature Flags.
- We had a lot of critical backend services installed on a single server, which meant we couldn’t do hot deployments without disrupting customers. To fix this, we replicated these services on a second server, so that failover now takes seconds instead of minutes.
- We were also using our release process to keep our servers and configurations aligned. To speed up the process, we introduced a configuration management tool (Chef) that does all the configuration work, leaving the release pipeline to handle only releases.
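The migration-script idea from the first point can be sketched as follows. This is a minimal illustration under stated assumptions, not how badgie-migrator works internally: it uses Python's built-in sqlite3 module instead of SQL Server, and the `schema_version` table, the `apply_migrations` function and the sample scripts are all hypothetical.

```python
import sqlite3

# Hypothetical ordered migrations: (version, SQL). Illustrative only.
MIGRATIONS = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customers ADD COLUMN email TEXT"),
    (3, "CREATE INDEX ix_customers_email ON customers (email)"),
]


def current_version(conn):
    """Return the highest migration version recorded in the database (0 if none)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def apply_migrations(conn, migrations=MIGRATIONS):
    """Apply, in order, only the migrations newer than the recorded version."""
    applied = []
    start = current_version(conn)
    for version, sql in sorted(migrations):
        if version <= start:
            continue  # already applied in an earlier release
        conn.execute(sql)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        conn.commit()
        applied.append(version)
    return applied
```

Because each database records its own version, running the same set of scripts twice is harmless: the second run finds nothing newer to apply, which is what makes small, frequent database deployments practical.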
The new process
The CI/CD tool we chose is Azure Pipelines. We picked it because we were already using other tools in the Azure DevOps suite and wanted to stay consistent with them, and because we had gained some experience from proofs of concept (POCs) we had previously created.
We based our approach on the motto “Build Once, Deploy Many”, which made the process repeatable and easier to debug.
- Step 1 was to create a single build with all the deployable projects grouped into a single solution.
- Step 2 was to create a multistage release pipeline, so that every change destined for production must first be deployed to the test and staging environments. To protect production from human error, we also added an approval process for production deployments.
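The two steps above can be modelled in a few lines. The sketch below is not Azure Pipelines configuration; it is a hypothetical Python model (the `Artifact` and `ReleasePipeline` names are invented for illustration) showing the two rules the text describes: the same build is promoted through test, staging and production in order, and production additionally requires an explicit approval.

```python
from dataclasses import dataclass, field

STAGES = ["test", "staging", "production"]  # promotion order described in the text


@dataclass(frozen=True)
class Artifact:
    build_id: str  # hypothetical identifier of the single build output


@dataclass
class ReleasePipeline:
    approved_for_production: set = field(default_factory=set)
    deployed: dict = field(default_factory=lambda: {stage: [] for stage in STAGES})

    def approve(self, artifact):
        """Record a manual approval for deploying this artifact to production."""
        self.approved_for_production.add(artifact.build_id)

    def deploy(self, artifact, stage):
        """Deploy the same artifact to a stage, enforcing order and approval."""
        index = STAGES.index(stage)
        if index > 0:
            previous = STAGES[index - 1]
            if artifact.build_id not in self.deployed[previous]:
                raise RuntimeError(
                    f"{artifact.build_id} must be deployed to {previous} before {stage}"
                )
        if stage == "production" and artifact.build_id not in self.approved_for_production:
            raise PermissionError(f"{artifact.build_id} is not approved for production")
        self.deployed[stage].append(artifact.build_id)
```

Note that nothing is rebuilt between stages: the object passed to `deploy` is the same artifact each time, so whatever was verified on test and staging is exactly what reaches production.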
Once the new pipeline was complete, the results for a customer with over 80 servers were clear. Hot deployments went from 5 hours with a failure rate of 1:2 to less than 1 hour with a failure rate of 1:200. In most cases a failure meant a web server that couldn’t be deployed and was kept out of the load balancer, so nothing destructive. We therefore improved both the resilience and the speed of the process. Compared to when we did manual cold deployments, we also greatly increased the number of production releases, meaning a faster software life cycle and a faster response to customer needs. We moved from 1 release per month to approximately 1 release per day, as shown in the following graph:
If we add test and staging environments to the count, the numbers get higher: an average of 250 releases per month. This is great because developers can test their changes immediately, which increases company productivity.
Our goal for the next year in terms of Continuous Deployment is to keep decreasing our production release duration without affecting the newly gained resilience and reliability. To reach it, we are planning to introduce ring-based deployment.
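As a rough illustration of what deploying in rings could look like, here is a minimal sketch. It assumes a ring is simply a list of servers and that the rollout stops as soon as a ring reports an unhealthy server; the `deploy_in_rings` function and its `deploy` and `is_healthy` callbacks are hypothetical and do not describe a design we have already built.

```python
def deploy_in_rings(rings, deploy, is_healthy):
    """Deploy ring by ring; stop before the next ring if any server is unhealthy.

    Returns (completed_rings, failed_ring), where failed_ring is None on success.
    """
    completed = []
    for ring in rings:
        for server in ring:
            deploy(server)
        if not all(is_healthy(server) for server in ring):
            return completed, ring  # later rings are left untouched
        completed.append(ring)
    return completed, None
```

Starting with a small first ring limits how many servers a bad release can reach, while the later, larger rings only receive a release that has already proven healthy.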
Our goal is to have releases that take less than 15 minutes. We also want to use this approach with other iSolutions products.