Written 2019-12-28
What methods and techniques can you use to improve cloud margins?
Public cloud providers have ushered in a new era of technological innovation with a “pay for what you use” billing model. This has led to many organizations abandoning their costly and complex data center & hardware management programs in favor of public cloud. In this new cloud-centric landscape, many organizations have struggled to control costs when their employees are empowered to spin up cloud resources on demand, and at their own whim.
Standards must be enacted at the company level to ensure that cloud resources are grouped correctly and tagged appropriately, and that discount programs are properly employed. In addition, cloud usage should be reviewed regularly, so that the organization can evaluate whether the cloud costs for each project and application meet its goals for profitability, growth, and other strategic needs. Billing alerts can also be leveraged to improve visibility into, and response to, unexpected cost growth.
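As a rough illustration of the regular review and alerting idea, here is a minimal Python sketch that flags projects whose spend grew faster than a chosen threshold from one month to the next. The function name, the 20% default threshold, and the plain dictionaries of per-project costs are all assumptions made for the example; in practice the numbers would come from your provider's billing export, and the provider's own billing alert features may be a better fit.

```python
def cost_growth_alerts(previous, current, threshold=0.20):
    """Return {project: growth_fraction} for projects whose spend grew more than threshold.

    previous and current map a project name to its monthly cost (hypothetical inputs).
    """
    alerts = {}
    for project, now in current.items():
        before = previous.get(project, 0.0)
        if before <= 0:
            # No baseline to compare against: flag any new, non-zero spend.
            if now > 0:
                alerts[project] = float("inf")
            continue
        growth = (now - before) / before
        if growth > threshold:
            alerts[project] = growth
    return alerts


if __name__ == "__main__":
    last_month = {"web": 1000.0, "etl": 500.0}
    this_month = {"web": 1100.0, "etl": 800.0, "ml": 50.0}
    for project, growth in cost_growth_alerts(last_month, this_month).items():
        print(project, growth)
```

Treating brand-new spend as an alert is a design choice; some organizations may prefer to ignore projects below a minimum dollar amount.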
Grouping resources into appropriate billing groups can help with cost visibility across different departments, teams, and projects. In AWS, the most straightforward way to provide billing grouping is to deploy different resources into different accounts. In GCP, you can employ separate Cloud Billing Accounts, where each GCP Project (containing cloud objects) must be linked to a Cloud Billing Account. In Azure, you can leverage Subscriptions and resource groups to gain visibility into the billing for various groups of resources.
Ubiquitous tagging for all cloud objects is also a critical tool for improving visibility into cloud expenditures, and for evaluating that usage. At Pluralsight, I implemented a mandatory tagging program for AWS objects where the following tags are required: environment (such as staging/production), organization (such as Marketing/Sales/Engineering), roles (describing the purpose or app name), team (the owning team), and source (infrastructure as code link). Enforcing compliance with tagging requirements is also critical. At Pluralsight, I built a process that alerts our DevOps engineering team when a cloud object is created without our mandatory tags. With mandatory tagging in place, you can also leverage Cost Allocation Tags to provide billing visibility into resources, based on tag values.
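A tag compliance check of this kind can be sketched in a few lines of Python. This is only an illustration and not the process used at Pluralsight: the five tag keys come from the list above, but the function names and the inventory format (a dictionary of resource ID to tag dictionary, standing in for whatever your cloud provider's API returns) are assumptions.

```python
# Tag keys taken from the mandatory tagging program described above.
REQUIRED_TAGS = ("environment", "organization", "roles", "team", "source")


def missing_tags(tags):
    """Return the required tag keys that are absent or blank, in REQUIRED_TAGS order."""
    return [key for key in REQUIRED_TAGS if not str(tags.get(key, "")).strip()]


def find_noncompliant(resources):
    """Map resource ID -> missing tag keys, for resources missing at least one tag.

    resources is a hypothetical inventory: {resource_id: {tag_key: tag_value}}.
    """
    report = {}
    for resource_id, tags in resources.items():
        missing = missing_tags(tags)
        if missing:
            report[resource_id] = missing
    return report


if __name__ == "__main__":
    inventory = {
        "i-example-1": {
            "environment": "production",
            "organization": "Engineering",
            "roles": "checkout-api",
            "team": "payments",
            "source": "https://example.com/iac/checkout",
        },
        "i-example-2": {"environment": "staging", "team": ""},
    }
    for resource_id, missing in find_noncompliant(inventory).items():
        print(f"{resource_id} is missing tags: {', '.join(missing)}")
```

The report could then feed whatever alerting channel the team already uses; blank values are treated as missing so that empty placeholder tags do not pass the check.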
Once an organization’s cloud spend becomes significant, it becomes imperative to leverage discount programs. At Pluralsight, I helped implement a Reserved Instance (RI) program that has helped us save over 30% off the retail (on-demand) cost of EC2 and RDS. The Cloud Engineering and DevOps teams evaluate and commit to our RI needs twice per year, resulting in RI purchasing adjustments. The AWS Trusted Advisor and AWS Compute Optimizer services can help with right-sizing recommendations. We standardized on particular instance types, such as T3 and M5, to give us the flexibility to utilize our RIs across the organization. Now that AWS has launched “Savings Plans”, I recommend utilizing that discount program, as it allows for much greater flexibility in compute usage commitments, with much less internal program management compared to RIs. We have been able to shift from RIs to Savings Plans for more savings and greater flexibility.
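To see why keeping commitments fully utilized matters, here is a simplified back-of-the-envelope model in Python. It is a sketch under stated assumptions, not AWS’s actual billing logic: committed hours are paid for at the discounted rate whether or not they are used, any usage beyond the commitment is billed at the on-demand rate, and the function name and parameters are hypothetical.

```python
def effective_savings(on_demand_rate, discount, committed_hours, used_hours):
    """Fraction saved versus paying on-demand for every used hour (simplified model).

    on_demand_rate: hourly on-demand price
    discount: commitment discount as a fraction (0.30 means 30% off)
    committed_hours: hours paid for at the discounted rate, used or not
    used_hours: hours actually consumed (must be positive)
    """
    if used_hours <= 0:
        raise ValueError("used_hours must be positive")
    committed_cost = on_demand_rate * (1 - discount) * committed_hours
    overflow_hours = max(0, used_hours - committed_hours)
    total_cost = committed_cost + overflow_hours * on_demand_rate
    baseline_cost = used_hours * on_demand_rate
    return 1 - total_cost / baseline_cost


if __name__ == "__main__":
    # Commitment fully used, half used, and exceeded by 100%.
    for used in (1000, 500, 2000):
        print(used, round(effective_savings(1.0, 0.30, 1000, used), 3))
```

In this model a 30% discount yields the full 30% only when the commitment is fully used; when only half of it is used, the organization pays more than it would have on-demand, which is one reason to standardize on instance types that many teams can share.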
Once your spend approaches $1 million annually, you can enter into the AWS Enterprise Discount Program, which can help take another 2% (or more!) off the top of your AWS bill. Other techniques that can help with cloud margins include using AWS Spot Instances, running an internal PaaS such as Kubernetes, and shifting workloads from more expensive services to less costly ones.
Controlling public cloud costs for an organization is a complex, multi-pronged initiative that must receive ongoing attention. The keys to improving cloud margins are appropriate visibility, continual evaluation and adjustment, and employing discount programs.
How can you enable an engineering team or org to transition to full DevOps principles?
I started at Pluralsight in 2014, when we were a small company with infrastructure located in Viawest, a colocation data center provider. At the time, we had a few product development teams who all worked on a shared monolithic codebase and were highly coupled to one another. During this period, we had a stark separation of duties between Development and Operations personnel. Operations performed the production deploys, were the only folks on-call, and had exclusive access to production environments.
I led the organizational transition to full DevOps principles as one of the first acting DevOps Engineers at the company. All of this early work helped inform our current philosophy and organizational structure of cross-functional product development teams. We now use a model where a DevOps Engineer is embedded within a product development team as a full member of that team. We empower our teams by ensuring that their team members possess all of the skill sets required to fully manage a software product’s lifecycle, from the developer’s laptop all the way to a production cloud environment.
I have been an advocate and mentor for DevOps principles and culture from my early days at Pluralsight, and my efforts have helped us scale from a few teams in 2014 to the 50+ product development teams and hundreds of applications in production that we have today. I have interviewed, hired, mentored, and trained dozens of employees. You can often find me teaching folks about the CALMS (Culture, Automation, Lean, Measurement, Sharing) model of DevOps and engaging in conversations about culture, architecture, flow, and value creation for our customers.
In an ideal state, how and by whom is a cloud service managed?
The team that is most qualified to manage a cloud service is the team that built it. This means that a development team should be on-call, available, and responsible for anything that they ship to production. This same team should also have robust, reliable Continuous Delivery pipelines available to them, which they can execute on their own. This allows for rollbacks and for rolling forward, to resolve problems when they are encountered.
Monitoring, alerting, and observability must exist, and must provide sufficient visibility to help the team diagnose and resolve problems. Appropriate logging, log aggregation, dashboarding, and APM (Application Performance Monitoring) are critical in this area. Continuous Delivery pipelines for Infrastructure as Code (IaC) provide powerful capabilities, change management, auditing, and safer production infrastructure access to development teams. This model empowers teams to iterate on their cloud objects and usage.
Many additional considerations and implementation details are important to a successful program in which a cross-functional team manages its own cloud service. Separate teams and their applications should be loosely coupled to one another, as much as possible, to ensure that the actions of one team have a reduced blast radius and cannot have major effects on other teams. Architecture and integration patterns between teams and applications are paramount in such a system.
What skills and experience will you need to lead an organization in their DevOps transformation?
My work at Pluralsight has helped transform a small private company into a publicly traded technology giant with hundreds of engineering employees who embrace a DevOps culture.
To ensure a successful DevOps organizational transformation, the unique needs of an organization must be weighed. The tactics employed will vary based on the company’s current culture, history, organizational structure, practices, architecture, technology footprint, and other variables. One of the most important factors in a journey like this is leadership buy-in. Before proposing specific plans that will help a company succeed on its DevOps journey, many of the variables mentioned above must be considered.