The Frugal Architecture: Cloud Cost Efficiency in Practice
I recently spoke at the FW Days Architecture Meetup in Warsaw about Frugal Architecture in practice.
This post collects what I covered at the meetup: frugality in practice, the main tips, and how to avoid common pitfalls. The presentation from the meetup is available via the link, or you can just read this post 😊
Resume-Driven Development trap
Sometimes developers pick trendy technologies because they look good on a resume, not because they are the best fit for the project. This is called resume-driven development. While it might sound harmless, choosing tools and frameworks based on popularity instead of real needs hurts long-term project sustainability: it raises maintenance costs and adds unnecessary complexity for the team.
Example:
A team might choose microservices for a small application when a monolithic approach might work better and cost less. The result? Higher hosting costs, more complicated operations, and wasted resources.
To avoid this, focus on the specific needs of your project and weigh the long-term costs of every decision.
What is The Frugal Architecture?
The Frugal Architecture is about delivering maximum value with minimal cost. This concept was proposed by Dr. Werner Vogels, Amazon's CTO, and complements the cost optimization guidance of the AWS Well-Architected Framework.
The Frugal Architecture consists of 3 phases that contain 7 laws:
- Design
  - Make Cost a Non-functional Requirement.
  - Systems that Last Align Cost to Business.
  - Architecting is a Series of Trade-offs.
- Measure
  - Unobserved Systems Lead to Unknown Costs.
  - Cost-Aware Architectures Implement Cost Controls.
- Observe
  - Cost Optimization is Incremental.
  - Unchallenged Success Leads to Assumptions.
Let’s review all the phases and laws and understand their importance in creating cost-efficient, scalable, and sustainable systems.
Phase 1: 🎨 Design
Law 1: Make Cost a Non-functional Requirement
To ensure a system is designed, developed, and operated within budget, treat cost as a non-functional requirement (NFR).
🎯 Action:
- Take into account cost limitations when designing the architecture. Prioritize these limitations against other requirements like scalability and performance to ensure a balanced solution.
Law 2: Systems that Last Align Cost to Business
Design systems that grow with the business and keep expenses under control, so that growth does not turn into a cost problem.
🎯 Action:
- Understand how the business calculates revenue.
- Your architecture documents should include cost indications.
For example, a simplified per-user formula that relates revenue to infrastructure cost might look like this:
Margin per user = subscription price - (infrastructure cost / user count)
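The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `margin_per_user` and all numbers are hypothetical, and the result is read here as what remains per user after infrastructure cost.

```python
def margin_per_user(subscription_price: float,
                    infrastructure_cost: float,
                    user_count: int) -> float:
    """Subscription price minus each user's share of the infrastructure cost."""
    if user_count <= 0:
        raise ValueError("user_count must be positive")
    return subscription_price - infrastructure_cost / user_count


# Hypothetical numbers: a $10 subscription and $2,000 of monthly infrastructure.
for users in (100, 1_000, 10_000):
    print(users, round(margin_per_user(10.0, 2_000.0, users), 2))
```

With a fixed infrastructure bill, the per-user value is negative for a small user base and improves as the user count grows, which is why the architecture has to follow how the business earns money.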
Example of how to use the first 2 laws in practice: Big picture diagram documentation
Below is an example from one of my recent projects that covers what we have seen in the first two laws.
The key point is that the diagram description has to include a cost indication for each element, as well as the total cost:
# | Name of module | Description | Approximate cost for MVP per month | Cost for Post MVP | Worst-case scenario |
---|---|---|---|---|---|
1 | Physical devices | IoT device and Wi-Fi gateway used for external communication with IoT devices: configuration, device status, etc. | Not in scope of our product | Not in scope of our product | Not in scope of our product |
2 | Event Hub | Inbound queue that stores device events. | Not in scope of our product | Not in scope of our product | Not in scope of our product |
3 | Azure Function | Function that subscribes to Event Hub messages, stores them in the DB, and sends them to the outbound queue | $0 (consumption plan) | $291.85 for a 3.5 GB RAM premium instance and one additional instance for scaling | $291.85 for a 3.5 GB RAM premium instance and one additional instance for scaling |
4 | Redis Cache | Cache for data that the system queries frequently | $16.06 (Basic C0) | $163 (C2) | For geo-replication, ~$400 per premium instance |
5 | Azure Cosmos DB | Primary database | $25.86 (400 request units) | $275.15 | $12,300 |
6 | Azure Web app | Web interface to manage gateways/devices and browse logs | $73.00 (S1) | $50 for Front Door and $73 (S1) * 4 instances | $100 for Front Door and $73 (S1) * 4 instances |
7 | Azure B2C | Customer identity access management (CIAM) | Free (for first 50k monthly active users) | Free (for first 50k monthly active users) | Free (for first 50k monthly active users) |
8 | Azure Service bus | Outbound queue that clients use to receive device events. | $10 (Standard tier, first 13M ops/month free) | $677.08 | $677.08 * 4 |
9 | Azure Blobs | Used for storing device config files | $1 | $10 | $10 |
10 | Customer cloud infrastructure | External infrastructure to receive messages from IoT | Not in scope of our product | Not in scope of our product | Not in scope of our product |
Total | | | $125.92 | ~$1,760 | ~$16,900 |
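Keeping the same numbers in a small script makes the totals easy to re-check whenever a line item changes. The sketch below copies the MVP and Post MVP columns from the table; the dictionary layout and function names are assumptions made for illustration. The worst-case column is left out because some of its cells are quoted per instance without an instance count.

```python
from decimal import Decimal

# Monthly cost per module, copied from the table above.
# "Free" and consumption-plan entries are entered as 0;
# out-of-scope modules are omitted.
COSTS = {
    "Azure Function":    {"mvp": "0",     "post_mvp": "291.85"},
    "Redis Cache":       {"mvp": "16.06", "post_mvp": "163"},
    "Azure Cosmos DB":   {"mvp": "25.86", "post_mvp": "275.15"},
    "Azure Web app":     {"mvp": "73.00", "post_mvp": "342"},  # $50 Front Door + 4 * $73
    "Azure B2C":         {"mvp": "0",     "post_mvp": "0"},
    "Azure Service bus": {"mvp": "10",    "post_mvp": "677.08"},
    "Azure Blobs":       {"mvp": "1",     "post_mvp": "10"},
}


def total(scenario: str) -> Decimal:
    """Sum the monthly cost of all modules for one scenario."""
    return sum((Decimal(row[scenario]) for row in COSTS.values()), Decimal("0"))


def biggest_cost_driver(scenario: str) -> str:
    """Name of the module with the highest monthly cost in a scenario."""
    return max(COSTS, key=lambda name: Decimal(COSTS[name][scenario]))


for scenario in ("mvp", "post_mvp"):
    print(scenario, total(scenario), biggest_cost_driver(scenario))
```

`Decimal` is used instead of `float` so that the sums match the table to the cent.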
Law 3: Architecting is a Series of Trade-offs
Frugality is about maximizing value, not just minimizing spend. To do that, you need to determine what you’re ready to pay for.
🎯 Action:
- Include cost in the trade-off analysis of every Architecture Decision Record (ADR).
Take a look at our article: Best Practices for Effective Software Architecture Documentation to find ADR templates and learn what ADRs are and how to use them in Architecture Documentation.
Example of how to use Law 3 in practice:
Cloud price calculators
You can use the pricing calculators of the cloud providers to estimate infrastructure costs.
Phase 2: 📏 Measure
Law 4: Unobserved Systems Lead to Unknown Costs
“If you can’t measure it, you can’t manage it.” Use tools to monitor costs and utilization of resources.
🎯 Action:
- Set up dashboards to review costs continually: Dashboards provide a clear view of your resource usage and spending patterns, helping you identify inefficiencies and optimize costs effectively.
Examples of tools that can be used to measure cost
The following options provide cost dashboards:
Amazon Web Services
Azure
Google Cloud Platform
Custom Dashboards
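A custom dashboard often starts with a simple rule over daily cost data. The sketch below flags days whose cost is well above the average of the preceding days. The function name, the 7-day window, the 1.5x threshold, and the sample numbers are all hypothetical; in practice the daily costs would come from your provider's billing export.

```python
from statistics import mean


def find_cost_spikes(daily_costs: list[float],
                     window: int = 7,
                     threshold: float = 1.5) -> list[int]:
    """Return indexes of days whose cost exceeds `threshold` times
    the mean cost of the previous `window` days."""
    spikes = []
    for day in range(window, len(daily_costs)):
        baseline = mean(daily_costs[day - window:day])
        if baseline > 0 and daily_costs[day] > threshold * baseline:
            spikes.append(day)
    return spikes


# Hypothetical daily costs in USD; index 8 is an obvious outlier.
costs = [40, 42, 41, 39, 40, 43, 41, 44, 95, 42]
print(find_cost_spikes(costs))  # -> [8]
```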
Law 5: Cost-Aware Architectures Implement Cost Controls
Evaluate your system components by criticality.
🎯 Action:
- Cost optimization must be measurable and tied to business impact, for example by grouping components into tiers (tier 1, tier 2, and so on up to tier N).
Example: Split the E-commerce system into tiers
Let's consider an e-commerce system that was split into tiers based on sub-system criticality.
The components are categorized into tiers:
- Tier 1: Core components, scale regardless of cost.
- Tier 2: Important components, can scale down temporarily.
- Tier 3: Nice-to-have, keep cheap and simple.
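One way to turn the tiers above into an enforceable cost control is a per-tier scaling policy. This is a minimal sketch under stated assumptions: the component names, instance limits, and the `allowed_instances` helper are hypothetical and not taken from a real system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierPolicy:
    min_instances: int
    max_instances: Optional[int]  # None means "scale regardless of cost"


# Hypothetical limits per tier.
POLICIES = {
    1: TierPolicy(min_instances=2, max_instances=None),  # core: never capped
    2: TierPolicy(min_instances=1, max_instances=4),     # important: capped
    3: TierPolicy(min_instances=1, max_instances=1),     # nice-to-have: cheap
}

# Hypothetical mapping of e-commerce components to tiers.
COMPONENT_TIERS = {"checkout": 1, "search": 2, "recommendations": 3}


def allowed_instances(component: str, desired: int) -> int:
    """Clamp the instance count requested by autoscaling to the tier policy."""
    policy = POLICIES[COMPONENT_TIERS[component]]
    capped = desired if policy.max_instances is None else min(desired, policy.max_instances)
    return max(capped, policy.min_instances)


for name in COMPONENT_TIERS:
    print(name, allowed_instances(name, desired=12))
```

During a traffic peak, the tier 1 component gets everything it asks for, while tier 2 and tier 3 components stay within their budgeted limits.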
Phase 3: 👀 Observe
Law 6: Cost Optimization is Incremental
Making sure your system is cost-effective is an ongoing process. It’s not something you do once and then forget about. Review costs regularly, such as quarterly or after major deployments, to ensure efficiency and scalability.
🎯 Action:
- Regularly check your system to find ways to improve efficiency.
Law 7: Unchallenged Success Leads to Assumptions
Don’t assume a solution that worked in the past is still the best choice. Regularly challenge your assumptions.
🎯 Action:
- Review the relevance and cost-effectiveness of your technologies. Explore new tools and frameworks that might offer better performance or lower costs.
DevSecOps Tools Periodic Table
One reference for discovering new tools is the DevSecOps Tools Periodic Table.
Common Pitfalls
Pitfall 1: Ignoring Database Growth
As databases grow over time, unchecked expansion can lead to escalating costs and reduced performance, affecting the overall efficiency of your system.
❓ How to avoid:
- Move old, unused data to cheaper solutions (e.g., cold storage).
- Regularly optimize schemas and indexes.
- Track database size trends.
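Tracking size trends can be as simple as projecting the recent average growth forward. The sketch below estimates how many months remain until a storage limit is reached. The function names, the monthly samples, and the 250 GB limit are hypothetical, and a linear projection is an assumption that real growth may not follow.

```python
from typing import Optional


def average_monthly_growth(sizes_gb: list[float]) -> float:
    """Average month-over-month growth in GB."""
    if len(sizes_gb) < 2:
        raise ValueError("need at least two monthly samples")
    deltas = [later - earlier for earlier, later in zip(sizes_gb, sizes_gb[1:])]
    return sum(deltas) / len(deltas)


def months_until_limit(sizes_gb: list[float], limit_gb: float) -> Optional[float]:
    """Months until the limit is reached at the current average growth.
    Returns None when the database is not growing."""
    growth = average_monthly_growth(sizes_gb)
    if growth <= 0:
        return None
    return max(0.0, (limit_gb - sizes_gb[-1]) / growth)


# Hypothetical month-end sizes in GB.
sizes = [100, 110, 125, 140, 160]
print(months_until_limit(sizes, limit_gb=250))  # -> 6.0
```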
Pitfall 2: Inefficient Use of IO-Bound Operations
Modern systems often use a thread pool to handle incoming requests efficiently. A thread pool is a collection of worker threads that can be reused for different tasks, reducing the overhead of creating and destroying threads. This helps systems handle more requests simultaneously without wasting resources.
If an I/O-bound operation is called synchronously, it blocks a thread from the thread pool, and that thread is unavailable for processing other HTTP requests until the operation completes.
What are I/O-bound operations?
- Reading or writing large files to disk.
- Accessing a database for queries.
- Communicating with external APIs over the network.
So, for example, if all threads from the pool are 'busy' waiting for I/O operations to finish, the system slows down, and the load balancer (LB) will detect that. Autoscaling will then add more instances of our system to process the ongoing requests.
That also increases cost, because we pay for each additional instance (unless we run on pre-provisioned VMs or dedicated services).
❓ How to avoid:
Use non-blocking or asynchronous IO mechanisms (e.g., event loops in Node.js, async/await in modern languages).
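As an illustration of the non-blocking approach, here is a minimal sketch using Python's `asyncio`. The handler name and the 0.1 s delay are hypothetical, and `asyncio.sleep` stands in for a real database query or HTTP call.

```python
import asyncio
import time


async def handle_request(request_id: int) -> int:
    # While this coroutine waits for "I/O", the event loop is free
    # to make progress on other requests.
    await asyncio.sleep(0.1)
    return request_id


async def handle_all(count: int) -> list[int]:
    # Start all requests concurrently and wait for all of them.
    return await asyncio.gather(*(handle_request(i) for i in range(count)))


def run_demo(count: int = 20) -> tuple[list[int], float]:
    start = time.perf_counter()
    results = asyncio.run(handle_all(count))
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = run_demo()
print(f"{len(results)} requests handled in {elapsed:.2f}s")
```

Handled one after another with blocking calls, 20 requests of 0.1 s each would take about 2 seconds on a single worker. Here the waits overlap, so a single instance serves the same load without needing extra threads or instances.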
Pitfall 3: Over-Provisioning Resources
- Provisioning more computing, storage, or RAM than necessary wastes money without delivering proportional benefits.
- The creation of unnecessary environments increases infrastructure costs.
❓ How to avoid:
Implement right-sizing strategies and auto-scaling policies. Review resource utilization metrics and run load tests to find the optimal configuration for each environment. Combine environments on the same service plan where possible (for example, dev and QA environments).
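A right-sizing review can start from a simple rule over utilization metrics. The sketch below looks at the 95th percentile of CPU utilization and suggests a direction. The 30% and 75% thresholds, the function name, and the sample data are hypothetical and should be tuned with your own load tests.

```python
def rightsizing_advice(cpu_samples: list[float],
                       low: float = 0.30,
                       high: float = 0.75) -> str:
    """Suggest 'downsize', 'keep', or 'upsize' from CPU utilization samples (0.0-1.0)."""
    if not cpu_samples:
        raise ValueError("no samples")
    ordered = sorted(cpu_samples)
    # Simple percentile: the sample at 95% of the way through the sorted list.
    p95 = ordered[int(0.95 * (len(ordered) - 1))]
    if p95 < low:
        return "downsize"
    if p95 > high:
        return "upsize"
    return "keep"


# Hypothetical CPU utilization of a dev environment over a week.
dev_cpu = [0.10, 0.12, 0.15, 0.11, 0.20, 0.18, 0.14, 0.13, 0.16, 0.22]
print(rightsizing_advice(dev_cpu))  # -> downsize
```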
Pitfall 4: Lack of Team Ownership
Teams without clear ownership of cost and performance often make decisions that might not be cost-efficient.
❓ How to avoid:
- Each team member should be aware of how much we pay for infrastructure.
- Assign a person responsible for infrastructure cost.
- Apply FinOps principles to make cost-efficiency an ongoing focus.
What are FinOps principles?
You can find them as part of the FinOps Framework.
Pitfall 5: Ignoring Technical Debt
Allowing technical debt to accumulate reduces your team's ability to adapt and increases the cost of implementing future changes. Ignoring technical debt can lead to inefficiencies, slow development cycles, and a higher risk of critical issues in the long term.
❓ How to avoid:
- Dedicate time each sprint to address technical debt incrementally.
- Use static code analysis and regular architectural reviews to identify and resolve problematic areas.
- Create a clear plan for reducing technical debt and prioritize fixes based on business impact.
Pitfall 6: Over-Engineering Solutions
Complex architectures (e.g., microservices or event-driven architecture (EDA) for small-scale apps) add unnecessary overhead in development, maintenance, and runtime costs.
❓ How to avoid:
- Start simple with monoliths or modular monoliths, and move to distributed systems only when specific scaling or team requirements demand it.
- A DDD approach or Vertical Slice Architecture can be a good choice for a monolith, because it prepares the system for future splitting.
Example: Monolith vs Microservices - Is Monolith Frugal?
You often hear that a monolith is cheaper than microservices. Let's calculate whether that is true.
Monolith vs Microservices: Cost calculation
Baseline
- Use Azure App Service in West Europe with the Windows operating system
- Assume for simplicity that App Service S1 (1 core, 1.75 GB RAM, and 50 GB storage) has a throughput of 1 000 requests
- Assume for simplicity that App Service S3 (4 cores, 7 GB RAM, and 50 GB storage) has a throughput of 4 000 requests
- The database and all other components are out of scope for this estimation
App Service plan | OS | Price per month | Used for |
---|---|---|---|
S1 (1 core, 1.75 GB RAM, 50 GB storage) | Windows | $73 | Microservice |
S3 (4 cores, 7 GB RAM, 50 GB storage) | Windows | $292 | Monolith |
Monolith vs Microservices: Ramp up
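The ramp-up can be approximated with the baseline numbers above. This is a rough sketch, not a benchmark: the service names and the load split are hypothetical, and it assumes the monolith has to absorb the total load on S3 instances while each microservice scales on its own S1 instances.

```python
import math

# Baseline from the table above.
S1_PRICE, S1_THROUGHPUT = 73, 1_000    # used per microservice
S3_PRICE, S3_THROUGHPUT = 292, 4_000   # used for the monolith


def monolith_cost(load_per_service: dict[str, int]) -> int:
    """The monolith scales as a whole, driven by the total load."""
    total_load = sum(load_per_service.values())
    instances = max(1, math.ceil(total_load / S3_THROUGHPUT))
    return instances * S3_PRICE


def microservices_cost(load_per_service: dict[str, int]) -> int:
    """Each service scales independently and needs at least one instance."""
    instances = sum(max(1, math.ceil(load / S1_THROUGHPUT))
                    for load in load_per_service.values())
    return instances * S1_PRICE


# Hypothetical load split across four services.
even = {"orders": 1_000, "catalog": 1_000, "payments": 1_000, "users": 1_000}
skewed = {"orders": 2_000, "catalog": 1_000, "payments": 1_000, "users": 1_000}

for name, load in (("even", even), ("skewed", skewed)):
    print(name, monolith_cost(load), microservices_cost(load))
```

With an even load, both options cost $292 per month. When only the hypothetical Orders service doubles its load, the monolith needs a second S3 instance ($584), while the microservices add a single S1 instance ($365). This covers hosting only and ignores the extra infrastructure and operational costs of microservices.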
Monolith vs Microservices: Limitations
If you check the limits of Azure App Service, you will find that the Standard tier allows a maximum of 10 instances.
That means:
- To scale beyond 10 instances, you should move to a Premium / Isolated plan, which significantly increases costs for the monolith.
- With microservices, individual services can be scaled independently, helping you reduce costs by only scaling the most critical services (e.g., consider Premium for the Orders service and Standard for the rest).
Monolith vs Microservices: Final Thoughts
So in conclusion, could we say that microservices are much cheaper than monoliths? Of course not, because for microservices we have 'hidden' costs:
Hidden Extra Costs with Microservices
- Infrastructure: API Gateway, Message Broker, and other components.
- Development: Higher complexity and a need for skilled engineers.
- Increased DevOps effort and operational costs.
Best Fit:
- Microservices are well-suited for managing complexity and supporting the needs of rapidly growing systems.
- Monolithic architectures may offer better cost efficiency for simpler applications due to reduced complexity.
Conclusion
- Align architecture with business needs and technical constraints to ensure long-term adaptability and sustainability.
- Make cost a non-functional requirement and include it in trade-off analyses to better align decisions with the budget.
- Continuously monitor and optimize costs by leveraging tools and regular reviews to identify inefficiencies and savings opportunities.
- Design systems with scalability in mind to ensure they grow efficiently without unnecessary spending.
- Proactively identify and address potential pitfalls like technical debt, over-provisioning, and ignoring database growth, which can lead to escalating costs and reduced performance.
- By embracing Frugal Architecture, you can create systems that are not only high-performing but also cost-effective, scalable, and sustainable for the future.