The Frugal Architecture: Cloud Cost Efficiency in Practice

By polishchuk · 23.12.2024 · 12 min read

I recently spoke at the FW Days Architecture Meetup in Warsaw, where I talked about Frugal Architecture in practice.

The frugal architecture in practice by Artem Polishchuk

In this post, I summarize what was covered during the meetup: frugality in practice, the main tips, and how to avoid the pitfalls. The presentation from the meetup is available via the link, or you can just read this post 😊

Resume-Driven Development Trap

Sometimes developers pick trendy technologies because they look good on a resume, not because they are the best fit for the project. This is called resume-driven development. While it might sound harmless, choosing tools and frameworks for their popularity rather than for real needs hurts long-term project sustainability: it raises maintenance costs and adds unnecessary complexity for the team.

Example:

A team might choose microservices for a small application when a monolithic approach might work better and cost less. The result? Higher hosting costs, more complicated operations, and wasted resources.

To avoid this, focus on the specific needs of your project and weigh the long-term costs of every decision.

What is The Frugal Architecture?

The Frugal Architecture is about delivering maximum value with minimal cost. Dr. Werner Vogels, Amazon's CTO, proposed this concept, which is embedded in the AWS Well-Architected Framework.

The Frugal Architecture consists of 7 laws grouped into 3 phases:

Design
- Make Cost a Non-functional Requirement.
- Systems that Last Align Cost to Business.
- Architecting is a Series of Trade-offs.
Measure
- Unobserved Systems Lead to Unknown Costs.
- Cost-Aware Architectures Implement Cost Controls.
Observe
- Cost Optimization is Incremental.
- Unchallenged Success Leads to Assumptions.

Let’s review all the phases and laws and understand their importance in creating cost-efficient, scalable, and sustainable systems.

Phase 1: 🎨 Design

Law 1: Make Cost a Non-functional Requirement

To ensure a system is designed, developed, and operated within budget, treat cost as a non-functional requirement (NFR).

Non-functional requirements list: to learn more about non-functional requirements and how to work with them, please read our article "Best Practices for Effective Software Architecture Documentation".

🎯 Action:

Take into account cost limitations when designing the architecture. Prioritize these limitations against other requirements like scalability and performance to ensure a balanced solution.

Law 2: Systems that Last Align Cost to Business

Design systems whose costs grow in step with the business, so that expenses stay under control as the business scales.

🎯 Action:

Understand how the business calculates revenue.
Include cost estimates in your architecture documents.

For example, a simplified formula for the margin per user might look like this:

Margin per user = subscription price - (infrastructure cost / user count)
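The formula above can be sketched in Python to see how the per-user figure changes with the user count. This is only an illustration: the function name is my own, the $5 subscription price is a hypothetical value, and the $125.92 infrastructure cost is the MVP total from the cost table in this post.

```python
def margin_per_user(subscription_price: float,
                    infrastructure_cost: float,
                    user_count: int) -> float:
    """Monthly amount left per user after their share of infrastructure cost."""
    if user_count <= 0:
        raise ValueError("user_count must be positive")
    return subscription_price - infrastructure_cost / user_count


# Hypothetical subscription price of $5; $125.92 is the MVP infrastructure total.
for users in (10, 100, 1000):
    print(users, round(margin_per_user(5.0, 125.92, users), 2))
```

With these assumed numbers the result is negative at 10 users and positive at 100, which is the kind of break-even point worth knowing before launch.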

Example of how to use the first 2 laws in practice: Big picture diagram documentation

Below is an example from one of my recent projects that covers what we have seen in the first two laws.

The key point is that the diagram description must include a cost estimate for each element, as well as the total cost:

#	Name of module	Description	Approximate cost for MVP per month	Cost for post-MVP	Worst-case scenario
1	Physical devices	IoT devices and the Wi-Fi gateway used for external communication with them (configuration, device status, etc.)	Not in scope of our product	Not in scope of our product	Not in scope of our product
2	Event Hub	Inbound queue that stores device events	Not in scope of our product	Not in scope of our product	Not in scope of our product
3	Azure Function	Function that subscribes to Event Hub messages, stores them in the DB, and sends them to the outbound queue	$0 (consumption plan)	$291.85 for a 3.5 GB RAM premium instance and one additional instance for scaling	$291.85 for a 3.5 GB RAM premium instance and one additional instance for scaling
4	Redis Cache	Cache for data that the system queries frequently	$16.06 (Basic C0)	$163 (C2)	~$400 per premium instance (geo-replicated)
5	Azure Cosmos DB	Primary database	$25.86 (400 request units)	$275.15	$12,300
6	Azure Web App	Web interface to manage gateways/devices and browse logs	$73.00 (S1)	$50 for Front Door + $73 (S1) * 4 instances	$100 for Front Door + $73 (S1) * 4 instances
7	Azure B2C	Customer identity and access management (CIAM)	Free (for first 50k monthly active users)	Free (for first 50k monthly active users)	Free (for first 50k monthly active users)
8	Azure Service Bus	Outbound queue that clients use to receive device events	$10 (Standard tier, first 13M ops/month free)	$677.08	$677.08 * 4
9	Azure Blobs	Storage for device config files	$1	$10	$10
10	Customer cloud infrastructure	External infrastructure that receives messages from IoT devices	Not in scope of our product	Not in scope of our product	Not in scope of our product
	Total		$125.92	~$1,760	~$16,900
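A quick way to keep such a table honest is to recompute the totals from the rows. The sketch below copies the MVP and post-MVP figures of the in-scope modules from the table above; the dictionary layout and names are my own, and the post-MVP web app value assumes $50 for Front Door plus four S1 instances at $73 each, as the row states. The worst-case column is left out because the Redis row gives a price per instance without an instance count.

```python
# Monthly cost per in-scope module in USD, copied from the table above.
# Web app post-MVP = $50 Front Door + 4 x $73 S1 instances = $342.
costs = {
    "Azure Function":    {"mvp": 0.00,  "post_mvp": 291.85},
    "Redis Cache":       {"mvp": 16.06, "post_mvp": 163.00},
    "Azure Cosmos DB":   {"mvp": 25.86, "post_mvp": 275.15},
    "Azure Web App":     {"mvp": 73.00, "post_mvp": 50.00 + 73.00 * 4},
    "Azure B2C":         {"mvp": 0.00,  "post_mvp": 0.00},
    "Azure Service Bus": {"mvp": 10.00, "post_mvp": 677.08},
    "Azure Blobs":       {"mvp": 1.00,  "post_mvp": 10.00},
}


def total(column: str) -> float:
    """Sum one cost column over all modules, rounded to cents."""
    return round(sum(row[column] for row in costs.values()), 2)


print("MVP total:", total("mvp"))
print("Post-MVP total:", total("post_mvp"))
```

The MVP sum matches the $125.92 in the table, and the post-MVP sum comes out at $1,759.08, which the table rounds to ~$1,760.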

Law 3: Architecting is a Series of Trade-offs

Frugality is about maximizing value, not just minimizing spend. To do that, you need to determine what you’re ready to pay for.

🎯 Action:

Include cost in the trade-off analysis that you document in each Architecture Decision Record (ADR).

Take a look at our article "Best Practices for Effective Software Architecture Documentation" to find ADR templates, learn what they are, and how to use them in Architecture Documentation.

Example of how to use Law 3 in practice:

An example of an ADR where an approximate cost estimate is included in the description.

An example of an ADR where the trade-off analysis includes a cost calculation for several scenarios. It shows how much the MongoDB options would cost, since MongoDB offers good throughput but comes at a high price.

Cloud price calculators

You can use the pricing calculators offered by the cloud providers to estimate infrastructure costs.

Phase 2: 📏 Measure

Law 4: Unobserved Systems Lead to Unknown Costs

“If you can’t measure it, you can’t manage it.” Use tools to monitor costs and utilization of resources.

🎯 Action:

Set up dashboards to review costs continually. Dashboards provide a clear view of your resource usage and spending patterns, helping you identify inefficiencies and optimize costs effectively.

Examples of tools that can be used to measure cost

Amazon Web Services

AWS CUDOS (Cost and Usage Dashboards Operations Solution) example

Azure

Google Cloud Platform

Custom Dashboards

Custom dashboard for monitoring cost: as an example, you can build a custom dashboard in New Relic or a similar tool.
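Behind any such dashboard sits the same basic logic: group cost records by service and compare the latest value with a baseline. Here is a minimal Python sketch of that idea. The records, service names, and the 1.5x threshold are all hypothetical; a real setup would read the data from the cloud provider's billing export instead of a hard-coded list.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily cost records: (service, day, cost in USD).
records = [
    ("cosmos-db", 1, 9.0), ("cosmos-db", 2, 9.5), ("cosmos-db", 3, 19.0),
    ("web-app", 1, 2.4), ("web-app", 2, 2.4), ("web-app", 3, 2.5),
]


def find_cost_spikes(records, threshold=1.5):
    """Return services whose latest daily cost exceeds threshold x their earlier average."""
    by_service = defaultdict(list)
    for service, _day, cost in sorted(records, key=lambda r: r[1]):
        by_service[service].append(cost)

    spikes = {}
    for service, costs in by_service.items():
        if len(costs) < 2:
            continue  # not enough history to form a baseline
        baseline = mean(costs[:-1])
        if baseline > 0 and costs[-1] > threshold * baseline:
            spikes[service] = (baseline, costs[-1])
    return spikes


print(find_cost_spikes(records))
```

In this made-up data only the database is flagged, because its latest daily cost is about twice its earlier average.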

Law 5: Cost-Aware Architectures Implement Cost Controls

Evaluate your system components by criticality.

🎯 Action:

Cost optimization must be measurable and tied to business impact, for example by grouping components into tiers (tier 1, tier 2, ... tier N).

Example: Split the E-commerce system into tiers

Let's consider an e-commerce system that was split into tiers based on sub-system criticality.

Tiered cost optimization for architecture: example of an e-commerce system split into tiers

The components are categorized into tiers:

Tier 1: Core components, scale regardless of cost.
Tier 2: Important components, can scale down temporarily.
Tier 3: Nice-to-have, keep cheap and simple.
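The three tiers above can be expressed as a small scaling policy. The sketch below is one possible reading of them, not a prescribed implementation: the class, the policy table, and the rule "shrink to one instance when over budget" are my assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierPolicy:
    max_instances: Optional[int]   # None means no cost-driven cap
    shrink_when_over_budget: bool  # may this tier be scaled down to save money?


# Assumed mapping of the tiers described above to concrete rules.
POLICIES = {
    1: TierPolicy(max_instances=None, shrink_when_over_budget=False),  # core
    2: TierPolicy(max_instances=None, shrink_when_over_budget=True),   # important
    3: TierPolicy(max_instances=1, shrink_when_over_budget=True),      # nice-to-have
}


def allowed_instances(tier: int, requested: int, over_budget: bool) -> int:
    """Return how many instances a component of the given tier may run."""
    policy = POLICIES[tier]
    if over_budget and policy.shrink_when_over_budget:
        return 1
    if policy.max_instances is not None:
        return min(requested, policy.max_instances)
    return requested


for tier in (1, 2, 3):
    print(tier, allowed_instances(tier, requested=8, over_budget=True))
```

A tier 1 component keeps all requested instances even when the budget is exceeded, while tier 2 and tier 3 components are scaled down.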

Phase 3: 👀 Observe

Law 6: Cost Optimization is Incremental

Making sure your system is cost-effective is an ongoing process. It's not something you do once and then forget about. Conduct cost reviews regularly, such as quarterly or after major deployments, to ensure efficiency and scalability.

🎯 Action:

Regularly check your system to find ways to improve efficiency.

Law 7: Unchallenged Success Leads to Assumptions

Don’t assume a solution that worked in the past is still the best choice. Regularly challenge your assumptions.

🎯 Action:

Review the relevance and cost-effectiveness of your technologies. Explore new tools and frameworks that might offer better performance or lower costs.

DevSecOps Tools Periodic Table

One reference for discovering new tools is the DevSecOps Tools Periodic Table.

Common Pitfalls

Pitfall 1: Ignoring Database Growth

As databases grow over time, unchecked expansion can lead to escalating costs and reduced performance, affecting the overall efficiency of your system.

❓ How to avoid:

Move old, unused data to cheaper solutions (e.g., cold storage).
Regularly optimize schemas and indexes.
Track database size trends.
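Tracking the size trend can be as simple as projecting the average monthly growth forward. The Python sketch below estimates how many months remain until a storage limit is reached. The function name, the sample sizes, and the 100 GB limit are hypothetical, and a straight-line projection is a deliberate simplification.

```python
from typing import Optional, Sequence


def months_until_limit(sizes_gb: Sequence[float], limit_gb: float) -> Optional[float]:
    """Estimate months until limit_gb from monthly size samples.

    Uses the average monthly growth between the first and last sample.
    Returns None when the database is not growing.
    """
    if len(sizes_gb) < 2:
        raise ValueError("need at least two monthly samples")
    growth_per_month = (sizes_gb[-1] - sizes_gb[0]) / (len(sizes_gb) - 1)
    if growth_per_month <= 0:
        return None
    remaining = limit_gb - sizes_gb[-1]
    return max(0.0, remaining / growth_per_month)


# Hypothetical monthly database sizes in GB and a hypothetical 100 GB limit.
print(months_until_limit([40, 46, 52, 58], limit_gb=100))
```

A result of a few months is a signal to plan archiving to cold storage or a tier change before the cost jump arrives.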

Pitfall 2: Inefficient Use of IO-Bound Operations

Modern systems often use a thread pool to process incoming requests efficiently. A thread pool is a collection of worker threads that can be reused for different tasks, reducing the overhead of creating and destroying threads. This helps systems handle more requests simultaneously without wasting resources.

If an engineer calls I/O-bound operations synchronously, the calling thread from the pool is blocked until the operation completes and cannot process other HTTP requests in the meantime.

What are I/O bound operations?

Reading or writing large files to disk.
Accessing a database for queries.
Communicating with external APIs over the network.

So, for example, if all threads from the pool are 'busy' waiting for I/O operations to finish, the system might slow down, and the load balancer will detect that. The load balancer will then add more instances of our system to process the ongoing requests:

Simplified schema of scaling instances by load balancer

That also increases cost, as we will pay for each instance (if we do not use VMs or dedicated services).

❓ How to avoid:

Use non-blocking or asynchronous IO mechanisms (e.g., event loops in Node.js, async/await in modern languages).
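The difference is easy to see in a small experiment. The Python sketch below simulates five I/O calls of 0.1 seconds each, first with blocking calls on a single worker (standing in for one thread from the pool) and then with async/await. The sleeps stand in for real database or API calls, and all names and timings are illustrative assumptions.

```python
import asyncio
import time


def fetch_blocking(delay: float) -> float:
    """Simulated I/O call that blocks the worker for its whole duration."""
    time.sleep(delay)
    return delay


async def fetch_non_blocking(delay: float) -> float:
    """Simulated I/O call that yields the worker while it waits."""
    await asyncio.sleep(delay)
    return delay


def handle_requests_blocking(n: int, delay: float) -> float:
    """Serve n requests one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        fetch_blocking(delay)
    return time.perf_counter() - start


async def handle_requests_async(n: int, delay: float) -> float:
    """Serve n requests concurrently on one event loop; return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(fetch_non_blocking(delay) for _ in range(n)))
    return time.perf_counter() - start


blocking_time = handle_requests_blocking(5, 0.1)
async_time = asyncio.run(handle_requests_async(5, 0.1))
print(f"blocking: {blocking_time:.2f}s, async: {async_time:.2f}s")
```

The blocking version takes roughly the sum of all waits, while the asynchronous version takes roughly the longest single wait, so the same worker serves more requests before any extra instance is needed.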

Pitfall 3: Over-Provisioning Resources

Provisioning more computing, storage, or RAM than necessary wastes money without delivering proportional benefits.
The creation of unnecessary environments increases infrastructure costs.

❓ How to avoid:

Implement right-sizing strategies and auto-scaling policies. Review resource utilization metrics and run load tests to find the optimal configuration for each environment. Combine environments on the same service plan where possible (for example, dev and QA environments).
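A right-sizing review can start from a very simple rule over utilization metrics. The sketch below classifies a resource by its near-peak CPU usage. The 30% and 75% thresholds, the function name, and the sample data are assumptions chosen for illustration and should be tuned with load-test results.

```python
from typing import Sequence


def rightsizing_advice(cpu_samples: Sequence[float],
                       low: float = 30.0,
                       high: float = 75.0) -> str:
    """Suggest an action from CPU utilization samples given in percent.

    Looks at a near-peak value (roughly the 95th percentile) so that a
    single quiet period does not hide real load.
    """
    if not cpu_samples:
        raise ValueError("no utilization samples provided")
    ordered = sorted(cpu_samples)
    near_peak = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    if near_peak < low:
        return "downsize"
    if near_peak > high:
        return "upsize"
    return "keep"


# Hypothetical hourly CPU utilization for a dev environment.
print(rightsizing_advice([12, 15, 9, 20, 18, 14, 11, 16, 13, 17]))
```

A resource that never gets close to the lower threshold is a candidate for a smaller plan or for sharing a plan with another environment.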

Pitfall 4: Lack of Team Ownership

Teams without clear ownership of cost and performance often make decisions that might not be cost-efficient.

❓ How to avoid:

Make sure every team member knows how much the infrastructure costs.
Assign a person responsible for cost.
Apply FinOps principles to make cost-efficiency an ongoing focus.

What are FinOps principles?

You can find them in the FinOps Framework.

Pitfall 5: Ignoring Technical Debt

Allowing technical debt to accumulate reduces your team's ability to adapt and increases the cost of implementing future changes. Ignoring technical debt can lead to inefficiencies, slow development cycles, and a higher risk of critical issues in the long term.

❓ How to avoid:

Dedicate time each sprint to address technical debt incrementally.
Use static code analysis and regular architectural reviews to identify and resolve problematic areas.
Create a clear plan for reducing technical debt and prioritize fixes based on business impact.

Pitfall 6: Over-Engineering Solutions

Complex architectures (e.g., microservices or event-driven architecture for small-scale apps) add unnecessary overhead in development, maintenance, and runtime costs.

❓ How to avoid:

Start simple with a monolith or modular monolith, and move to a distributed system only when specific scaling or team requirements demand it.
Domain-Driven Design (DDD) or Vertical Slice Architecture can be a good choice for a monolith, because they prepare the system for a future split.

Example: Monolith vs Microservices - Is Monolith Frugal?

You often hear that a monolith is cheaper than microservices. Let's calculate whether that is true.

Monolith vs Microservices: Cost calculation

Baseline

Use Azure App Service in the West Europe region with the Windows operating system
Assume for simplicity that App Service S1 (1 core, 1.75 GB RAM, and 50 GB storage) has a throughput of 1,000 requests
Assume for simplicity that App Service S3 (4 cores, 7 GB RAM, and 50 GB storage) has a throughput of 4,000 requests
The database and all other components are out of scope for this estimation

App Service plan	OS	Price per month	Used for
S1 (1 core, 1.75 GB RAM, 50 GB storage)	Windows	$73	Microservice
S3 (4 cores, 7 GB RAM, 50 GB storage)	Windows	$292	Monolith

Monolith vs Microservices: Ramp up

As the calculation shows, the cost of hosting microservices is almost the same as the cost of hosting the monolith, so hosting the services does not in itself cost more.
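The ramp-up can be reproduced from the baseline numbers with a few lines of Python. This is a simplified reconstruction under my own assumptions, not necessarily the calculation from the talk: it treats the load as evenly spread over S1 instances for microservices and S3 instances for the monolith, and it ignores everything the baseline declared out of scope.

```python
import math

# Baseline from this post: price in USD per month, throughput in requests.
S1 = {"price": 73, "throughput": 1_000}   # used for microservices
S3 = {"price": 292, "throughput": 4_000}  # used for the monolith


def monthly_cost(load: int, plan: dict) -> int:
    """Cost of enough instances of `plan` to serve `load` requests."""
    instances = max(1, math.ceil(load / plan["throughput"]))
    return instances * plan["price"]


print("load  monolith  microservices")
for load in (4_000, 5_000, 8_000):
    print(load, monthly_cost(load, S3), monthly_cost(load, S1))
```

At 4,000 and 8,000 requests both options cost the same ($292 and $584), and in between the smaller S1 steps are slightly cheaper ($365 versus $584 at 5,000 requests), because four S1 instances cost exactly as much as one S3.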

Monolith vs Microservices: Limitations

If you check the limits of the Azure app service, you will find that 10 instances (max) are allowed for the Standard tier.

That means:

To scale beyond 10 instances, you have to move to a Premium or Isolated plan, which significantly increases the monolith's costs.
With microservices, individual services can be scaled independently, which helps reduce costs because only the most critical services need the more expensive plan (e.g., Premium for the Orders service and Standard for the rest).
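The effect of the 10-instance limit can be sketched per service. The code below reuses the S1 baseline throughput from this post and decides which services fit into the Standard tier. The service names and their loads are hypothetical, and "Premium" here simply means "anything above the Standard limit".

```python
import math

STANDARD_MAX_INSTANCES = 10  # Standard-tier instance limit mentioned above
S1_THROUGHPUT = 1_000        # baseline assumption: requests per S1 instance


def required_tier(load: int) -> str:
    """Return the cheapest tier that can host enough instances for `load`."""
    instances = math.ceil(load / S1_THROUGHPUT)
    return "Standard" if instances <= STANDARD_MAX_INSTANCES else "Premium"


# Hypothetical per-service loads for an e-commerce system.
service_loads = {"orders": 12_000, "catalog": 3_000, "notifications": 1_000}
tiers = {name: required_tier(load) for name, load in service_loads.items()}
print(tiers)
```

Only the Orders service crosses the limit in this example, so only that service pays the Premium price, whereas a monolith carrying the combined load would have to move to the higher tier as a whole.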

Monolith vs Microservices: Final Thoughts

So, are microservices cheaper than a monolith?

Can we conclude that microservices are much cheaper than monoliths? Of course not, because microservices come with 'hidden' costs:

Hidden Extra Costs with Microservices

Infrastructure: API gateway, message broker, and other components.
Development: higher complexity and the need for skilled engineers.
Operations: increased DevOps effort and operational costs.

Best Fit:

Microservices are well-suited for managing complexity and supporting the needs of rapidly growing systems.
Due to reduced complexity, monolithic architectures may offer better cost efficiency for simpler applications.

Conclusion

Align architecture with business needs and technical constraints to ensure long-term adaptability and sustainability.
Make cost a non-functional requirement and include it in trade-off analyses to better align decisions with the budget.
Continuously monitor and optimize costs by leveraging tools and regular reviews to identify inefficiencies and savings opportunities.
Design systems with scalability in mind so that they grow efficiently without unnecessary spending.
Proactively identify and address potential pitfalls, such as technical debt, over-provisioning, and ignoring database growth, which can lead to escalating costs and reduced performance.
By embracing Frugal Architecture, you can create systems that are not only high-performing but also cost-effective, scalable, and sustainable for the future.
