FinOps / Cloud Finance Cheat Sheet

FinOps Framework

What is FinOps?

FinOps (Financial Operations) is an operating model for cloud financial management. It's not just about cutting costs—it's about maximizing value from cloud investment through collaboration between finance, engineering, and business teams.

Core Principle: Cost is a first-class engineering metric, just like performance, availability, and security. Engineers should be aware of the cost impact of architectural decisions.

The FinOps Lifecycle (Inform → Optimize → Operate)

Inform Phase

Build cost visibility and understanding

Cost allocation (which team/product owns which costs)
Benchmarking (how do we compare to peers?)
Forecasting (what will we spend next quarter?)
Chargeback reporting (showback/chargeback models)

Optimize Phase

Identify and implement savings

Rate optimization (RIs, Savings Plans, Spot)
Usage optimization (right-sizing, auto-scaling)
Commitment-based discounts (planning ahead)
Waste elimination (idle resources, orphaned volumes)

Operate Phase

Sustain and continuously improve

Cost anomaly detection (alert on unusual spending)
Continuous optimization (rolling process)
Team engagement (teach engineers to optimize)
Policy governance (tagging, resource limits)

Key Difference

FinOps is a CYCLE, not a one-time project. You continuously loop: inform → optimize → operate → inform (with new cost data) → optimize (more) → operate...

FinOps Personas

FinOps Practitioner (Cloud Finance Engineer)

Coordinator between teams. Owns cost strategy, forecasting, reporting.

Background: Finance or engineering
Tools: Cost Explorer, CloudHealth, SQL
Goal: Visibility, forecasting, cost culture

Engineering Architect/Team Lead

Designs infrastructure. Makes decisions that impact cost.

Goal: Right-sizing, efficient design, Spot utilization
Motivation: Cost awareness, performance, reliability
Challenge: Cost trade-offs (is it worth 2x cost for 10% more reliability?)

Finance Team

Accounting, procurement, budgeting. Interfaces with FinOps practitioner.

Goal: Accurate forecasting, budget management, cost control
Challenge: The cloud's variable cost model is unfamiliar (unlike on-prem fixed costs)

Business/Executive

Product owners, CTOs. Strategic decision-makers.

Goal: Cost per unit (cost per user, revenue per cost dollar)
Decision: Is the cloud the right place for this workload? Should we shift on-prem workloads to cloud?

FinOps Maturity Levels

Crawl (Ad-hoc)

Early stage. Reactive cost management.

Limited visibility. Cost surprises happen.
Basic tagging (if any)
No forecasting or budgeting
One person owns "cloud costs"

Walk (Standard)

Growing cloud usage. Proactive management starting.

Good cost visibility via tagging
Allocated costs to teams/products
Basic unit economics (cost per user)
Some RI/SP utilization
Monthly cost reviews

Run (Advanced)

Mature FinOps. Fully optimized, integrated into culture.

Real-time cost visibility
Fully allocated costs (100% of cloud bill assigned)
Accurate forecasting (within ±5%)
Automated optimization (auto-scaling, Spot management)
Cost embedded in architecture decisions
>90% RI/SP utilization
<5% waste

Cloud Pricing Models

On-Demand Pricing

What It Is

Pay per hour (or second in modern clouds). No commitment. Maximum flexibility.

Cost: Highest. Baseline pricing.

Use for: Dev/test, unpredictable workloads, short-lived jobs, spiky traffic

Example: AWS t3.medium instance = ~$0.04/hour

Reserved Instances (RIs) & Savings Plans

Reserved Instance (AWS)

Commit to a 1- or 3-year term for a specific instance type/region

Up to 72% off (varies by term, payment option, and instance type)

Standard RI: Fixed instance type. Up to 72% savings but inflexible.
Convertible RI: Can be exchanged for a different family. Up to 66% savings.
Scheduled RI: For predictable recurring windows (e.g., business hours). No longer offered for new purchases.

Payment options:

All Upfront (max savings)
Partial Upfront (middle)
No Upfront (minimum savings)

Savings Plans (AWS)

Commit to $/hour spend (not instance type). More flexible than RIs.

Up to 66-72% off

Compute Savings Plan: Up to 66% off. Any compute (EC2, Fargate, Lambda)
EC2 Instance Savings Plan: Up to 72% off. Any EC2 instance in a chosen family + region
SageMaker Savings Plan: For ML workloads

Advantage: Flexibility. Switch instance types without penalty.

Spot Instances (AWS) / Preemptible VMs (GCP)

What It Is

Use spare cloud capacity at huge discount. Trade reliability for savings.

Cost: 50-90% off on-demand. Price varies with long-term supply and demand (AWS no longer uses a bidding/auction model).

Catch: Can be interrupted with 2-minute (AWS) or 30-second (GCP) notice

Use for: Fault-tolerant batch jobs, CI/CD, Spark/Hadoop, stateless services, data processing

NOT for: Stateful services, databases, critical production services requiring 99.9% uptime

Best practice: Mix Spot with an on-demand baseline (e.g., 70% Spot + 30% on-demand) for resilience. Diversify across instance types and Availability Zones, because interruption rates differ by capacity pool.
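
To see what a Spot/on-demand mix does to the bill, the sketch below computes the blended price relative to running everything on-demand. The function name and the 70% Spot discount are illustrative assumptions, not figures from a provider price list.

```python
def blended_cost_factor(spot_share: float, spot_discount: float) -> float:
    """Fraction of the all-on-demand bill paid when spot_share of capacity runs on Spot."""
    if not (0.0 <= spot_share <= 1.0 and 0.0 <= spot_discount <= 1.0):
        raise ValueError("spot_share and spot_discount must be between 0 and 1")
    spot_part = spot_share * (1.0 - spot_discount)  # discounted Spot capacity
    on_demand_part = 1.0 - spot_share               # full-price baseline capacity
    return spot_part + on_demand_part


if __name__ == "__main__":
    # Assumed scenario: 70% of capacity on Spot at a 70% discount.
    factor = blended_cost_factor(spot_share=0.70, spot_discount=0.70)
    print(f"Fleet pays {factor:.0%} of the on-demand price")
```

Under these assumptions the fleet pays about 51% of the on-demand price, so the 30% on-demand baseline roughly halves the headline Spot saving in exchange for resilience.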

Sustained Use Discounts (GCP)

Automatic Discounts

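GCP gives automatic discounts (no commitment) when an instance runs for more than 25% of the month. The rate drops for each additional quarter of the month (N1 machine types shown; newer series such as N2 top out near 20%):

0-25% of month: billed at 100% of base rate
25-50% of month: billed at 80% of base rate
50-75% of month: billed at 60% of base rate
75-100% of month: billed at 40% of base rate

Net discount for a full month: up to 30%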

Advantage: No commitment risk. Discount applies automatically.
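
Sustained use discounts are incremental: each successive quarter of the month is billed at a lower rate, so the net discount is smaller than the rate of the last tier reached. The sketch below assumes the N1-style schedule (100%, 80%, 60%, 40% of the base rate per quarter); other machine series use different schedules, and the names here are hypothetical.

```python
# Assumed N1-style incremental schedule: fraction of the base rate billed
# for each successive quarter of the month.
TIER_RATES = [1.00, 0.80, 0.60, 0.40]


def effective_sud_discount(usage_fraction: float) -> float:
    """Net discount for an instance that runs usage_fraction of the month."""
    if not 0.0 < usage_fraction <= 1.0:
        raise ValueError("usage_fraction must be in (0, 1]")
    billed = 0.0
    for i, rate in enumerate(TIER_RATES):
        tier_start = i * 0.25
        # Portion of the usage that falls inside this quarter-month tier.
        in_tier = min(max(usage_fraction - tier_start, 0.0), 0.25)
        billed += in_tier * rate
    return 1.0 - billed / usage_fraction


if __name__ == "__main__":
    for usage in (0.25, 0.50, 0.75, 1.00):
        print(f"{usage:.0%} of month: {effective_sud_discount(usage):.1%} net discount")
```

With this schedule, an instance that runs the whole month nets a 30% discount, and one that runs half the month nets 10%.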

Committed Use Discounts (GCP CUD)

GCP CUD (1-3 year commitment)

Similar to AWS RIs but more flexible

1-year: 25-37% off
3-year: 52-70% off
Spend-based (flexible) CUDs apply across machine families and regions; resource-based CUDs are tied to a region and machine series

Pricing Comparison Table

Model	Discount	Flexibility	Use Case
On-Demand	0% (baseline)	Maximum	Dev/test, unpredictable
Reserved Instance	Up to 72%	Low (specific type)	Stable production workloads
Savings Plans	Up to 66-72%	High (across types)	Predictable, diverse compute
Spot/Preemptible	50-90%	Very Low (can interrupt)	Fault-tolerant batch jobs
Sustained Use	Up to 30%	Maximum (no commitment)	Stable workloads (GCP)

AWS Cost Optimization — Detailed

EC2 Optimization

Right-Sizing

Use CloudWatch metrics to identify oversized instances

Look at CPU utilization: If consistently <20%, downsize
Look at memory utilization: If consistently <30%, downsize (memory metrics require the CloudWatch agent; EC2 does not publish them by default)
AWS Compute Optimizer provides recommendations automatically

Savings potential: 20-40% per instance
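
The thresholds above can be turned into a simple screening rule. This is a minimal sketch that assumes utilization samples (in percent) have already been exported from monitoring; the function name is hypothetical, and requiring both CPU and memory to be low is a deliberately conservative choice.

```python
CPU_THRESHOLD = 20.0     # percent, from the rule of thumb above
MEMORY_THRESHOLD = 30.0  # percent, from the rule of thumb above


def is_downsize_candidate(cpu_samples: list[float], memory_samples: list[float]) -> bool:
    """True when every CPU and memory sample stays under its threshold."""
    if not cpu_samples or not memory_samples:
        return False  # no data, no recommendation
    return max(cpu_samples) < CPU_THRESHOLD and max(memory_samples) < MEMORY_THRESHOLD


if __name__ == "__main__":
    print(is_downsize_candidate([8.0, 12.5, 15.0], [22.0, 25.0, 27.5]))
    print(is_downsize_candidate([8.0, 65.0, 15.0], [22.0, 25.0, 27.5]))
```

Using the maximum rather than the average is one way to read "consistently": a single busy period is enough to keep the instance off the candidate list.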

Graviton Instances

AWS-designed ARM-based processors. Up to 20% cheaper and up to 40% better price-performance than comparable x86 instances

Works with: EC2, RDS, ElastiCache, etc.
Limitation: Need to rebuild container images for ARM

Spot Instances

For fault-tolerant workloads

Use Spot Fleet to manage mix of instance types
Capacity-optimized allocation (spreads across pools, reduces interruptions)
Use on/off schedule for dev/test (shutdown after hours)

Storage Optimization (S3)

Lifecycle Policies

Automatically move objects to cheaper tiers based on age

-- S3 Lifecycle Example (storage price per GB-month)
Age: 0-30 days → Standard ($0.023/GB)
Age: 30-90 days → Infrequent Access ($0.0125/GB) [~46% savings]
Age: 90-365 days → Glacier ($0.004/GB) [~83% savings]
Age: 365+ days → Delete

-- Result: average over an object's 365-day life ≈ $0.007/GB-month (vs $0.023 if all Standard)
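
The time-weighted average storage price for an object that passes through every tier can be checked with a few lines of arithmetic. The sketch below uses the example's per-GB-month prices and ignores transition requests, retrieval fees, and minimum storage durations, which are simplifying assumptions.

```python
# (start_day, end_day, price per GB-month) from the lifecycle example above.
LIFECYCLE = [
    (0, 30, 0.023),     # Standard
    (30, 90, 0.0125),   # Infrequent Access
    (90, 365, 0.004),   # Glacier
]


def blended_price_per_gb_month(stages=LIFECYCLE) -> float:
    """Average storage price, weighted by the days spent in each tier."""
    total_days = sum(end - start for start, end, _ in stages)
    weighted = sum((end - start) * price for start, end, price in stages)
    return weighted / total_days


if __name__ == "__main__":
    print(f"Blended price: ${blended_price_per_gb_month():.4f} per GB-month")
```

With these prices the average comes to roughly $0.007 per GB-month, about 70% below keeping everything in Standard.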
                

Storage Classes

S3 Standard: Hot data, frequent access
S3 Intelligent-Tiering: Unknown access patterns. Moves data automatically.
S3 Infrequent Access: Accessed less than once a month. Retrieval fees apply.
S3 Glacier: Archive. Expensive to retrieve. Use for compliance backups.
S3 Glacier Deep Archive: About $1/TB/month storage. Retrieval costs roughly $20/TB or more.

Data Transfer Costs

Often overlooked! Egress is expensive (2-9¢/GB depending on destination)

Use CloudFront CDN to reduce egress (caches at edge)
Keep data in same region when possible
Use VPC endpoints to avoid NAT Gateway charges ($0.045/hour + $0.045/GB)
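
NAT Gateway charges add up because the hourly fee and the per-GB processing fee both apply. The sketch below estimates a monthly bill from the rates quoted above, assuming 730 hours per month; the function name is hypothetical.

```python
NAT_HOURLY_RATE = 0.045   # USD per gateway-hour, as quoted above
NAT_PER_GB_RATE = 0.045   # USD per GB processed, as quoted above
HOURS_PER_MONTH = 730     # assumed average month length


def nat_gateway_monthly_cost(gb_processed: float, gateways: int = 1) -> float:
    """Hourly charge for every gateway plus the data processing charge."""
    hourly = gateways * NAT_HOURLY_RATE * HOURS_PER_MONTH
    processing = gb_processed * NAT_PER_GB_RATE
    return hourly + processing


if __name__ == "__main__":
    print(f"${nat_gateway_monthly_cost(gb_processed=1000):.2f} per month")
```

For one gateway processing 1,000 GB, the processing fee ($45.00) already exceeds the hourly fee ($32.85), which is why routing high-volume traffic through endpoints instead can pay off.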

Database Optimization (RDS)

Instance Right-Sizing

Use CloudWatch CPU/memory metrics
Downsizing from db.r6i.4xlarge to db.r6i.2xlarge can cut the instance cost by about 50%

Reserved Instances for Production

RDS 1-3 year RIs give 40-50% savings for stable workloads

Aurora Serverless v2

Billed per second for the database capacity (ACUs) in use. Well suited to variable workloads.

Scales 0.5 ACU - 128 ACU automatically. While idle, you still pay for the configured minimum capacity.
Savings: Up to 70% vs a fixed instance if the workload is spiky

Read Replicas

Reduce load on primary. Cross-region replicas for disaster recovery.

Multi-AZ Warning

Doubles cost for synchronous replica. Only use for production critical DBs.

Lambda Optimization

Memory = CPU

Lambda pricing: $0.0000166667 per GB-second (x86), plus a per-request charge

More memory = more CPU = faster execution = lower cost (sometimes)

512 MB × 10 seconds = 5 GB-sec × $0.0000166667 = $0.0000833
1024 MB × 5 seconds = 5 GB-sec × $0.0000166667 = $0.0000833 (same cost, 2x faster!)

Strategy: Use AWS Lambda Power Tuning tool to find optimal memory for each function
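
The memory/duration trade-off comes down to GB-seconds: memory in GB multiplied by run time in seconds. A minimal sketch of that arithmetic follows, using the per-GB-second rate quoted above and leaving out the per-request charge; the function name is hypothetical.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # compute rate quoted above


def invocation_compute_cost(memory_mb: int, duration_s: float) -> float:
    """Compute charge for one invocation (per-request fee not included)."""
    gb_seconds = (memory_mb / 1024) * duration_s
    return gb_seconds * PRICE_PER_GB_SECOND


if __name__ == "__main__":
    small = invocation_compute_cost(memory_mb=512, duration_s=10)
    large = invocation_compute_cost(memory_mb=1024, duration_s=5)
    print(f"512 MB for 10 s: ${small:.7f}")
    print(f"1024 MB for 5 s: ${large:.7f}")
```

Both configurations consume 5 GB-seconds, so they cost the same. Doubling memory only saves money when it more than halves the duration, which is what a tuning run measures.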

Provisioned Concurrency

Keeps execution environments initialized to avoid cold starts. Cost: about $0.015 per GB-hour of provisioned concurrency, charged whether or not the function is invoked

Use ONLY for latency-sensitive functions. Otherwise, cold starts are acceptable.

Reserved Concurrency

Limits max concurrency to prevent runaway costs from bugs

Cost Anomaly Detection

AWS Cost Anomaly Detection

ML-based monitoring. Alerts when spending is unusual.

Learns baseline spending
Detects anomalies (e.g., 20% jump in EC2)
Set frequency: daily/weekly alerts

Budget Alerts

Fixed threshold: "Alert if June spend > $50K"

CloudWatch Billing Alarms

Estimated charges. Set alarm at 80% of monthly budget.

Tagging Strategy for Cost Allocation

Mandatory Tags (Enforce via AWS Config)

-- Cost allocation tags
Environment: production | staging | dev
Team: backend | frontend | data | devops
Project: project-name
CostCenter: cc-123
Owner: firstname.lastname@company.com
Application: service-name
Service: compute | storage | database
                

Tag Compliance

Use AWS Config rules to enforce tags on all new resources
Provide automated remediation (tag untagged resources)
Monthly audit of tag compliance
Block untagged resources from deployment (via IAM policy)
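
A compliance audit boils down to comparing each resource's tags with the mandatory set. The sketch below is a standalone check on a plain dictionary, assuming the tag keys and environment values listed earlier; it does not call any AWS API, and the function name is hypothetical.

```python
REQUIRED_TAGS = {
    "Environment", "Team", "Project", "CostCenter",
    "Owner", "Application", "Service",
}
ALLOWED_ENVIRONMENTS = {"production", "staging", "dev"}


def tag_violations(tags: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means compliant."""
    missing = sorted(REQUIRED_TAGS - set(tags))
    problems = [f"missing tag: {key}" for key in missing]
    env = tags.get("Environment")
    if env is not None and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"invalid Environment: {env}")
    return problems


if __name__ == "__main__":
    for problem in tag_violations({"Environment": "prod", "Team": "backend"}):
        print(problem)
```

The same function could sit behind a monthly audit report or a remediation job; returning a list of problems rather than a boolean makes the report actionable.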

Activate Cost Allocation Tags

Tags must be "activated" in Billing console to use for cost allocation

Then use in Cost Explorer, CUR, and forecasting

GCP Cost Optimization

BigQuery Cost Control

Pay Per Query (Most Common)

Pay per byte scanned. $6.25 per TiB (first 1 TiB/month free)

Optimize: Partition & cluster tables to scan fewer bytes
Example: Query 100 GB table but only scan 10 GB partition = $0.0625 cost

Partitioning (Critical!)

CREATE TABLE events
PARTITION BY DATE(event_timestamp)
AS SELECT ...;

-- Query only July 1 partition (1GB instead of 30GB)
SELECT * FROM events
WHERE event_timestamp >= '2024-07-01'
  AND event_timestamp < '2024-07-02';
-- Cost: $0.006 instead of $0.19
                

Clustering

Organize data within partitions by column (e.g., user_id)

Further reduces bytes scanned for WHERE clauses on cluster key

Slot Reservations

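Capacity (slot-based) pricing for predictable workloads

Legacy flat-rate: sold in 100-slot blocks, about $2,000/month per 100 slots ($1,700/month with an annual commitment)
Flex slots: $4/hour per 100 slots (short-term capacity, cheaper than per-query for consistent load)
Since 2023, new capacity purchases use BigQuery editions, priced per slot-hour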

BI Engine Caching

In-memory cache for dashboard queries

Billed per GB of reserved memory per hour (about $0.0416/GB/hour, roughly $30/GB/month; verify current pricing). Often breaks even after 1-2 weeks of repeated dashboards.

Query Cost Audit

-- Estimated on-demand query cost per user per day (last 30 days)
SELECT
  user_email,
  DATE(creation_time) AS query_date,
  -- On-demand billing is per TiB (2^40 bytes) of bytes billed
  SUM(total_bytes_billed) / POW(1024, 4) AS tib_billed,
  SUM(total_bytes_billed) / POW(1024, 4) * 6.25 AS estimated_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
GROUP BY user_email, query_date
ORDER BY estimated_cost_usd DESC;
                

Compute Engine Right-Sizing

Sustained use discounts (automatic, no commitment)
Committed use discounts (1- or 3-year, roughly 25-70% savings depending on term)
Preemptible VMs (50-80% off for interruptible workloads)
Custom machine types (pay for only CPU/memory you use)

Cloud Storage Lifecycle

GCP Storage Classes

Standard: $0.020/GB/month. Hot data.
Nearline: $0.010/GB/month. Accessed less than once a month. 30-day minimum storage.
Coldline: $0.004/GB/month. Accessed less than once a quarter. 90-day minimum storage.
Archive: $0.0012/GB/month. Accessed less than once a year. 365-day minimum storage, $0.05/GB retrieval.

Lifecycle Policy

-- Automatic transition
0-90 days: Standard
90-180 days: Nearline
180+ days: Archive
                

Azure Cost Management

Reserved VM Instances

1-3 year commitment. Up to 72% savings.

Payment upfront maximizes savings
Can exchange for different size in same family

Azure Hybrid Benefit

Bring existing Windows Server / SQL Server licenses

Save up to 40% on compute
Must have Software Assurance on licenses

Azure Spot VMs

Up to 90% discount for interruptible workloads

Azure Reservations

For predictable services: Cosmos DB, SQL DB, Blob Storage, App Service

1-3 year commitments with 20-40% savings

Cost Management + Billing

Budget alerts
Cost analysis by resource group, subscription, tag
Advisor recommendations (rightsizing, idle resources)
Cost estimation (before deploying)

FinOps Tools & Practices

AWS Tools

AWS Cost Explorer

Dashboard for cost visibility

Filter by service, region, tag, linked account
View daily/monthly trends
Forecasting (ML-based)
RI/SP utilization dashboard
Right-sizing recommendations

AWS Cost and Usage Report (CUR)

Granular billing data (>90 columns per line item)

Export to S3 → Analyze in Athena or QuickSight

Per-resource costs
RI/SP amortization
Savings Plans utilization
Unblended vs amortized cost

Infracost

CLI tool: Estimate Terraform changes

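# Show cost delta before apply (run from the Terraform project directory)
infracost diff --path .

# Illustrative summary of the kind of result to expect (not literal tool output):
New instance cost: +$500/month
RDS downsize saves: -$200/month
Net: +$300/month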

Third-Party Tools

CloudHealth by VMware

Multi-cloud cost management platform

Cost allocation (custom rules)
Chargeback automation
Governance policies (resource limits)
Anomaly detection

Spot.io

Autonomous cost optimization

Automatic RI/SP purchasing (ML-driven)
Workload optimization (right-sizing)
Multi-cloud

Kubecost

Kubernetes cost allocation by namespace/workload

Real-time cost visibility in K8s
Cost per microservice
Idle resource detection

Showback vs Chargeback

Showback

Report costs to teams without billing them

Goal: Visibility and awareness
"Team X's cloud costs are $50K/month"
Advantage: No political friction, teams motivated by metrics

Chargeback

Actual billing to internal teams/business units

Goal: Cost accountability
"Team X's cloud bill is $50K/month (deducted from budget)"
Advantage: True incentive (hits their P&L), fair cost allocation
Disadvantage: Political. Requires strong cost allocation model.

Unit Economics

Critical FinOps Metric

Cost per unit of value delivered. Examples:

Cost per API request
Cost per user
Cost per transaction
Cost per GB processed

Unit Economics = Monthly Cloud Cost / Monthly Units

Example:
Cloud spend: $500K/month
Active users: 1M users
Cost per user: $0.50

If revenue is $2M/month:
Cost as % of revenue: 25%
Target: <20% for healthy business
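
The unit economics example can be reproduced in a few lines, which makes it easy to rerun each month with fresh numbers. This is a minimal sketch; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UnitEconomics:
    cloud_cost: float  # monthly cloud spend in USD
    units: int         # monthly units of value, e.g. active users
    revenue: float     # monthly revenue in USD

    @property
    def cost_per_unit(self) -> float:
        return self.cloud_cost / self.units

    @property
    def cost_pct_of_revenue(self) -> float:
        return self.cloud_cost / self.revenue * 100


if __name__ == "__main__":
    month = UnitEconomics(cloud_cost=500_000, units=1_000_000, revenue=2_000_000)
    print(f"Cost per user: ${month.cost_per_unit:.2f}")
    print(f"Cloud cost as share of revenue: {month.cost_pct_of_revenue:.0f}%")
```

Tracking both values over time shows whether spend is growing faster than the business it supports.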

Cloud Cost Governance

Tagging Strategy & Enforcement

Mandatory Tags (Example)

Environment: prod, staging, dev, sandbox
Team: backend, frontend, data, devops
Project: project name
CostCenter: cost center code
Owner: person responsible
CreatedDate: for automated cleanup

Enforcement

AWS Config: Trigger Lambda if resource missing tag
IAM policy: Deny creation of resources without tag
Regular audits: Monthly report of non-compliant resources
Cleanup: Auto-delete untagged resources older than 30 days

Budget Alerts & Automated Actions

Budget Setup

Team budget: "Backend team max $30K/month"
Service budget: "RDS max $5K/month"
Project budget: "New product pilot max $2K"

Alert Escalation

50% of budget: Informational
80% of budget: Warning to team lead
100% of budget: Alert to finance + engineering leadership
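
The escalation ladder maps directly onto a threshold check. The sketch below assumes month-to-date spend and the budget are already known; the function name and the returned labels are hypothetical.

```python
from typing import Optional


def escalation_level(spend: float, budget: float) -> Optional[str]:
    """Map month-to-date spend to the escalation tier it has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    if ratio >= 1.0:
        return "alert: finance + engineering leadership"
    if ratio >= 0.8:
        return "warning: team lead"
    if ratio >= 0.5:
        return "informational"
    return None  # below every threshold, nothing to send


if __name__ == "__main__":
    # Backend team budget from the example above: $30K/month.
    print(escalation_level(spend=25_000, budget=30_000))
```

Checking the highest threshold first means each spend level triggers exactly one tier rather than all tiers below it.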

Automated Actions

Stop non-prod resources outside business hours
Auto-shutdown dev environments after 18:00
Auto-delete resources tagged "temporary" after 7 days
Auto-delete untagged resources after 30 days

RI/SP Management

Coverage Targets

Goal: >70% of eligible spend covered by RIs/SPs
Metric: Sum of RI/SP spend / Total eligible spend

Utilization Targets

Goal: >90% utilization
Metric: RI/SP hours used / RI/SP hours purchased
If 70% utilization: You're wasting 30% of commitment money
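
Coverage and utilization answer different questions: how much eligible spend is discounted, and how much of what was bought is actually used. A minimal sketch of both ratios and the resulting waste follows, using hypothetical function names and made-up inputs.

```python
def coverage(covered_spend: float, eligible_spend: float) -> float:
    """Share of eligible spend that is covered by RIs/SPs."""
    return covered_spend / eligible_spend


def utilization(hours_used: float, hours_purchased: float) -> float:
    """Share of purchased commitment hours that were actually used."""
    return hours_used / hours_purchased


def wasted_commitment(commitment_cost: float, utilization_ratio: float) -> float:
    """Money paid for commitment hours that went unused."""
    return commitment_cost * (1.0 - utilization_ratio)


if __name__ == "__main__":
    used = utilization(hours_used=700, hours_purchased=1000)
    print(f"Utilization: {used:.0%}")
    print(f"Wasted: ${wasted_commitment(10_000, used):,.0f} of a $10,000 commitment")
```

High coverage with low utilization means over-committing, while high utilization with low coverage means there is room to buy more.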

Expiry Management

Set calendar reminders for expiring RIs (30 days before)
Decide: Renew, let expire, or change instance type
Use Cost Explorer's RI and Savings Plans purchase recommendations to size new commitments

Cost Allocation & Chargeback

Allocation Rules

100% of cloud costs must be allocated:

Direct: Costs with clear owner (team tag)
Shared infrastructure: Allocate proportional to usage
Example: Kubernetes cluster cost split by namespace CPU percentage

Shared Cost Allocation Methods

Proportional: Allocate by usage metric (% of CPU, GB stored)
Equal split: Divide evenly among users
Fixed split: Agreed-upon percentages
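
Proportional allocation, such as splitting a Kubernetes cluster bill by namespace CPU share, is a one-line formula per consumer. The sketch below assumes usage has already been measured per namespace; the function name and the sample namespaces are hypothetical.

```python
def allocate_proportionally(total_cost: float, usage: dict[str, float]) -> dict[str, float]:
    """Split total_cost across consumers in proportion to their usage."""
    total_usage = sum(usage.values())
    if total_usage <= 0:
        raise ValueError("total usage must be positive")
    return {name: total_cost * used / total_usage for name, used in usage.items()}


if __name__ == "__main__":
    cpu_share = {"checkout": 50.0, "search": 30.0, "batch": 20.0}  # CPU-hours, made up
    for namespace, cost in allocate_proportionally(10_000, cpu_share).items():
        print(f"{namespace}: ${cost:,.2f}")
```

An equal split is the same function with identical usage values, and a fixed split is the same function with the agreed percentages passed as usage.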

FinOps Governance Council

Monthly meeting

Review cloud spend vs forecast
Investigate anomalies
Plan RI/SP purchases for next quarter
Discuss optimization opportunities
Adjust budgets if business changes

Key FinOps Metrics & KPIs

Metric	Definition	Target / Benchmark	Why It Matters
Cloud spend as % of revenue	Cloud costs / Revenue	10-20% (for SaaS)	Profitability. If >25%, margins compressed.
Unit cost trends	Cost per user / transaction (tracked monthly)	Decreasing over time	Economies of scale. Should go down as company grows.
RI/SP coverage	% of eligible spend with commitment	>70%	Optimization savings. Every 10% improves margins by ~3%
RI/SP utilization	Hours used / Hours purchased	>90%	If 70%, you're wasting money on unused commitments
Waste %	Unused resources / Total spend	<5%	Idle instances, orphaned volumes, unattached IPs
Cost per environment	Prod share vs non-prod share of spend	Prod >70%, Non-prod <30%	Over-investment in dev/test wastes money
Forecast accuracy	Actual spend vs Budget variance	Within ±10%	Planning and credibility with finance
Engineer cloud awareness	% of engineers who know their team's spend	>80%	Culture. Engineers make cost-aware architecture decisions
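
Forecast accuracy from the table is the signed gap between actual and forecast spend, expressed as a percentage of the forecast. A minimal sketch under that definition follows; the function names and the default ±10% tolerance are assumptions taken from the target column.

```python
def forecast_variance_pct(actual: float, forecast: float) -> float:
    """Signed variance of actual spend against forecast, in percent."""
    return (actual - forecast) / forecast * 100


def within_tolerance(actual: float, forecast: float, tolerance_pct: float = 10.0) -> bool:
    """True when the absolute variance stays inside the tolerance band."""
    return abs(forecast_variance_pct(actual, forecast)) <= tolerance_pct


if __name__ == "__main__":
    print(f"Variance: {forecast_variance_pct(108_000, 100_000):+.1f}%")
    print(within_tolerance(108_000, 100_000))
```

Keeping the sign makes it clear whether the team overspent or underspent, while the tolerance check treats both directions as a miss.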

FinOps Best Practices Summary

Golden Rules of FinOps:
Make cost visible to everyone (not just finance)
Involve engineering in cost decisions (not just cost-cutting)
Measure and optimize unit economics, not just spend
Use tagging and allocation to drive accountability
Automate everything (tagging, cleanup, scaling, optimization)
Forecast accurately (±10%) to build trust with finance
Celebrate wins and build cost culture
FinOps is a continuous cycle, not a one-time project

            
