FinOps / Cloud Finance Cheat Sheet

Cloud Cost Mastery & Financial Operations

$90K–$160K US | 30+ Q&As | Cloud Cost Expert | Interview Ready

FinOps Framework

What is FinOps?

FinOps (Financial Operations) is an operating model for cloud financial management. It is not just about cutting costs; it is about maximizing the value of cloud investment through collaboration between finance, engineering, and business teams.

Core Principle: Cost is a first-class engineering metric, just like performance, availability, and security. Engineers should be aware of the cost impact of architectural decisions.

The FinOps Lifecycle (Inform → Optimize → Operate)

Inform Phase

Build cost visibility and understanding

  • Cost allocation (which team/product owns which costs)
  • Benchmarking (how do we compare to peers?)
  • Forecasting (what will we spend next quarter?)
  • Chargeback reporting (showback/chargeback models)

Optimize Phase

Identify and implement savings

  • Rate optimization (RIs, Savings Plans, Spot)
  • Usage optimization (right-sizing, auto-scaling)
  • Commitment-based discounts (planning ahead)
  • Waste elimination (idle resources, orphaned volumes)

Operate Phase

Sustain and continuously improve

  • Cost anomaly detection (alert on unusual spending)
  • Continuous optimization (rolling process)
  • Team engagement (teach engineers to optimize)
  • Policy governance (tagging, resource limits)

Key Difference

FinOps is a CYCLE, not a one-time project. You continuously loop: inform → optimize → operate → inform (with new cost data) → optimize (more) → operate...

FinOps Personas

FinOps Practitioner (Cloud Finance Engineer)

Coordinator between teams. Owns cost strategy, forecasting, reporting.

  • Background: Finance or engineering
  • Tools: Cost Explorer, CloudHealth, SQL
  • Goal: Visibility, forecasting, cost culture

Engineering Architect/Team Lead

Designs infrastructure. Makes decisions that impact cost.

  • Goal: Right-sizing, efficient design, Spot utilization
  • Motivation: Cost awareness, performance, reliability
  • Challenge: Cost trade-offs (is it worth 2x cost for 10% more reliability?)

Finance Team

Accounting, procurement, budgeting. Interfaces with FinOps practitioner.

  • Goal: Accurate forecasting, budget management, cost control
  • Challenge: Cloud variable cost model is new (not like on-prem fixed costs)

Business/Executive

Product owners, CTOs. Strategic decision-makers.

  • Goal: Cost per unit (cost per user, revenue per cost dollar)
  • Decision: Does this workload belong in the cloud? Should we migrate from on-prem?

FinOps Maturity Levels

Crawl (Ad-hoc)

Early stage. Reactive cost management.

  • Limited visibility. Cost surprises happen.
  • Basic tagging (if any)
  • No forecasting or budgeting
  • One person owns "cloud costs"

Walk (Standard)

Growing cloud usage. Proactive management starting.

  • Good cost visibility via tagging
  • Allocated costs to teams/products
  • Basic unit economics (cost per user)
  • Some RI/SP utilization
  • Monthly cost reviews

Run (Advanced)

Mature FinOps. Fully optimized, integrated into culture.

  • Real-time cost visibility
  • Fully allocated costs (100% of cloud bill assigned)
  • Accurate forecasting (within ±5%)
  • Automated optimization (auto-scaling, Spot management)
  • Cost embedded in architecture decisions
  • >90% RI/SP utilization
  • <5% waste

Cloud Pricing Models

On-Demand Pricing

What It Is

Pay per hour (or second in modern clouds). No commitment. Maximum flexibility.

Cost: Highest. Baseline pricing.

Use for: Dev/test, unpredictable workloads, short-lived jobs, spiky traffic

Example: AWS t3.medium instance = ~$0.04/hour

Reserved Instances (RIs) & Savings Plans

Reserved Instance (AWS)

Commit to 1-3 year term for specific instance type/region

Up to 72% off
  • Standard RI: Fixed instance type. Up to 72% savings but inflexible.
  • Convertible RI: Can be exchanged for a different family. Up to 66% savings.
  • Scheduled RI: For recurring time windows only (e.g., business hours). AWS no longer offers these for new purchases.

Payment options:

  • All Upfront (max savings)
  • Partial Upfront (middle)
  • No Upfront (minimum savings)

Savings Plans (AWS)

Commit to $/hour spend (not instance type). More flexible than RIs.

Up to 66-72% off
  • Compute Savings Plan: Up to 66% off. Any compute (EC2, Fargate, Lambda), in any family, size, or region
  • EC2 Instance Savings Plan: Up to 72% off. Any EC2 size within a committed family + region
  • SageMaker Plan: For ML workloads

Advantage: Flexibility. Switch instance types without penalty.
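
Whether a commitment pays off depends on how much of it is actually consumed. The following is a minimal sketch, assuming a single flat discount rate and ignoring upfront-payment timing; the function names are hypothetical and not part of any AWS API.

```python
def break_even_utilization(discount: float) -> float:
    """Share of committed hours that must be used so the commitment
    costs no more than buying the same usage on-demand."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def net_savings(discount: float, utilization: float) -> float:
    """Savings as a fraction of the on-demand bill for the hours
    actually used. A negative value means the commitment lost money."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    committed_cost = 1.0 - discount  # paid per committed hour, used or not
    return 1.0 - committed_cost / utilization


if __name__ == "__main__":
    print(f"break-even utilization at 40% discount: {break_even_utilization(0.40):.0%}")
    print(f"net savings at 90% utilization: {net_savings(0.40, 0.90):.1%}")
```

With a 40% discount, the commitment only beats on-demand when more than 60% of the committed hours are used, which is why utilization is tracked alongside coverage.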

Spot Instances (AWS) / Preemptible VMs (GCP)

What It Is

Use spare cloud capacity at huge discount. Trade reliability for savings.

Cost: Up to 90% off on-demand. On AWS the Spot price adjusts gradually with long-term supply and demand; the bidding/auction model was retired in 2017.

Catch: Can be interrupted with 2-minute (AWS) or 30-second (GCP) notice

Use for: Fault-tolerant batch jobs, CI/CD, Spark/Hadoop, stateless services, data processing

NOT for: Stateful services, databases, critical production services requiring 99.9% uptime

Best practice: Mix roughly 70% Spot + 30% on-demand for resilience. Diversify across instance types and Availability Zones, because interruptions hit individual capacity pools rather than the whole fleet at once.
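
To see what a Spot/on-demand mix does to the blended rate, here is a small sketch. The 70% Spot discount and the $0.10/hour on-demand rate are illustrative assumptions, not quoted prices, and the function name is hypothetical.

```python
def blended_hourly_rate(on_demand_rate: float, spot_discount: float, spot_share: float) -> float:
    """Average hourly rate for a fleet where `spot_share` of capacity
    runs on Spot at `spot_discount` off the on-demand rate."""
    if not 0.0 <= spot_share <= 1.0:
        raise ValueError("spot_share must be in [0, 1]")
    spot_rate = on_demand_rate * (1.0 - spot_discount)
    return spot_share * spot_rate + (1.0 - spot_share) * on_demand_rate


if __name__ == "__main__":
    rate = blended_hourly_rate(on_demand_rate=0.10, spot_discount=0.70, spot_share=0.70)
    print(f"blended rate: ${rate:.3f}/hour")
```

Under these assumptions the fleet averages about $0.051/hour, roughly half the on-demand rate, while 30% of capacity stays immune to interruption.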

Sustained Use Discounts (GCP)

Automatic Discounts

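GCP applies automatic discounts (no commitment) once an eligible instance runs for more than 25% of the month. Each usage tier is billed at a lower incremental rate (N1 schedule shown):

  • 25-50% of month: billed at 80% of base rate
  • 50-75% of month: billed at 60% of base rate
  • 75-100% of month: billed at 40% of base rate

Net effect for a full month: up to 30% off (N1) or up to 20% off (N2, N2D, C2). Some newer series such as E2 receive no sustained use discount.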

Advantage: No commitment risk. Discount applies automatically.

Committed Use Discounts (GCP CUD)

GCP CUD (1-3 year commitment)

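Similar to AWS RIs; flexibility depends on the CUD type

  • 1-year: 25-37% off
  • 3-year: 52-70% off
  • Resource-based CUDs are tied to a region and machine series; spend-based (flexible) CUDs apply across eligible machine families and regions at a lower discount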

Pricing Comparison Table

Model | Discount | Flexibility | Use Case
On-Demand | 0% (baseline) | Maximum | Dev/test, unpredictable
Reserved Instance | Up to 72% | Low (specific type) | Stable production workloads
Savings Plans | Up to 66-72% | High (across types) | Predictable, diverse compute
Spot/Preemptible | Up to 90% | Very low (can be interrupted) | Fault-tolerant batch jobs
Sustained Use | Up to 30% | Maximum (no commitment) | Stable workloads (GCP)

AWS Cost Optimization — Detailed

EC2 Optimization

Right-Sizing

Use CloudWatch metrics to identify oversized instances

  • Look at CPU utilization: If consistently <20%, downsize
  • Look at memory utilization: If consistently <30%, downsize (EC2 does not publish memory metrics by default; install the CloudWatch agent to collect them)
  • AWS Compute Optimizer provides recommendations automatically

Savings potential: 20-40% per instance
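
The thresholds above can be turned into a simple screening rule. This is a sketch only: the `InstanceStats` type and `downsize_candidates` function are hypothetical, the averages are assumed to have been pulled from CloudWatch already, and requiring both CPU and memory to be low is a deliberately conservative choice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InstanceStats:
    instance_id: str
    avg_cpu_pct: float
    avg_mem_pct: Optional[float]  # None if no memory metric is collected


def downsize_candidates(
    stats: List[InstanceStats],
    cpu_threshold: float = 20.0,
    mem_threshold: float = 30.0,
) -> List[str]:
    """Return IDs whose CPU and memory are both below the thresholds.

    Instances without memory data are skipped rather than guessed at.
    """
    return [
        s.instance_id
        for s in stats
        if s.avg_mem_pct is not None
        and s.avg_cpu_pct < cpu_threshold
        and s.avg_mem_pct < mem_threshold
    ]


if __name__ == "__main__":
    fleet = [
        InstanceStats("i-aaa", 8.0, 22.0),   # low CPU, low memory
        InstanceStats("i-bbb", 12.0, 75.0),  # low CPU, memory-bound
        InstanceStats("i-ccc", 5.0, None),   # no memory data
    ]
    print(downsize_candidates(fleet))
```

A memory-bound instance with idle CPU is not flagged here; it is usually a candidate for a different instance family rather than a smaller size.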

Graviton Instances

AWS-designed Arm-based processors. Up to 20% lower cost and up to 40% better price-performance than comparable x86 instances

  • Works with: EC2, RDS, ElastiCache, etc.
  • Limitation: Need to rebuild container images for ARM

Spot Instances

For fault-tolerant workloads

  • Use Spot Fleet to manage mix of instance types
  • Capacity-optimized allocation (spreads across pools, reduces interruptions)
  • Use on/off schedule for dev/test (shutdown after hours)

Storage Optimization (S3)

Lifecycle Policies

Automatically move objects to cheaper tiers based on age

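-- S3 Lifecycle Example
Age 0-30 days → Standard ($0.023/GB-month)
Age 30-90 days → Standard-Infrequent Access ($0.0125/GB-month) [~46% savings]
Age 90-365 days → Glacier (~$0.004/GB-month) [~83% savings]
Age 365+ days → Delete
-- Result: the blended cost depends on the age mix of the data. With data spread evenly across a 365-day retention window it is roughly $0.007/GB-month (vs $0.023 if all Standard)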
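
The blended price of a lifecycle policy can be estimated by weighting each tier by the time data spends in it. The sketch below assumes data is spread evenly across a 365-day retention window and ignores transition, request, and retrieval fees; the tier prices are the per-GB-month figures from the example.

```python
from typing import List, Tuple

# (start_day, end_day, price per GB-month) for each lifecycle tier
TIERS: List[Tuple[int, int, float]] = [
    (0, 30, 0.023),     # Standard
    (30, 90, 0.0125),   # Infrequent Access
    (90, 365, 0.004),   # Glacier
]


def blended_price_per_gb(tiers: List[Tuple[int, int, float]] = TIERS) -> float:
    """Time-weighted average storage price, assuming a uniform age mix."""
    total_days = sum(end - start for start, end, _ in tiers)
    weighted = sum((end - start) * price for start, end, price in tiers)
    return weighted / total_days


if __name__ == "__main__":
    print(f"blended: ${blended_price_per_gb():.4f}/GB-month vs $0.0230 all-Standard")
```

Under these assumptions the blended price comes to about $0.007/GB-month. A bucket dominated by recent data will sit closer to the Standard price.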

Storage Classes

  • S3 Standard: Hot data, frequent access
  • S3 Intelligent-Tiering: Unknown access patterns. Moves data automatically.
  • S3 Standard-Infrequent Access: Accessed less than once a month. Retrieval fees and a 30-day minimum storage charge apply.
  • S3 Glacier: Archive. Expensive to retrieve. Use for compliance backups.
  • S3 Glacier Deep Archive: About $1/TB-month storage. Standard retrieval costs about $20/TB and takes hours.

Data Transfer Costs

Often overlooked! Egress is expensive (2-9¢/GB depending on destination)

  • Use CloudFront CDN to reduce egress (caches at edge)
  • Keep data in same region when possible
  • Use VPC endpoints to avoid NAT Gateway charges ($0.045/hour + $0.045/GB)

Database Optimization (RDS)

Instance Right-Sizing

  • Use CloudWatch CPU/memory metrics
  • Downsizing from db.r6i.4xlarge to db.r6i.2xlarge roughly halves the instance cost

Reserved Instances for Production

RDS 1-3 year RIs give 40-50% savings for stable workloads

Aurora Serverless v2

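Billed per second for the Aurora Capacity Units (ACUs) in use. Idle cost is limited to the configured minimum capacity. Well suited to variable workloads.

  • Scales 0.5 ACU - 128 ACU automatically (exact limits depend on engine version)
  • Savings: up to ~70% vs a fixed instance if the workload is spiky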

Read Replicas

Reduce load on primary. Cross-region replicas for disaster recovery.

Multi-AZ Warning

Doubles cost for synchronous replica. Only use for production critical DBs.

Lambda Optimization

Memory = CPU

Lambda pricing: $0.0000166667 per GB-second

More memory = faster execution = lower cost (sometimes)

  • 512 MB × 10 seconds = 0.5 GB × 10 s = 5 GB-sec × $0.0000166667 = $0.0000833 per invocation
  • 1024 MB × 5 seconds = 1 GB × 5 s = 5 GB-sec × $0.0000166667 = $0.0000833 per invocation (same cost, 2x faster!)

Strategy: Use AWS Lambda Power Tuning tool to find optimal memory for each function
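
The memory/duration trade-off is easy to check numerically. This sketch uses the per-GB-second price quoted above, converts memory from MB to GB (512 MB = 0.5 GB), and ignores the separate per-request charge; the function name is hypothetical.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # duration price quoted above


def lambda_compute_cost(memory_mb: int, duration_s: float, invocations: int = 1) -> float:
    """Duration charge only: GB-seconds consumed times the GB-second price."""
    gb_seconds = (memory_mb / 1024) * duration_s * invocations
    return gb_seconds * PRICE_PER_GB_SECOND


if __name__ == "__main__":
    small_slow = lambda_compute_cost(512, 10.0, invocations=1_000_000)
    big_fast = lambda_compute_cost(1024, 5.0, invocations=1_000_000)
    print(f"512 MB x 10 s, 1M invocations: ${small_slow:.2f}")
    print(f"1024 MB x 5 s, 1M invocations: ${big_fast:.2f}")
```

Both configurations consume 5 GB-seconds per invocation, so a million invocations cost about $83 either way. The larger setting only costs more if doubling memory fails to halve the runtime.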

Provisioned Concurrency

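Keeps execution environments initialized and warm. Cost: about $0.015 per GB-hour of provisioned concurrency (x86, us-east-1), charged whether or not the function is invoked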

Use ONLY for latency-sensitive functions. Otherwise, cold starts are acceptable.

Reserved Concurrency

Limits max concurrency to prevent runaway costs from bugs

Cost Anomaly Detection

AWS Cost Anomaly Detection

ML-based monitoring. Alerts when spending is unusual.

  • Learns baseline spending
  • Detects anomalies (e.g., 20% jump in EC2)
  • Set frequency: daily/weekly alerts
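
AWS does not document the exact model behind Cost Anomaly Detection, so the following is only an illustration of the general idea (learn a baseline, flag large deviations) using a simple mean-plus-k-standard-deviations rule. The function name and the threshold `k` are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(history: Sequence[float], latest: float, k: float = 3.0) -> bool:
    """Flag `latest` if it exceeds the historical mean by more than
    k sample standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two historical data points")
    baseline = mean(history)
    spread = stdev(history)
    return latest > baseline + k * spread


if __name__ == "__main__":
    daily_ec2_spend = [100.0, 102.0, 98.0, 101.0, 99.0]
    print(is_anomalous(daily_ec2_spend, 120.0))  # a 20% jump over baseline
    print(is_anomalous(daily_ec2_spend, 101.0))  # within normal variation
```

A rule this simple ignores weekly seasonality and growth trends, which is where a managed, model-based detector earns its keep.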

Budget Alerts

Fixed threshold: "Alert if June spend > $50K"

CloudWatch Billing Alarms

Estimated charges. Set alarm at 80% of monthly budget.

Tagging Strategy for Cost Allocation

Mandatory Tags (Enforce via AWS Config)

-- Cost allocation tags
Environment: production | staging | dev
Team: backend | frontend | data | devops
Project: project-name
CostCenter: cc-123
Owner: firstname.lastname@company.com
Application: service-name
Service: compute | storage | database
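
A tag schema like this can be checked in code before resources are deployed or during an audit. The sketch below is a plain-Python validator, not an AWS Config rule; the function name is hypothetical and the allowed values simply mirror the list above.

```python
from typing import Dict, List

REQUIRED_TAGS = {
    "Environment", "Team", "Project", "CostCenter",
    "Owner", "Application", "Service",
}

# Tags restricted to a fixed vocabulary; the others accept free text.
ALLOWED_VALUES = {
    "Environment": {"production", "staging", "dev"},
    "Team": {"backend", "frontend", "data", "devops"},
    "Service": {"compute", "storage", "database"},
}


def tag_violations(tags: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; empty means compliant."""
    problems = [f"missing tag: {key}" for key in sorted(REQUIRED_TAGS - set(tags))]
    for key, allowed in ALLOWED_VALUES.items():
        if key in tags and tags[key] not in allowed:
            problems.append(f"invalid value for {key}: {tags[key]!r}")
    return problems


if __name__ == "__main__":
    print(tag_violations({"Environment": "prod", "Team": "backend"}))
```

Running the same check in CI and in the monthly audit keeps the two from drifting apart.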

Tag Compliance

  • Use AWS Config rules to enforce tags on all new resources
  • Provide automated remediation (tag untagged resources)
  • Monthly audit of tag compliance
  • Block untagged resources from deployment (via IAM policy)

Activate Cost Allocation Tags

Tags must be "activated" in Billing console to use for cost allocation

Then use in Cost Explorer, CUR, and forecasting

GCP Cost Optimization

BigQuery Cost Control

Pay Per Query (Most Common)

Pay per byte scanned. $6.25 per TB (first 1 TB/month free)

  • Optimize: Partition & cluster tables to scan fewer bytes
  • Example: Query 100 GB table but only scan 10 GB partition = $0.0625 cost
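
On-demand query cost follows directly from bytes scanned. A minimal sketch, assuming the $6.25/TB rate and 1 TB monthly free tier quoted above and using decimal units (1 TB = 1000 GB) as the example does; the function names are hypothetical.

```python
PRICE_PER_TB = 6.25       # on-demand rate quoted above
FREE_TB_PER_MONTH = 1.0   # monthly free tier quoted above


def query_cost_usd(gb_scanned: float) -> float:
    """Cost of a single query, ignoring the free tier."""
    return gb_scanned / 1000 * PRICE_PER_TB


def monthly_cost_usd(total_tb_scanned: float) -> float:
    """Monthly on-demand bill after subtracting the free tier."""
    billable_tb = max(0.0, total_tb_scanned - FREE_TB_PER_MONTH)
    return billable_tb * PRICE_PER_TB


if __name__ == "__main__":
    print(f"10 GB partition scan: ${query_cost_usd(10):.4f}")
    print(f"full 100 GB table scan: ${query_cost_usd(100):.4f}")
    print(f"41 TB scanned in a month: ${monthly_cost_usd(41):.2f}")
```

Scanning one 10 GB partition instead of the full 100 GB table cuts the per-query cost by a factor of ten.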

Partitioning (Critical!)

CREATE TABLE events
PARTITION BY DATE(event_timestamp)
AS SELECT ...;

-- Query only the July 1 partition (1 GB instead of 30 GB)
SELECT *
FROM events
WHERE event_timestamp >= '2024-07-01'
  AND event_timestamp < '2024-07-02';
-- Cost: $0.006 instead of $0.19

Clustering

Organize data within partitions by column (e.g., user_id)

Further reduces bytes scanned for WHERE clauses on cluster key

Slot Reservations

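Capacity-based pricing for predictable workloads. The legacy flat-rate and flex-slot plans were replaced by BigQuery editions in 2023.

  • Pay per slot-hour for reserved or autoscaled slots instead of per byte scanned
  • Legacy flex slots were $4 per 100 slots per hour (about $0.04 per slot-hour); 1- and 3-year commitments lower the rate
  • Cheaper than per-query pricing once scan volume is consistently high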

BI Engine Caching

In-memory cache for dashboard queries

Billed per GiB of reserved memory per hour (about $0.0416/GiB-hour at list price; check current pricing). Often breaks even after 1-2 weeks of repeated dashboards.

Query Cost Audit

-- Estimated on-demand cost per user per day over the last 30 days
-- (total_bytes_billed tracks the invoice more closely than total_bytes_processed)
SELECT
  user_email,
  DATE(creation_time) AS query_date,
  SUM(total_bytes_processed) / POW(10, 12) AS tb_scanned,
  SUM(total_bytes_processed) / POW(10, 12) * 6.25 AS estimated_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
GROUP BY user_email, query_date
ORDER BY estimated_cost_usd DESC;

Compute Engine Right-Sizing

Cloud Storage Lifecycle

GCP Storage Classes

  • Standard: $0.020/GB. Hot data.
  • Nearline: $0.010/GB. Access < 1x/month. 30-day minimum.
  • Coldline: $0.004/GB. Access < 1x/quarter. 90-day minimum.
  • Archive: $0.0012/GB. Access < 1x/year. 365-day minimum, $0.05 retrieval

Lifecycle Policy

-- Automatic transition
0-90 days: Standard
90-180 days: Nearline
180+ days: Archive
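
The same policy can be written as a Cloud Storage lifecycle configuration. The sketch below builds the JSON document in Python; the rule structure (`rule`, `action`, `condition`, `age`, `matchesStorageClass`) follows the Cloud Storage lifecycle format, while the function name and output file name are illustrative assumptions.

```python
import json


def lifecycle_config() -> dict:
    """Standard -> Nearline at 90 days, Nearline -> Archive at 180 days."""
    return {
        "rule": [
            {
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": 90, "matchesStorageClass": ["STANDARD"]},
            },
            {
                "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
                "condition": {"age": 180, "matchesStorageClass": ["NEARLINE"]},
            },
        ]
    }


if __name__ == "__main__":
    # Write the policy to a file that can be applied to a bucket.
    with open("lifecycle.json", "w", encoding="utf-8") as fh:
        json.dump(lifecycle_config(), fh, indent=2)
    print(json.dumps(lifecycle_config(), indent=2))
```

The resulting file can be applied with a command such as `gsutil lifecycle set lifecycle.json gs://BUCKET`, where `BUCKET` is a placeholder.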

Azure Cost Management

Reserved VM Instances

1-3 year commitment. Up to 72% savings.

  • Upfront and monthly payment cost the same in total, so paying monthly carries no penalty
  • Can exchange for different size in same family

Azure Hybrid Benefit

Bring existing Windows Server / SQL Server licenses

  • Save up to 40% on compute
  • Must have Software Assurance on licenses

Azure Spot VMs

Up to 90% discount for interruptible workloads

Azure Reservations

For predictable services: Cosmos DB, SQL DB, Blob Storage, App Service

1-3 year commitments with 20-40% savings

Cost Management + Billing

  • Budget alerts
  • Cost analysis by resource group, subscription, tag
  • Advisor recommendations (rightsizing, idle resources)
  • Cost estimation (before deploying)

FinOps Tools & Practices

AWS Tools

AWS Cost Explorer

Dashboard for cost visibility

  • Filter by service, region, tag, linked account
  • View daily/monthly trends
  • Forecasting (ML-based)
  • RI/SP utilization dashboard
  • Right-sizing recommendations

AWS Cost and Usage Report (CUR)

Granular billing data (>90 columns per line item)

Export to S3 → Analyze in Athena or QuickSight

  • Per-resource costs
  • RI/SP amortization
  • Savings Plans utilization
  • Unblended vs amortized cost

Infracost

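CLI tool: Estimates the cost impact of Terraform changes

-- Show cost delta before apply (run from the Terraform project directory)
infracost diff --path .

Illustrative result (not literal Infracost output):
New instance cost: +$500/month
RDS downsize saves: -$200/month
Net: +$300/month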

Third-Party Tools

CloudHealth by VMware

Multi-cloud cost management platform

  • Cost allocation (custom rules)
  • Chargeback automation
  • Governance policies (resource limits)
  • Anomaly detection

Spot.io

Autonomous cost optimization

  • Automatic RI/SP purchasing (ML-driven)
  • Workload optimization (right-sizing)
  • Multi-cloud

Kubecost

Kubernetes cost allocation by namespace/workload

  • Real-time cost visibility in K8s
  • Cost per microservice
  • Idle resource detection

Showback vs Chargeback

Showback

Report costs to teams without billing them

  • Goal: Visibility and awareness
  • "Team X's cloud costs are $50K/month"
  • Advantage: No political friction, teams motivated by metrics

Chargeback

Actual billing to internal teams/business units

  • Goal: Cost accountability
  • "Team X's cloud bill is $50K/month (deducted from budget)"
  • Advantage: True incentive (hits their P&L), fair cost allocation
  • Disadvantage: Political. Requires strong cost allocation model.

Unit Economics

Critical FinOps Metric

Cost per unit of value delivered. Examples:

  • Cost per API request
  • Cost per user
  • Cost per transaction
  • Cost per GB processed

Unit Economics = Monthly Cloud Cost / Monthly Units

Example:
Cloud spend: $500K/month
Active users: 1M users
Cost per user: $0.50

If revenue is $2M/month:
Cost as % of revenue: 25%
Target: <20% for healthy business
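
The worked example above can be reproduced in a few lines. This is a minimal sketch with hypothetical function names; the inputs are the figures from the example.

```python
def cost_per_unit(monthly_cloud_cost: float, monthly_units: float) -> float:
    """Cloud cost divided by units of value delivered (users, requests, ...)."""
    if monthly_units <= 0:
        raise ValueError("monthly_units must be positive")
    return monthly_cloud_cost / monthly_units


def cost_pct_of_revenue(monthly_cloud_cost: float, monthly_revenue: float) -> float:
    """Cloud spend as a percentage of revenue."""
    if monthly_revenue <= 0:
        raise ValueError("monthly_revenue must be positive")
    return 100.0 * monthly_cloud_cost / monthly_revenue


if __name__ == "__main__":
    spend, users, revenue = 500_000, 1_000_000, 2_000_000
    print(f"cost per user: ${cost_per_unit(spend, users):.2f}")
    print(f"cost as % of revenue: {cost_pct_of_revenue(spend, revenue):.0f}%")
```

Tracking both numbers month over month shows whether growth is getting cheaper per unit or simply more expensive overall.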

Cloud Cost Governance

Tagging Strategy & Enforcement

Mandatory Tags (Example)

  • Environment: prod, staging, dev, sandbox
  • Team: backend, frontend, data, devops
  • Project: project name
  • CostCenter: cost center code
  • Owner: person responsible
  • CreatedDate: for automated cleanup

Enforcement

  • AWS Config: Trigger Lambda if resource missing tag
  • IAM policy: Deny creation of resources without tag
  • Regular audits: Monthly report of non-compliant resources
  • Cleanup: Auto-delete untagged resources older than 30 days

Budget Alerts & Automated Actions

Budget Setup

  • Team budget: "Backend team max $30K/month"
  • Service budget: "RDS max $5K/month"
  • Project budget: "New product pilot max $2K"

Alert Escalation

  • 50% of budget: Informational
  • 80% of budget: Warning to team lead
  • 100% of budget: Alert to finance + engineering leadership
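
The escalation ladder maps naturally onto a threshold function. A minimal sketch, assuming the 50/80/100% thresholds above; the level names and function name are hypothetical.

```python
from typing import Optional


def alert_level(spend: float, budget: float) -> Optional[str]:
    """Map month-to-date spend against budget to an escalation level."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    if ratio >= 1.0:
        return "critical"       # finance + engineering leadership
    if ratio >= 0.8:
        return "warning"        # team lead
    if ratio >= 0.5:
        return "informational"
    return None                 # below all thresholds


if __name__ == "__main__":
    for spend in (10_000, 15_000, 24_000, 31_000):
        print(spend, alert_level(spend, budget=30_000))
```

Checking the highest threshold first ensures that a team at 105% of budget gets one critical alert rather than three overlapping ones.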

Automated Actions

  • Stop non-prod resources outside business hours
  • Auto-shutdown dev environments after 18:00
  • Auto-delete resources tagged "temporary" after 7 days
  • Auto-delete untagged resources after 30 days

RI/SP Management

Coverage Targets

  • Goal: >70% of eligible spend covered by RIs/SPs
  • Metric: Usage covered by RIs/SPs / Total eligible usage (both valued at on-demand rates)

Utilization Targets

  • Goal: >90% utilization
  • Metric: RI/SP hours used / RI/SP hours purchased
  • If 70% utilization: You're wasting 30% of commitment money
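
Coverage and utilization answer different questions, so it helps to compute them side by side. A minimal sketch with hypothetical function names and illustrative inputs:

```python
def coverage(covered_spend: float, eligible_spend: float) -> float:
    """Share of eligible spend that runs under a commitment."""
    return covered_spend / eligible_spend


def utilization(hours_used: float, hours_purchased: float) -> float:
    """Share of purchased commitment hours that were actually consumed."""
    return hours_used / hours_purchased


def wasted_commitment(commitment_cost: float, hours_used: float, hours_purchased: float) -> float:
    """Money paid for commitment hours that went unused."""
    return commitment_cost * (1.0 - utilization(hours_used, hours_purchased))


if __name__ == "__main__":
    print(f"coverage: {coverage(70_000, 100_000):.0%}")
    print(f"utilization: {utilization(700, 1_000):.0%}")
    print(f"wasted: ${wasted_commitment(10_000, 700, 1_000):,.0f}")
```

At 70% utilization, $3,000 of a $10,000 commitment buys nothing, which is the 30% waste described above.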

Expiry Management

  • Set calendar reminders for expiring RIs (30 days before)
  • Decide: Renew, let expire, or change instance type
  • Use the RI and Savings Plans purchase recommendations in Cost Explorer to size purchases

Cost Allocation & Chargeback

Allocation Rules

100% of cloud costs must be allocated:

  • Direct: Costs with clear owner (team tag)
  • Shared infrastructure: Allocate proportional to usage
  • Example: Kubernetes cluster cost split by namespace CPU percentage

Shared Cost Allocation Methods

  • Proportional: Allocate by usage metric (% of CPU, GB stored)
  • Equal split: Divide evenly among users
  • Fixed split: Agreed-upon percentages
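
Proportional allocation, the first method above, is a one-line formula per team. The sketch below splits a shared cluster bill by a usage metric such as namespace CPU share; the function name, namespace names, and figures are illustrative.

```python
from typing import Dict


def allocate_proportionally(total_cost: float, usage: Dict[str, float]) -> Dict[str, float]:
    """Split `total_cost` across owners in proportion to their usage."""
    total_usage = sum(usage.values())
    if total_usage <= 0:
        raise ValueError("total usage must be positive")
    return {owner: total_cost * amount / total_usage for owner, amount in usage.items()}


if __name__ == "__main__":
    cpu_share_by_namespace = {"checkout": 60.0, "search": 30.0, "batch": 10.0}
    for namespace, cost in allocate_proportionally(10_000, cpu_share_by_namespace).items():
        print(f"{namespace}: ${cost:,.2f}")
```

Because every share is derived from the same total, the allocations always sum back to the full bill, which satisfies the 100% allocation rule.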

FinOps Governance Council

Monthly meeting

  • Review cloud spend vs forecast
  • Investigate anomalies
  • Plan RI/SP purchases for next quarter
  • Discuss optimization opportunities
  • Adjust budgets if business changes

Key FinOps Metrics & KPIs

Metric | Definition | Target / Benchmark | Why It Matters
Cloud spend as % of revenue | Cloud costs / Revenue | 10-20% (for SaaS) | Profitability. If >25%, margins compressed.
Unit cost trends | Cost per user / transaction (MoM) | Decreasing year-over-year | Economies of scale. Should go down as company grows.
RI/SP coverage | % of eligible spend with commitment | >70% | Optimization savings. Every 10% improves margins by ~3%
RI/SP utilization | Hours used / Hours purchased | >90% | If 70%, you're wasting money on unused commitments
Waste % | Unused resources / Total spend | <5% | Idle instances, orphaned volumes, unattached IPs
Cost per environment | Prod spend / Non-prod spend ratio | Prod 70%, Non-prod <30% | Over-investment in dev/test wastes money
Forecast accuracy | Actual spend vs Budget variance | Within ±10% | Planning and credibility with finance
Engineer cloud awareness | % of engineers who know their team's spend | >80% | Culture. Engineers make cost-aware architecture decisions

Top 30 Interview Questions & Answers

1. What is the FinOps framework and its three phases?

FinOps Lifecycle: Inform → Optimize → Operate (continuous cycle)

Inform Phase: Build cost visibility through allocation, benchmarking, forecasting. Make costs visible to teams.

Optimize Phase: Identify and implement savings through rate optimization (RIs, SPs, Spot), usage optimization (right-sizing), and waste elimination.

Operate Phase: Sustain improvements through cost culture, anomaly detection, policy governance, and continuous improvement cycles.

Key insight: It's a cycle, not one-time project. You continuously loop with new data.

2. What is the difference between Reserved Instances and Savings Plans?

Reserved Instances (RIs):

  • Commit to specific instance type + region + OS
  • Up to 72% discount (Standard) or up to 66% (Convertible)
  • Inflexible. If you need different instance type, can't easily change.
  • Use for: Stable production workloads with predictable instance types

Savings Plans:

  • Commit to $/hour spend; Compute Savings Plans cover EC2, Fargate, and Lambda
  • Up to 66% (Compute) or up to 72% (EC2 Instance) discount
  • Flexible. Compute plans let you switch instance types, regions, or even move to Lambda without penalty
  • Use for: Diverse compute needs, multiple instance families

Bottom line: Savings Plans > RIs for flexibility. RIs slightly cheaper if you never change instance type.

3. How do you create a tagging strategy for a large organization?

Step 1: Identify business stakeholders

Finance, engineering, product, security. Each has different needs for tags.

Step 2: Define mandatory tags

  • Environment: prod, staging, dev (for billing by env)
  • Team: backend, frontend, data, infra (cost allocation)
  • Project/Application: project name (cost per product)
  • Cost Center: organizational code (accounting)
  • Owner: person@company.com (responsibility)
  • Created Date: YYYY-MM-DD (auto-cleanup)

Step 3: Enforce via automation

  • AWS Config rules: Detect untagged resources daily
  • IAM policy: Deny ec2:RunInstances without required tags
  • Auto-remediation: Tag untagged resources or notify owner

Step 4: Activate for cost allocation

In AWS Billing console, activate cost allocation tags. Then use in Cost Explorer.

Step 5: Monitor compliance

Monthly report: "92% compliance, 8% untagged resources in dev." Celebrate improvement.

4. How would you reduce AWS costs by 30%?

Systematic approach (not just cutting):

Phase 1: Visibility (Week 1)

  • Pull CUR data, analyze by service and tag
  • Identify top cost drivers (usually EC2, S3, RDS)
  • Benchmark: Compare to industry peer (SaaS typically 10-20% of revenue)

Phase 2: Quick Wins (Week 2-3) - ~10% savings

  • Delete unattached volumes and orphaned resources
  • Stop non-prod resources outside business hours
  • Rightsize oversized instances (e.g., c5.4xlarge → c5.2xlarge)

Phase 3: Commitment Optimization (Week 4+) - ~15% savings

  • Analyze RI/SP coverage. Current: 30%, Target: >70%
  • Purchase Savings Plans for compute (covers EC2, Fargate, Lambda)
  • Purchase RIs for databases (RDS)

Phase 4: Architecture (Ongoing) - ~5% savings

  • S3 lifecycle policies (move old data to Glacier)
  • Migrate to Graviton instances (20% cheaper, faster)
  • Use Spot instances for non-critical batch jobs
  • Aurora Serverless for variable workloads

Total: ~30% savings across quarters

Key: Involve engineers. Architecture decisions compound over time.

5. What is the difference between showback and chargeback?

Showback: Report costs to teams without billing them. "Your team spent $50K on cloud this month."

  • Advantage: Visibility, awareness, behavioral change without friction
  • Disadvantage: Weaker incentive (it doesn't hit their budget)

Chargeback: Actually bill teams for their cloud usage. Deduct from their budget.

  • Advantage: True economic incentive, fair cost allocation, accountability
  • Disadvantage: Political friction, requires strong cost model to avoid disputes

Best practice: Start with showback to build awareness. Move to chargeback once teams trust the allocation model.

6-30. Additional Interview Questions (Key Answers)

6. How do you handle RI expiry management? Set calendar reminders 30-60 days before expiry. Analyze usage: Is this still needed? Should we renew, let expire, or downsize? Use the purchase recommendations in Cost Explorer to size the renewal.

7. What metrics would you use to measure FinOps maturity? Cost visibility (% of spend allocated), RI/SP coverage (>70%), RI/SP utilization (>90%), forecast accuracy (±10%), waste (<5%), engineer awareness (>80% know their costs).

8. How do you allocate shared infrastructure costs (Kubernetes, databases) to teams? Proportional allocation: K8s cluster cost split by namespace CPU percentage. Shared DB cost split by GB storage or query count. Use tagging + custom allocation rules in CloudHealth or similar tool.

9. What is unit economics in cloud context? Cost per unit of value. Examples: Cost per user, cost per API call, cost per transaction. Critical for profitability. Should improve (decrease) as company scales due to economies of scale.

10. How do you build a cost culture in an engineering team? (1) Make costs visible (cost per feature, cost per service). (2) Involve engineers in cost decisions (show impact of choosing larger instance). (3) Celebrate wins (saved $100K). (4) Make cost a design constraint ("Design for <$10K/month").

11. What are the most common cloud waste patterns? Idle instances (running but doing no useful work), oversized instances (paying for unused capacity), unattached volumes, unnecessary data transfer, unused RI/SP (sitting on shelf), orphaned snapshots/load balancers.

12. How do you approach cloud cost forecasting? (1) Gather historical 12 months data. (2) Adjust for known changes (new product launch, customer acquisition rate). (3) Use trend analysis or ML. (4) Add contingency (10-15%). (5) Share with teams, get feedback, refine monthly.

13. How do you handle Spot instance interruptions? Spot instances can be interrupted with 2-minute notice. Mitigate: (1) Use multiple availability zones. (2) Use Spot Fleet to manage mix of types/zones. (3) Use capacity-optimized allocation. (4) Mix Spot + on-demand. (5) Use for fault-tolerant workloads only.

14. Explain the AWS Savings Plans vs Reserved Instances trade-offs. Standard RIs and EC2 Instance Savings Plans: highest savings (up to 72%), least flexible. Compute Savings Plans: lower savings (up to 66%), most flexible. Choose RIs if locked to a specific instance type for 3 years. Choose Savings Plans if you need flexibility (change instance size, region, service).

15. How do you govern cloud costs without slowing down engineering? Automate: Auto-cleanup resources, auto-stop non-prod after hours. Educate: Show cost impact upfront. Enable: Self-service cost visibility. Governance light: Policy-based resource limits, not approval gates.

16. What is multi-cloud cost management? Managing costs across AWS, GCP, Azure. Challenges: Different pricing models, tagging/allocation schemes vary. Tools: CloudHealth, Spot.io, custom integrations. Best practice: Standardize tagging and allocation logic across clouds.

17. How would you optimize Kubernetes costs? (1) Right-size requests/limits. (2) Run non-critical workloads on Spot nodes. (3) Use HPA (Horizontal Pod Autoscaler) to scale pods. (4) Use cluster autoscaler to scale nodes. (5) Monitor costs per namespace. (6) Use tools like Kubecost.

18. How do you control BigQuery costs? (1) Partition tables by date (reduce bytes scanned). (2) Cluster tables (further reduce bytes). (3) Use slots for predictable workloads. (4) Audit expensive queries (INFORMATION_SCHEMA). (5) Set custom quotas or a maximum-bytes-billed limit on queries.

19. What is the cost anomaly investigation process? (1) Confirm it's real (check data pipeline). (2) Segment: Which service? Which region? Which team? (3) Root cause: Code deploy? Traffic spike? Misconfiguration? (4) Mitigate: Revert code, optimize, or adjust budget.

20. How do you size RI and Savings Plan commitments? Look at 12-month history, account for growth (20%+ YoY typical), look at recent 3-month baseline. Aim for 70-80% coverage (leave 20-30% for flexibility/spikes). Conservative approach: Undershoot coverage, buy more as you understand patterns.

21. How do you automate tagging enforcement? AWS Config rules detect untagged resources daily. Lambda auto-tags them (e.g., with "UNTAGGED" + creation date). Or auto-stop untagged resources after 48 hours. Or IAM policy denies creation without tags. Combination approach most effective.

22. What is cloud waste automation? Automatically identify and fix: Delete unattached volumes older than 7 days, delete orphaned snapshots, remove unused load balancers, stop instances tagged "temporary" after 24 hours, delete untagged resources after 30 days.

23. How do you evaluate FinOps tools? Criteria: (1) Cost visibility (ease of use, detail), (2) Chargeback capabilities (allocation logic), (3) Forecasting accuracy, (4) Integrations (your cloud providers, ITSM tools), (5) Ease of implementation, (6) Support quality, (7) Price (shouldn't be >5% of cloud spend).

24. How do you build engineering cost visibility? (1) Dashboard per team showing YTD spend, forecast, trend. (2) Cost per service/microservice. (3) Top cost drivers. (4) Comparison to budget. (5) Email alerts at 80% of budget. (6) Monthly FinOps sync with engineering leads.

25. What is carbon footprint and sustainability in cloud? Newer concern. Cloud providers publish carbon intensity. Optimizing costs often reduces carbon (fewer resources = lower emissions). Some companies set carbon budgets (kg CO2/month) in addition to dollar budgets.

26. How do you optimize data egress costs? Egress is expensive (2-9¢/GB depending on destination). Mitigate: Use CDN (CloudFront) to reduce egress. Keep data in same region. Use VPC endpoints to avoid NAT Gateway. Monitor egress trends monthly.

27. What are common network cost patterns? NAT Gateway is expensive (about $32/month + $0.045/GB processed). Cross-region data transfer is expensive, including EC2 <-> S3 in different regions. VPC endpoints: gateway endpoints (S3, DynamoDB) are free; interface endpoints cost about $7/month per AZ plus data processing, but save on NAT costs. Optimize: Co-locate resources, use endpoints, avoid cross-region traffic.

28. How do you integrate cost into infrastructure as code (IaC)? Use Infracost: Estimates Terraform cost changes in PRs. Developers see cost impact before merge. Or use tagging in Terraform to enable cost allocation. Or add cost approval gate (expensive resources need sign-off).

29. What is cost per microservice? Tag all resources by service (app, database, cache). Allocate shared infrastructure proportional to usage. Monthly report: "Microservice X cost $5K, revenue $20K, ROI 4x." Helps identify underutilized services.

30. How do you stay current with cloud pricing and FinOps practices? Read: FinOps Foundation resources, cloud provider blogs, industry reports. Join: FinOps Foundation, local meetups. Experiment: Use a sandbox to try new tools, services, pricing models. Share: Presentations, internal knowledge sharing.

FinOps Best Practices Summary

Golden Rules of FinOps:
  • Make cost visible to everyone (not just finance)
  • Involve engineering in cost decisions (not just cost-cutting)
  • Measure and optimize unit economics, not just spend
  • Use tagging and allocation to drive accountability
  • Automate everything (tagging, cleanup, scaling, optimization)
  • Forecast accurately (±10%) to build trust with finance
  • Celebrate wins and build cost culture
  • FinOps is a continuous cycle, not a one-time project