Terraform is the de facto standard for cloud infrastructure, but teams that don't adopt disciplined practices early pay a steep tax later: state drift, module sprawl, untestable configurations, and the ever-present fear of terraform apply going wrong in production. This guide distils the patterns we apply across multi-cloud engagements at UP2CLOUD.
Module Structure: Think Libraries, Not Scripts
The most common anti-pattern we see is a flat repository: one directory, hundreds of resources, everything tangled together. Treat your Terraform code like a software library with three distinct layers:
- Resource modules: single-purpose, reusable wrappers around provider resources (e.g., modules/gcs-bucket, modules/rds-instance).
- Composition modules: assemble resource modules into meaningful infrastructure units (e.g., modules/web-service that includes an ALB, ASG, and RDS).
- Root configurations: environment-specific entry points that call composition modules with environment-specific variables.
# Repository layout
infrastructure/
  modules/
    gcs-bucket/         # Resource module
    rds-instance/       # Resource module
    web-service/        # Composition module
  environments/
    prod/
      main.tf           # Calls web-service module
      variables.tf
      terraform.tfvars
    staging/
      main.tf
      variables.tf
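To make the resource-module layer concrete, here is a minimal sketch of what modules/gcs-bucket might contain. The variable names, defaults, and the choice to force uniform bucket-level access are assumptions for illustration, not a prescribed interface.

```hcl
# modules/gcs-bucket/main.tf (hypothetical contents)

variable "name" {
  type        = string
  description = "Globally unique bucket name."
}

variable "location" {
  type        = string
  description = "Bucket location."
  default     = "EU" # assumed default, adjust to your region policy
}

variable "labels" {
  type        = map(string)
  description = "Labels applied to the bucket."
  default     = {}
}

# One provider resource, wrapped with opinionated defaults.
resource "google_storage_bucket" "this" {
  name                        = var.name
  location                    = var.location
  labels                      = var.labels
  uniform_bucket_level_access = true
}

# Expose only what callers need.
output "name" {
  value = google_storage_bucket.this.name
}
```

A composition module such as modules/web-service would then call this module with a module block rather than declaring google_storage_bucket directly.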
Remote State: Never Use Local State in Teams
Local state files and version control do not mix. Use GCS or S3 for remote state with state locking enabled via DynamoDB (AWS) or native GCS locking.
# GCS backend configuration
terraform {
  backend "gcs" {
    bucket = "mycompany-tf-state-prod"
    prefix = "web-service/prod"
  }
}

# S3 + DynamoDB backend
terraform {
  backend "s3" {
    bucket         = "mycompany-tf-state"
    key            = "web-service/prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-locks"
    encrypt        = true
  }
}
Critical rule: one state file per environment per service. Monolithic state files that hold hundreds of resources become blast-radius nightmares: a single failed apply can leave the whole state locked, blocking changes to everything it manages.
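Splitting state per service means one configuration sometimes needs values owned by another. One possible approach, sketched here, is the terraform_remote_state data source. The "network/prod" prefix and the subnet_id output are hypothetical and assume a separate network configuration that stores its state in the same bucket and declares that output.

```hcl
# Read outputs from another service's state without merging the two states.
data "terraform_remote_state" "network" {
  backend = "gcs"

  config = {
    bucket = "mycompany-tf-state-prod"
    prefix = "network/prod" # hypothetical state prefix of a network stack
  }
}

locals {
  # Only values the network configuration exposes as outputs are readable here.
  subnet_id = data.terraform_remote_state.network.outputs.subnet_id
}
```

This keeps each state small while still allowing explicit, reviewable dependencies between services.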
Workspace Strategy: Environments via Variables, Not Workspaces
Terraform workspaces are often misused for environment separation. The problem: workspaces share backend configuration and are easy to confuse. We recommend separate root configuration directories per environment with a shared modules library. Use workspaces only for ephemeral feature environments that mirror a single base configuration.
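A minimal sketch of the directory-per-environment approach: each root configuration calls the same composition module and differs only in the values it passes. The input names (environment, instance_count, instance_type) are assumptions about what a web-service module might accept.

```hcl
# infrastructure/environments/prod/main.tf
module "web_service" {
  source = "../../modules/web-service"

  # Hypothetical module inputs; staging/main.tf would pass smaller values.
  environment    = "prod"
  instance_count = 4
  instance_type  = "m5.large"
}
```

Because prod and staging live in separate directories with separate backends, running a command against the wrong environment requires being in the wrong directory, which is harder to do by accident than forgetting to switch workspaces.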
Policy as Code: Sentinel and OPA
Governance without enforcement is just documentation. Two mature policy-as-code solutions integrate with Terraform:
- HashiCorp Sentinel (Terraform Cloud/Enterprise): runs policies as a gate between plan and apply. Excellent for organisations already on the HCP stack.
- OPA Conftest (open source): validates Terraform plan JSON against Rego policies in any CI system. Free, flexible, cloud-agnostic.
# OPA policy: deny resources without required tags
package terraform

# Rego v1 syntax (contains/if) works on OPA 0.59+ and is required by OPA 1.0+
import rego.v1

deny contains msg if {
    resource := input.resource_changes[_]
    resource.change.actions[_] == "create"
    not resource.change.after.tags.env
    msg := sprintf("Resource %v missing required 'env' tag", [resource.address])
}
CI/CD Integration: GitHub Actions + Terraform Cloud
Every terraform plan should run automatically on pull requests; every terraform apply should be gated on merge to main and require approval. Here's a minimal GitHub Actions workflow for the plan stage:
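name: Terraform Plan
on:
  pull_request:
    paths: ['infrastructure/**']
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.8.0"
          cli_config_credentials_token: ${{ secrets.TF_API_TOKEN }}
      - name: Terraform Init
        run: terraform init
        working-directory: infrastructure/environments/prod
      - name: Terraform Validate
        run: terraform validate
        working-directory: infrastructure/environments/prod
      # Save the plan to a file so the policy check below has something to read
      - name: Terraform Plan
        run: terraform plan -input=false -out=tfplan.binary
        working-directory: infrastructure/environments/prod
      # Assumes conftest is installed on the runner and policies live in ./policy.
      # The policy uses "package terraform", so the namespace must be named;
      # conftest only evaluates the "main" namespace by default.
      - name: OPA Policy Check
        run: |
          terraform show -json tfplan.binary | conftest test --namespace terraform -
        working-directory: infrastructure/environments/prod
      - name: Post Plan to PR
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({...})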
Avoiding State Drift
State drift occurs when real infrastructure diverges from Terraform state, usually because someone made a console change. Three defences:
- Drift detection runs: schedule terraform plan nightly and alert on any diff. Terraform Cloud has this built in; for open-source stacks, use a cron-triggered GitHub Actions workflow.
- Import-first policy: before touching any resource manually, import it into state first. Document this in your runbooks.
- Read-only console policies: restrict IAM/GCP roles so engineers can't modify resources through the console in production environments.
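The import-first rule can be followed declaratively with an import block, assuming Terraform 1.5 or later. The sketch below adopts a bucket that was created by hand; the resource address and bucket name are hypothetical.

```hcl
# Adopt an existing, manually created bucket into state (Terraform 1.5+).
import {
  to = aws_s3_bucket.legacy_logs
  id = "mycompany-legacy-logs" # hypothetical bucket name
}

# The configuration the imported bucket is expected to match.
resource "aws_s3_bucket" "legacy_logs" {
  bucket = "mycompany-legacy-logs"

  tags = {
    env = "prod"
  }
}
```

With this in place, the import shows up in terraform plan alongside any other changes, so it goes through the same pull-request review as the rest of the codebase.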
Practical Tips Across Multi-Cloud Setups
When managing GCP and AWS from the same repository, use provider aliasing and clearly namespaced modules. Never mix provider-specific resource modules in the same directory. Keep a versions.tf that pins both the Terraform version and all provider versions: provider upgrades are the most common source of unexpected plan diffs in multi-cloud codebases.
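As a sketch of the pinning and aliasing advice, a versions.tf along these lines could sit in each root configuration. The version constraints, regions, project ID, and the "eu" alias are illustrative assumptions, not recommendations.

```hcl
# versions.tf: pin Terraform and every provider explicitly.
terraform {
  required_version = "~> 1.8.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed constraint
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0" # assumed constraint
    }
  }
}

# Default AWS provider plus an aliased one for a second region.
provider "aws" {
  region = "us-east-1"
}

provider "aws" {
  alias  = "eu"
  region = "eu-west-1"
}

provider "google" {
  project = "mycompany-prod" # hypothetical project ID
  region  = "europe-west1"
}
```

A module that should deploy to the aliased region receives it explicitly through its providers argument, for example providers = { aws = aws.eu }, which keeps the target of each module call visible in review.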
Terraform at scale rewards investment in module quality and governance tooling. Teams that treat their IaC like a production software codebase, with reviews, tests, policies, and automated checks, spend far less time firefighting drift and far more time shipping value.