Terraform at Scale: IaC Best Practices for Multi-Cloud Teams

Terraform is the de facto standard for cloud infrastructure as code, but teams that don't adopt disciplined practices early pay a steep tax later: state drift, module sprawl, untestable configurations, and the ever-present fear of a terraform apply going wrong in production. This guide distils the patterns we apply across multi-cloud engagements at UP2CLOUD.

Module Structure: Think Libraries, Not Scripts

The most common anti-pattern we see is a flat repository: one directory, hundreds of resources, everything tangled together. Treat your Terraform code like a software library with three distinct layers:

Resource modules — single-purpose, reusable wrappers around provider resources (e.g., modules/gcs-bucket, modules/rds-instance).
Composition modules — assemble resource modules into meaningful infrastructure units (e.g., modules/web-service that includes an ALB, ASG, and RDS).
Root configurations — environment-specific entry points that call composition modules with environment-specific variables.

# Repository layout
infrastructure/
  modules/
    gcs-bucket/          # Resource module
    rds-instance/        # Resource module
    web-service/         # Composition module
  environments/
    prod/
      main.tf            # Calls web-service module
      variables.tf
      terraform.tfvars
    staging/
      main.tf
      variables.tf
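A minimal sketch of how the three layers connect: the composition module reuses the resource module, and the root configuration only supplies environment-specific values. The input names (environment, db_instance_class, identifier, instance_class) and the endpoint output on rds-instance are hypothetical; a real module defines its own interface.

```hcl
# modules/web-service/main.tf (composition module)
# Variable and output names are hypothetical illustrations.
variable "environment" {
  type = string
}

variable "db_instance_class" {
  type    = string
  default = "db.t3.medium"
}

# Reuse the single-purpose resource module instead of declaring aws_db_instance here
module "database" {
  source = "../rds-instance"

  identifier     = "web-${var.environment}"
  instance_class = var.db_instance_class
}

# Assumes modules/rds-instance declares an "endpoint" output
output "db_endpoint" {
  value = module.database.endpoint
}

# environments/prod/main.tf (root configuration)
module "web_service" {
  source = "../../modules/web-service"

  environment       = "prod"
  db_instance_class = "db.r6g.large"
}
```

Because the root configuration only sets values, promoting a change from staging to prod is a matter of changing inputs, not copying resource definitions.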

Remote State: Never Use Local State in Teams

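Local state does not work for teams: it cannot be shared or locked, and committing it to version control leaks any secrets it holds, because Terraform records sensitive values in state in plaintext. Store state remotely in GCS or S3 with state locking enabled: the GCS backend locks natively, while the S3 backend uses a DynamoDB table.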

# GCS backend configuration
terraform {
  backend "gcs" {
    bucket  = "mycompany-tf-state-prod"
    prefix  = "web-service/prod"
  }
}

# S3 + DynamoDB backend
terraform {
  backend "s3" {
    bucket         = "mycompany-tf-state"
    key            = "web-service/prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-locks"
    encrypt        = true
  }
}
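The backend resources have to exist before any configuration can use them, so they are usually created once from a separate bootstrap configuration (the bootstrap/ path below is hypothetical). A minimal sketch for the AWS side, reusing the bucket and table names from the S3 example above; the S3 backend expects the lock table's partition key to be a string attribute named LockID, and bucket versioning lets you recover an earlier state version after a bad write.

```hcl
# bootstrap/main.tf: applied once, separately from the configurations that use this backend
resource "aws_s3_bucket" "tf_state" {
  bucket = "mycompany-tf-state"
}

# Keep previous state versions so a corrupted or mistaken write can be rolled back
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# State can contain secrets, so block every form of public access
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table: the S3 backend requires a string partition key named "LockID"
resource "aws_dynamodb_table" "tf_locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```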

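Critical rule: one state file per environment per service. A monolithic state file holding hundreds of resources maximises the blast radius of every change: each plan has to refresh every resource, a single mistaken apply can touch unrelated systems, and a stale lock left by one failed run blocks every team that shares the state.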
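Splitting state means one service sometimes needs another service's outputs. One option is Terraform's built-in terraform_remote_state data source, sketched here for a hypothetical network service whose state lives under the network/prod prefix and which exports a subnet_id output. Note that it needs read access to the other state as a whole, so some teams prefer to publish shared values through a parameter store instead.

```hcl
# environments/prod/main.tf in the web-service configuration
# "network/prod" and "subnet_id" are hypothetical; they must match the
# network configuration's backend prefix and its declared root outputs.
data "terraform_remote_state" "network" {
  backend = "gcs"

  config = {
    bucket = "mycompany-tf-state-prod"
    prefix = "network/prod"
  }
}

locals {
  # Only root-module outputs of the other configuration are visible here
  subnet_id = data.terraform_remote_state.network.outputs.subnet_id
}
```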

Workspace Strategy: Environments via Variables, Not Workspaces

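Terraform workspaces are often misused for environment separation. The problem: every workspace of a configuration shares the same code, backend, and credentials, so the only thing separating prod from staging is which workspace happens to be selected, and running apply in the wrong one is an easy mistake. We recommend a separate root configuration directory per environment, all calling a shared modules library. Use workspaces only for ephemeral feature environments that mirror a single base configuration.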
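For the ephemeral case, a sketch of how a base configuration can use terraform.workspace to keep feature environments from colliding with the base one. The bucket name and the feature-login workspace are hypothetical; since GCS bucket names must be globally unique and lowercase, workspace names need to follow the same rules.

```hcl
# Ephemeral feature environment lifecycle, e.g.:
#   terraform workspace new feature-login
#   terraform apply
#   terraform destroy
#   terraform workspace select default
#   terraform workspace delete feature-login
locals {
  # The "default" workspace is the base environment; any other workspace is ephemeral
  is_ephemeral = terraform.workspace != "default"
  name_suffix  = local.is_ephemeral ? "-${terraform.workspace}" : ""
}

resource "google_storage_bucket" "assets" {
  name     = "mycompany-assets${local.name_suffix}" # hypothetical bucket name
  location = "EU"

  # Allow ephemeral environments to be destroyed even if the bucket still holds objects
  force_destroy = local.is_ephemeral
}
```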

Policy as Code: Sentinel and OPA

Governance without enforcement is just documentation. Two mature policy-as-code solutions integrate with Terraform:

HashiCorp Sentinel (Terraform Cloud/Enterprise) runs policies as a gate between plan and apply. Excellent for organisations already on the HCP stack.
Conftest (open source, built on OPA) validates Terraform plan JSON against Rego policies in any CI system. Free, flexible, cloud-agnostic.

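# OPA policy: deny newly created resources that lack an 'env' tag.
# Conftest evaluates the "main" package by default (use --namespace for others).
# GCP resources use "labels" rather than "tags", and some resources support
# neither, so production policies usually filter by resource type.
package main

import rego.v1

deny contains msg if {
  some resource in input.resource_changes
  "create" in resource.change.actions
  not resource.change.after.tags.env
  msg := sprintf("Resource %v missing required 'env' tag", [resource.address])
}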
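Conftest checks the plan after it has been generated. A complementary, Terraform-native guardrail is a validation rule on a module's input variable, which fails during plan with a clear message. A sketch, assuming a hypothetical tags input on a resource module that passes var.tags to every resource it creates:

```hcl
# modules/rds-instance/variables.tf (hypothetical "tags" input)
variable "tags" {
  type        = map(string)
  description = "Tags applied to every resource this module creates."

  validation {
    condition     = contains(keys(var.tags), "env")
    error_message = "The tags map must include an \"env\" key."
  }
}
```

Validation only covers modules that route tags through this variable, so the plan-level policy remains the backstop for everything else.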

CI/CD Integration: GitHub Actions + Terraform Cloud

Run terraform plan automatically on every pull request, and gate every terraform apply on a merge to main plus an explicit approval. Here's a minimal GitHub Actions workflow for the plan side:

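name: Terraform Plan
on:
  pull_request:
    paths: ['infrastructure/**']

permissions:
  contents: read
  pull-requests: write   # lets the last step comment on the PR

jobs:
  plan:
    runs-on: ubuntu-latest
    defaults:
      run:
        # Applies to every `run:` step; `uses:` steps are unaffected
        working-directory: infrastructure/environments/prod
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.8.0"
          # Only needed for Terraform Cloud/Enterprise (remote runs, private registry);
          # GCS/S3 backend credentials must be supplied separately, e.g. via OIDC.
          cli_config_credentials_token: ${{ secrets.TF_API_TOKEN }}
          # Disable the stdout wrapper so `terraform show -json` emits clean JSON
          terraform_wrapper: false
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
      - name: Terraform Init
        run: terraform init -input=false
      - name: Terraform Validate
        run: terraform validate
      - name: Terraform Plan
        # Save the plan so the policy check evaluates exactly what would be applied
        run: terraform plan -input=false -out=tfplan.binary
      - name: OPA Policy Check
        # Conftest reads policies from ./policy by default
        run: |
          terraform show -json tfplan.binary > tfplan.json
          docker run --rm -v "$PWD":/project -w /project openpolicyagent/conftest test tfplan.json
      - name: Post Plan to PR
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({...})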

Avoiding State Drift

State drift occurs when real infrastructure diverges from Terraform state — usually because someone made a console change. Three defences:

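Drift detection runs: schedule terraform plan -detailed-exitcode nightly and alert when it exits with code 2 (the plan contains changes) or 1 (the run failed). Terraform Cloud offers built-in drift detection through its health assessments; for open-source stacks, use a GitHub Actions workflow triggered on a cron schedule.
Import-first policy: when a resource already exists outside Terraform, import it into state before managing or changing it, rather than recreating it or continuing to edit it by hand. Document this in your runbooks.
Read-only console policies: restrict AWS IAM and GCP IAM permissions so engineers cannot modify production resources through the console.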
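With Terraform 1.5 or later (the workflow above pins 1.8.0), the import-first step can live in code as an import block, so it goes through pull-request review like any other change. A sketch, assuming a hypothetical bucket that someone created by hand in the console; before applying, confirm the plan shows only the import and no changes, since a mismatched immutable attribute such as location would force a replacement.

```hcl
# imports.tf: bring a manually created bucket under Terraform management.
# The bucket name is hypothetical; the id format depends on the resource type.
import {
  to = google_storage_bucket.legacy_assets
  id = "mycompany-legacy-assets"
}

# If this resource block is omitted, running
#   terraform plan -generate-config-out=generated.tf
# drafts one for review instead of writing it by hand.
resource "google_storage_bucket" "legacy_assets" {
  name     = "mycompany-legacy-assets"
  location = "EU"
}
```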

Practical Tips Across Multi-Cloud Setups

When managing GCP and AWS from the same repository, use provider aliases and clearly namespaced modules, and keep each resource module scoped to a single provider. Keep a versions.tf that pins both the Terraform version and every provider version: provider upgrades are the most common source of unexpected plan diffs in multi-cloud codebases.
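A sketch of such a versions.tf alongside the provider blocks, with an alias for a second AWS region. The version numbers, project ID, and bucket name are illustrative assumptions: "~> 1.8.0" permits only 1.8.x patch releases, while "~> 5.40" permits any 5.x release from 5.40 onward. Committing the generated .terraform.lock.hcl file alongside it records the exact provider builds each run uses.

```hcl
# versions.tf: pin Terraform and every provider so upgrades happen deliberately.
# Version numbers are illustrative; pin to the versions you have tested.
terraform {
  required_version = "~> 1.8.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.40"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.20"
    }
  }
}

# providers.tf: one default configuration per cloud, plus an aliased AWS region
provider "google" {
  project = "mycompany-prod" # hypothetical project ID
  region  = "europe-west1"
}

provider "aws" {
  region = "us-east-1"
}

provider "aws" {
  alias  = "eu"
  region = "eu-west-1"
}

# Resources select the alias explicitly; module calls use the providers meta-argument
resource "aws_s3_bucket" "assets_replica" {
  provider = aws.eu
  bucket   = "mycompany-assets-replica-eu" # hypothetical bucket name
}
```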

Terraform at scale rewards investment in module quality and governance tooling. Teams that treat their IaC like a production software codebase — with reviews, tests, policies, and automated checks — spend far less time firefighting drift and far more time shipping value.