A better way to organize GitHub Actions workflows for deployment to multiple environments.

We deploy to multiple environments from our CI/CD system using GitHub Actions. Having multiple deployment environments, such as development, staging and production, helps us with development and testing.

When considering how to structure the CI/CD pipeline for multiple environments, it is tempting to create a workflow file per environment. This quickly leads to inconsistencies between the environments, because every change has to be copied into each file and the copies drift apart.

Our solution is to separate the concerns of when and what to deploy from how to deploy.

To give you a better overview of what we’re aiming for, here’s an example merge run for a workflow constructed with the approach presented in this article:

Example workflow run after a merge, deploying to multiple environments

The solution has two parts.

First part - When to run what

The first part of the solution is to split out the when.

For some generic deployable component, say my-component, we have a file my-component-trigger.yaml, which acts as the entrypoint for all workflows of that component. We call this a trigger file. It defines what triggers the workflow, using the on: clause and paths: filters[1].
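A sketch of such an on: clause, assuming the component's code lives in a hypothetical my-component/ directory. The two workflow files are listed as well, so that changes to the pipeline itself also trigger a run:

```yaml
# my-component-trigger.yaml (excerpt; the my-component/ path is hypothetical)
on:
  push:
    branches:
      - main
    # Only run when files belonging to this component change
    paths:
      - 'my-component/**'
      - '.github/workflows/my-component-trigger.yaml'
      - '.github/workflows/common-deploy.yaml'
  pull_request:
    paths:
      - 'my-component/**'
      - '.github/workflows/my-component-trigger.yaml'
      - '.github/workflows/common-deploy.yaml'
```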

Nothing new so far.

The trigger file also contains a top-level job for each environment: development:, staging: and production:. These top-level jobs call a reusable[2] workflow, common-deploy.yaml, with a set of boolean parameters. The boolean parameters determine what to run.

For example, when running the workflow for the development environment, we want to run everything:

jobs:
  development:
    # ...
    uses: ./.github/workflows/common-deploy.yaml
    with:
      # ...
      run_lint: true
      run_test: true
      run_deploy: true

We found that assigning the boolean parameters in the trigger file and passing them to an inner workflow makes it easier to reason about when a particular job should run. This is especially true if the boolean parameters are set dynamically.
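For instance, the values can be expressions instead of literals. In this sketch of the development: job, draft pull requests are linted and tested but not deployed, and a hypothetical repository variable SKIP_TESTS turns tests off when set to 'true'. Both policies are assumptions chosen for illustration:

```yaml
jobs:
  development:
    if: github.event_name == 'pull_request'
    uses: ./.github/workflows/common-deploy.yaml
    secrets: inherit
    with:
      ci_environment: development-env
      run_lint: true
      # Hypothetical repository variable: tests run unless it is set to 'true'
      run_test: ${{ vars.SKIP_TESTS != 'true' }}
      # Draft pull requests are linted and tested, but not deployed
      run_deploy: ${{ !github.event.pull_request.draft }}
```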

The complete my-component-trigger.yaml workflow file looks like this:

# my-component-trigger.yaml
name: my-component

on:
  # Run on push to main branch
  push:
    branches:
      - main
  # Run on pull request
  pull_request:

jobs:
  development:
    # Only run on pull request
    if: github.event_name == 'pull_request'
    uses: ./.github/workflows/common-deploy.yaml
    secrets: inherit
    with:
      ci_environment: development-env
      run_lint: true
      run_test: true
      run_deploy: true

  staging:
    # Only run on push to main branch
    if: github.event_name == 'push' && github.ref_name == 'main'
    uses: ./.github/workflows/common-deploy.yaml
    secrets: inherit
    with:
      ci_environment: staging-env
      run_lint: false
      run_test: true
      run_deploy: true

  production:
    # Only run on push to main branch
    if: github.event_name == 'push' && github.ref_name == 'main'
    # ...after staging jobs have completed
    needs: [staging]
    uses: ./.github/workflows/common-deploy.yaml
    secrets: inherit
    with:
      ci_environment: production-env
      run_lint: false
      run_test: false
      run_deploy: true

With the when and what out of the way, we can move to the second part.

Second part - How to run it

The second part of the solution is the how.

The file .github/workflows/common-deploy.yaml defines a reusable workflow. It contains a set of common jobs, such as lint:, build:, test: and deploy:. Each of the top-level jobs calls this single reusable workflow.

The common jobs are conditional on the boolean inputs to the workflow.

jobs:
  lint:
    if: ${{ inputs.run_lint }}
    # ...

  test:
    if: ${{ inputs.run_test }}
    # ...

  deploy:
    # Only run if previous non-skipped jobs passed
    if: ${{ !failure() && !cancelled() && inputs.run_deploy }}
    needs: [lint, test]
    # ...

In this example, we are using Node and AWS for the deployment, but the same structure applies regardless of which technologies you use.
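To illustrate, here is a sketch of the test: job for a hypothetical Python project. It keeps the if: and environment: lines and only swaps the steps; the Python version, requirements.txt and pytest are assumptions:

```yaml
jobs:
  test:
    # Same switch and environment as the Node version
    if: ${{ inputs.run_test }}
    runs-on: ubuntu-latest
    environment: ${{ inputs.ci_environment }}

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      # Assumes pytest is listed in requirements.txt
      - run: pip install -r requirements.txt
      - run: pytest
```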

The complete common-deploy.yaml workflow file should look like this:

# common-deploy.yaml
name: common-deploy

on:
  # Run on workflow call
  workflow_call:
    inputs:
      # Which environment to activate
      ci_environment:
        description: 'GitHub deployment environment, e.g. development-env'
        required: true
        type: string

      # Which jobs to run
      run_lint:
        required: false
        default: true
        type: boolean
      run_test:
        required: false
        default: true
        type: boolean
      run_deploy:
        required: false
        default: true
        type: boolean

jobs:
  lint:
    if: ${{ inputs.run_lint }}
    runs-on: ubuntu-latest
    environment: ${{ inputs.ci_environment }}

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npm run lint

  test:
    if: ${{ inputs.run_test }}
    runs-on: ubuntu-latest
    environment: ${{ inputs.ci_environment }}

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npm run test

  deploy:
    # Only run if previous non-skipped jobs passed
    if: ${{ !failure() && !cancelled() && inputs.run_deploy }}
    needs: [lint, test]
    runs-on: ubuntu-latest
    environment: ${{ inputs.ci_environment }}

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ vars.AWS_ROLE }}
          role-session-name: github-actions-oidc-session
          aws-region: ${{ vars.AWS_REGION }}
      - run: npm ci
      - run: npm install -g aws-cdk
      - run: cdk synth
      - run: cdk deploy --all --require-approval never
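The deploy: job above assumes AWS_ROLE through OpenID Connect, which requires the GITHUB_TOKEN to carry the id-token: write permission. As far as we know, that permission is not granted by default, and a reusable workflow cannot give itself more than its caller passes on, so a sketch of the fix goes into the trigger file. Listing permissions explicitly sets every unlisted scope to none, hence contents: read for actions/checkout:

```yaml
# my-component-trigger.yaml (excerpt)
name: my-component

# Workflow-level permissions apply to every job, including the calls to
# common-deploy.yaml. The called workflow inherits them and can only
# narrow them, never widen them.
permissions:
  id-token: write # lets configure-aws-credentials request an OIDC token
  contents: read  # lets actions/checkout read the repository
```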

Note that each job uses a GitHub environment via the environment: clause[3]. Which environment to activate is determined by the ci_environment input value.

GitHub environments work like containers for variables and secrets. When an environment is activated, its variables are accessible via the vars context[4].

This way we can provide different values of AWS_ROLE and AWS_REGION to the deployment step in each environment. Using environments makes us more confident that the workflow uses the correct credentials.
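A minimal sketch of how steps read these environment-scoped values. Only AWS_REGION comes from the setup described here; the API_TOKEN secret and the scripts/post-deploy.sh script are hypothetical. As we read GitHub's documentation on reusable workflows, environment secrets are resolved inside the called workflow because its job declares environment:, rather than being passed in through secrets: inherit:

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ inputs.ci_environment }}

    steps:
      - uses: actions/checkout@v4

      # vars.AWS_REGION resolves against whichever environment
      # inputs.ci_environment activated, so this step is identical for
      # development-env, staging-env and production-env
      - name: Show the target region
        run: echo "Deploying to $AWS_REGION"
        env:
          AWS_REGION: ${{ vars.AWS_REGION }}

      # API_TOKEN is a hypothetical environment secret and post-deploy.sh a
      # hypothetical script; GitHub masks secret values in the logs
      - name: Run a post-deploy script
        run: ./scripts/post-deploy.sh
        env:
          API_TOKEN: ${{ secrets.API_TOKEN }}
```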

In our case, we created three GitHub environments, development-env, staging-env and production-env, which are configured under the repository settings:

GitHub environment configuration for a repository

We have added two variables for each of the environments: AWS_ROLE and AWS_REGION.

That’s it.

The workflow should now work both for pull requests and merges. Here’s what a workflow run after pull request creation looks like:

Example workflow run after pull request creation, deploying to multiple environments

Notice how the job names from the common workflow are automatically prefixed with the top-level job name. We get jobs with the names development / lint, development / test and so on.

This is convenient, since the job names are now somewhat standardized. It makes setting up required checks[5] a trivial task.

Pitfalls

When chaining the jobs in the common workflow, you should be aware of how the status check functions work[6]. We ended up doing a fair amount of empirical work to discover how they handle canceling and skipping.

The success() function, which is applied implicitly to any if: condition that does not already call a status check function, returns false if a needed job is skipped[7].

Since we want to conditionally skip jobs (e.g. using run_test: false) and keep going, we can’t use success(). Furthermore, since we also want to be able to cancel jobs manually in order to stop a workflow, always() is out of the question too: it evaluates to true even after the workflow has been cancelled.
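For comparison, here is a sketch of the two conditions we ruled out, written as they would appear on the deploy: job. Only one if: can be active at a time, so the second is commented out:

```yaml
jobs:
  deploy:
    needs: [lint, test]
    # Ruled out: without a status check function, GitHub implicitly adds
    # success(), so this behaves like `success() && inputs.run_deploy` and
    # deploy is skipped as soon as lint or test is skipped (e.g. staging,
    # where run_lint is false).
    if: ${{ inputs.run_deploy }}

    # Also ruled out: always() is true even after a needed job failed or the
    # workflow was cancelled, so deploy could start when it should not.
    # if: ${{ always() && inputs.run_deploy }}
```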

To work around this, we have the following condition for the deploy: job:

  deploy:
    # Only run if previous non-skipped jobs passed
    if: ${{ !failure() && !cancelled() && inputs.run_deploy }}
    needs: [lint, test]

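This makes deploy run only when run_deploy is true, no preceding job has failed and the workflow has not been cancelled. Skipped jobs, on the other hand, no longer block the deployment.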