Modular GitHub Actions: How to structure workflows to deploy to multiple environments
A better way to organize GitHub Actions workflows for deployment to multiple environments.
We deploy to multiple environments from our CI/CD system using GitHub Actions.
Having multiple deployment environments, such as development, staging and production, helps us with development and testing.
When considering how to structure the CI/CD pipeline for multiple environments, it is tempting to create one workflow file per environment. This quickly leads to inconsistencies between the environments.
Our solution is to separate the concerns of when and what to deploy from how to deploy.
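Concretely, this split gives us two workflow files, which the rest of this article builds up:

.github/workflows/
├── my-component-trigger.yaml   # when and what: triggers, one job per environment, boolean flags
└── common-deploy.yaml          # how: a reusable workflow with the common lint, test and deploy jobs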
To give you a better overview of what we’re aiming for, here’s an example merge run for a workflow constructed with the approach presented in this article:
The solution has two parts.
First part - When to run what
The first part of the solution is to split out the when.
For some generic deployable component, say my component, we have a file my-component-trigger.yaml, which acts as the entrypoint for all workflows of my component. We call this a trigger file. It defines what triggers the workflow, using the on: clause and paths: filters¹.
Nothing new so far.
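As a sketch, a paths: filter in the trigger file could look something like the following. The my-component/ directory is just an assumed layout, and the complete trigger file further down omits the filter to keep the example short:

# Sketch only: trigger when files under my-component/ change
on:
  push:
    branches:
      - main
    paths:
      - 'my-component/**'
  pull_request:
    paths:
      - 'my-component/**'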
The trigger file also contains a top-level job for each environment: development:, staging: and production:. These top-level jobs call a reusable² workflow, common-deploy.yaml, with a set of boolean parameters.
The boolean parameters determine what to run.
For example, when running the workflow for the development environment, we want to run everything:
jobs:
  development:
    # ...
    uses: ./.github/workflows/common-deploy.yaml
    with:
      # ...
      run_lint: true
      run_test: true
      run_deploy: true
We found that assigning the boolean parameters in the trigger file and passing them to the inner workflow makes it easier to reason about when a particular job should run. This is especially true if the boolean parameters are set dynamically.
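As an illustration of a dynamically set flag, the trigger file could derive run_lint from the pull request state instead of hard-coding it. This is only a sketch, not part of the workflow presented below:

jobs:
  development:
    if: github.event_name == 'pull_request'
    uses: ./.github/workflows/common-deploy.yaml
    secrets: inherit
    with:
      ci_environment: development-env
      # Hypothetical dynamic flag: skip linting for draft pull requests
      run_lint: ${{ github.event.pull_request.draft == false }}
      run_test: true
      run_deploy: true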
The complete my-component-trigger.yaml workflow file looks like this:
# my-component-trigger.yaml
name: my-component

on:
  # Run on push to main branch
  push:
    branches:
      - main
  # Run on pull request
  pull_request:

jobs:
  development:
    # Only run on pull request
    if: github.event_name == 'pull_request'
    uses: ./.github/workflows/common-deploy.yaml
    secrets: inherit
    with:
      ci_environment: development-env
      run_lint: true
      run_test: true
      run_deploy: true

  staging:
    # Only run on push to main branch
    if: github.event_name == 'push' && github.ref_name == 'main'
    uses: ./.github/workflows/common-deploy.yaml
    secrets: inherit
    with:
      ci_environment: staging-env
      run_lint: false
      run_test: true
      run_deploy: true

  production:
    # Only run on push to main branch
    if: github.event_name == 'push' && github.ref_name == 'main'
    # ...after staging jobs have completed
    needs: [staging]
    uses: ./.github/workflows/common-deploy.yaml
    secrets: inherit
    with:
      ci_environment: production-env
      run_lint: false
      run_test: false
      run_deploy: true
With the when and what out of the way, we can move to the second part.
Second part - How to run it
The second part of the solution is the how.
The file .github/workflows/common-deploy.yaml defines a reusable workflow. It contains a set of common jobs, such as lint:, build:, test: and deploy:.
Each of the top-level jobs calls this single reusable workflow.
The common jobs are conditional on the boolean inputs to the workflow.
jobs:
  lint:
    if: ${{ inputs.run_lint }}
    # ...
  test:
    if: ${{ inputs.run_test }}
    # ...
  deploy:
    # Only run if previous non-skipped jobs passed
    if: ${{ !failure() && !cancelled() && inputs.run_deploy }}
    needs: [lint, test]
    # ...
In this example, we are using Node and AWS for the deployment, but the same structure applies regardless of which technologies you are using (a sketch with a different deploy step follows the complete file below).
The complete common-deploy.yaml workflow file looks like this:
# common-deploy.yaml
name: common-deploy

on:
  # Run on workflow call
  workflow_call:
    inputs:
      # Which environment to activate
      ci_environment:
        description: 'GitHub deployment environment, eg. development-env'
        required: true
        type: string
      # Which jobs to run
      run_lint:
        required: false
        default: true
        type: boolean
      run_test:
        required: false
        default: true
        type: boolean
      run_deploy:
        required: false
        default: true
        type: boolean

jobs:
  lint:
    if: ${{ inputs.run_lint }}
    runs-on: ubuntu-latest
    environment: ${{ inputs.ci_environment }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npm run lint

  test:
    if: ${{ inputs.run_test }}
    runs-on: ubuntu-latest
    environment: ${{ inputs.ci_environment }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npm run test

  deploy:
    # Only run if previous non-skipped jobs passed
    if: ${{ !failure() && !cancelled() && inputs.run_deploy }}
    needs: [lint, test]
    runs-on: ubuntu-latest
    environment: ${{ inputs.ci_environment }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ vars.AWS_ROLE }}
          role-session-name: github-actions-oidc-session
          aws-region: ${{ vars.AWS_REGION }}
      - run: npm ci
      - run: npm install -g aws-cdk
      - run: cdk synth
      - run: cdk deploy --all --require-approval never
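As mentioned above, the same structure works with stacks other than Node and CDK: only the steps inside the jobs change, while the if:, needs: and environment: wiring stays the same. Here is a sketch of the deploy: job using a generic, hypothetical deploy script instead of CDK:

deploy:
  # Only run if previous non-skipped jobs passed
  if: ${{ !failure() && !cancelled() && inputs.run_deploy }}
  needs: [lint, test]
  runs-on: ubuntu-latest
  environment: ${{ inputs.ci_environment }}
  steps:
    - uses: actions/checkout@v4
    # Hypothetical: swap the CDK steps for whatever your stack deploys with
    - run: ./deploy.sh --environment ${{ inputs.ci_environment }}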
Note that each job is using a GitHub environment via the environment: clause³. Which environment to activate is determined by the ci_environment input value.
GitHub environments work like containers for variables and secrets.
When an environment is activated, its variables are accessible via the vars context⁴.
This way we can provide different values for AWS_ROLE and AWS_REGION to the deployment step. Using environments makes us more confident that the workflow is using the correct credentials.
In our case, we created three GitHub environments, configured under the repository settings.
We have added two variables to each of the environments: AWS_ROLE and AWS_REGION.
That’s it.
The workflow should now work both for pull requests and merges. Here’s what a workflow run after pull request creation looks like:
Notice how the job names from the common workflow are automatically prefixed with the top-level job name.
We get jobs with the names development / lint, development / test and so on.
This is convenient, since the job names are now somewhat standardized. It makes setting up required checks⁵ a trivial task.
Pitfalls
When chaining the jobs in the common workflow, you should be aware of how the status check functions work⁶. We ended up doing a fair amount of empirical work to discover how they handle canceling and skipping.
The success() function, which is applied by default to all if clauses, returns false if a needed job is skipped⁷. Since we want to conditionally skip jobs (e.g. using run_test: false) and keep going, we can't use success().
Furthermore, since we also want to be able to manually cancel jobs in order to stop a workflow, using always() is also out of the question.
To work around this, we have the following condition for the deploy: job:
deploy:
  # Only run if previous non-skipped jobs passed
  if: ${{ !failure() && !cancelled() && inputs.run_deploy }}
  needs: [lint, test]
This makes deploy run when run_deploy is true and previous non-skipped jobs passed.
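To make the trade-offs concrete, here is the deploy: condition again, annotated with what the rejected alternatives would do. The comments restate the behaviour described above:

deploy:
  needs: [lint, test]
  # success() && inputs.run_deploy
  #   -> deploy is skipped whenever lint or test is skipped (e.g. run_lint: false),
  #      even though nothing failed
  # always() && inputs.run_deploy
  #   -> deploy still runs after the workflow has been manually cancelled
  # The combination below tolerates skipped jobs, stops on failure or
  # cancellation, and keeps deploy opt-in per environment
  if: ${{ !failure() && !cancelled() && inputs.run_deploy }}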