An example CI/CD setup for a monorepo using vanilla GitHub Actions
A writeup of the learnings from setting up vanilla GitHub Actions for a large enterprise monorepo.
We found very few examples of how to use GitHub Actions in a monorepo, especially in an enterprise setting with multiple cloud deployment environments, programming languages and collaborators.
These learnings were obtained by empirical work – migrating a large enterprise monorepo from Jenkins to GitHub Actions.
We wanted a setup that doesn’t rely on third-party actions or hacks. This article outlines one way to achieve that. The patterns we uncovered should be widely applicable. Hopefully.
There’s a high-level TL;DR.
Background
An enterprise decision had been made to move from Jenkins to GitHub Actions.
The main requirements were:
- Vanilla GitHub Actions
- No third-party actions (besides the ones by AWS and Docker)
- No external build system (no Bazel, Pantsbuild, Turborepo, nx, etc.)
- No hacks, such as the paths-filter and alls-green actions
- Monorepo with multiple programming languages (JavaScript, Python)
- Cloud-native deployment to AWS using CloudFormation or CDK (no Kubernetes)
- Development environment that shares tool versions with CI
- Required checks must pass before merging1
- Minimal changes to codebase
Using vanilla GitHub Actions is a strategy to limit dependencies. It is a practical approach for ensuring maintainability, as well as decreasing exposure to supply-chain related security threats2.
When setting up or altering a CI/CD pipeline, we found that it is important to consider how it affects the development environment. In a large organization, managing common development tools and their versions is a struggle – especially in the Python ecosystem. To solve this, we use devcontainers, but your team’s needs and requirements may differ.
The rest of the listed requirements are a mix of existing practices within the enterprise.
Solution - High level overview
The codebase is organized into components on the root level. Each component can be deployed individually.
We split the GitHub Actions into a common deployment workflow, two additional supporting workflows, and individual component-specific workflows.
The end result looks like this:
./
├── .devcontainer/
│ ├── node/
│ │ └── devcontainer.json
│ └── python/
│ └── devcontainer.json
│
├── .github/
│ └── workflows/
│ ├── common-deploy.yaml
│ ├── common-deploy-images.yaml
│ ├── common-deploy-skip.yaml
│ │
│ ├── component-a-trigger.yaml
│ ├── component-b-trigger.yaml
│ ├── component-c-trigger.yaml
│ │
│ ├── images-python-trigger.yaml
│ └── images-node-trigger.yaml
│
├── component-a/
│ ├── scripts/
│ ├── src/
│ └── package.json
│
├── component-b/
│ ├── scripts/
│ ├── src/
│ └── pyproject.toml
│
├── component-c/
│ ├── scripts/
│ ├── src/
│ └── pyproject.toml
│
└── images/
├── node.Dockerfile
└── python.Dockerfile
The design has four main parts:
- The component-specific workflows, such as component-a-trigger.yaml, define the paths and triggers for the workflow. These determine when a workflow should run.
- The common workflow, common-deploy.yaml, defines setup steps, as well as common steps for all jobs. It initializes environment variables, activates a GitHub environment (more on that below), logs in to AWS and runs the predefined set of shell scripts.
- The shell scripts for each job, such as component-a/scripts/lint.sh, perform the desired actions. Each component defines its own set of shell scripts. The contents of, say, lint.sh could be npm run lint or pylint depending on the component.
- A set of container images for the CI ensures that most components’ workflows have the same tool versions. They are also used as devcontainers.
Using shell scripts is contrary to what the GitHub Actions documentation suggests. It’s common to use pre-built actions and inline bash in the YAML files. We found that using shell scripts is crucial for keeping the steps locally runnable.
The high-level relationships between the parts in this setup are seen in the following diagram:
The implementation of each part is discussed in detail below.
Additional material
To prevent this already lengthy article from becoming unreadable, we won’t discuss the implementation details at length.
The applied patterns are discussed in-depth in the following articles:
- How to structure workflows to deploy to multiple environments
- Remove deployment events from the pull request timeline
- Cancelling in-progress pull request workflows on push
- Docker compose files as Devcontainers
- Using required checks conditionally (TBA)
Consider jumping to those if the article feels overwhelming.
Solution - The actual implementation
Component specific trigger workflows
Each component has a workflow file in the workflows directory.
For example, component-a-trigger.yaml acts as the entrypoint for all workflows of component a. We call this a trigger file. It defines what triggers the workflow, using the on: clause and paths: filters3.
# component-a-trigger.yaml
name: component-a
# This workflow is the entry-point for all workflows of component a
on:
# Run on push to main branch
push:
branches:
- main
paths:
- "component-a/**"
- ".github/workflows/component-a-trigger.yaml"
- '!**/*.md'
# Run on pull request
pull_request:
paths:
- "component-a/**"
- ".github/workflows/component-a-trigger.yaml"
- '!**/*.md'
concurrency:
# Make sure every job on main has unique group id (run_id), so cancel-in-progress only affects PR's
# https://stackoverflow.com/questions/74117321/if-condition-in-concurrency-in-gha
group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
cancel-in-progress: true
permissions:
contents: read # for checkout
packages: write # for ghcr.io
id-token: write # for AWS OIDC
jobs:
development:
# Only run on pull request
if: |
(github.event_name == 'pull_request' )
uses: ./.github/workflows/common-deploy.yaml
secrets: inherit
with:
ci_path: ./component-a
ci_environment: development-env
ci_image: ghcr.io/${{ github.repository }}/node:latest
run_lint: true
run_test: true
run_deploy: true
staging:
# Only run on push to main branch
if: |
(github.event_name == 'push' && github.ref_name == 'main')
uses: ./.github/workflows/common-deploy.yaml
secrets: inherit
with:
ci_path: ./component-a
ci_environment: staging-env
ci_image: ghcr.io/${{ github.repository }}/node:latest
run_lint: false
run_test: true
run_deploy: true
production:
# Only run on push to main branch
if: |
(github.event_name == 'push' && github.ref_name == 'main')
needs: [staging]
uses: ./.github/workflows/common-deploy.yaml
secrets: inherit
with:
ci_path: ./component-a
ci_environment: production-env
ci_image: ghcr.io/${{ github.repository }}/node:latest
run_lint: false
run_test: false
run_deploy: true
This workflow runs only when the workflow file itself, or files in the component-a/ path, are changed. Changes to Markdown files, such as READMEs, don’t trigger the workflow.
Notice that the trigger file has three top-level jobs: development:, staging: and production:. These top-level jobs call the reusable4 workflow common-deploy.yaml with a set of parameters. For example, the ci_path parameter specifies the working directory for running the workflow.
Note also the ci_environment parameter. We’ll use that to activate a GitHub environment. More on that next.
Populate variables using GitHub environments
In GitHub, it’s possible to configure environment variables and secrets for a repo. That’s useful for shared values, but we want to populate some environment variables based on the current environment we are deploying to.
For example, in order to interact with the staging environment, the CI should assume an AWS role that is only permitted to interact with staging resources.
Therefore, we’ll need the AWS_ROLE to take on different values. We also want to be absolutely sure that we don’t deploy using the wrong AWS_ROLE. To decrease the risk, we’ll use GitHub environments.
We created three GitHub environments in the repo. The environments are configured under repo settings:
We have added two variables for each of the environments: AWS_ROLE and AWS_REGION. The vars context exposes the variables for the currently active environment.
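The environments and their variables can be configured through the repo settings UI, or from the command line. A sketch using the GitHub CLI (the role ARN and region are made-up placeholders, not values from the actual setup):
# Create (or update) the environment
gh api -X PUT "repos/{owner}/{repo}/environments/staging-env"
# Add the per-environment variables that the workflows read from the vars context
gh variable set AWS_ROLE --env staging-env --body "arn:aws:iam::123456789012:role/example-staging-deploy"
gh variable set AWS_REGION --env staging-env --body "eu-west-1"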
The CI jobs in the common workflow that need access to AWS assume the correct role for the environment by using OIDC and the configure-aws-credentials action:
- name: Assume AWS role
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ vars.AWS_ROLE }}
role-session-name: github-actions-oidc-session
aws-region: ${{ vars.AWS_REGION }}
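On the AWS side, the trust policy of the role is what ties the environment to the permissions. As a rough sketch (the account ID and repository name are placeholders, not values from the actual setup), a staging role would only trust OIDC tokens issued for the staging-env environment:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
          "token.actions.githubusercontent.com:sub": "repo:example-org/example-monorepo:environment:staging-env"
        }
      }
    }
  ]
}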
The environments are activated by common-deploy.yaml using the environment: clause5.
Unfortunately, there are some confusing UI patterns related to GitHub environments that are worth mentioning:
When an environment is activated, the GitHub user interface shows an animated deployment icon in the workflow summary. GitHub also adds a message to the pull request timeline, stating that the PR is being deployed. This happens regardless of whether there is any actual deployment ongoing.
We discuss tips on how to get rid of the message in this article.
A shared reusable deployment workflow
The .github/workflows/common-deploy.yaml file defines a reusable workflow. It contains a set of common jobs, such as lint:, test: and deploy:.
Each top-level job, such as the jobs in component-a-trigger.yaml, calls this single reusable workflow. As a result, the common jobs are included under the top-level job! The name of each included job is automatically prefixed with the top-level job name, e.g. the lint job running under the top-level development job is named development / lint.
This is best described with a picture. Here’s how the development: top-level job is rendered on GitHub:
The setup for the jobs, as well as which of them to skip, is controlled by the top-level job by passing in appropriate inputs using the with: clause. For example, to skip linting on staging, we set run_lint: false.
We found that using one shared file, common-deploy.yaml, to define a predefined set of jobs prevents the staging and production workflows from drifting apart.
Shell scripts for each job
We chose to put the “implementation” for each job in a shell script. For example, lint.sh is for running linters, and deploy.sh is for deploying. Each component can define its own set of shell scripts, which contain the necessary steps for an action.
We found that using shell scripts gives enough flexibility to permit common-deploy.yaml to be shared between components of different programming languages. Using shell scripts also keeps the steps locally runnable.
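To make this concrete, a Node component’s lint.sh could be as small as the following. This is an illustrative sketch, not the actual script from the repo; a Python component’s equivalent might call pylint instead:
#!/usr/bin/env bash
# component-a/scripts/lint.sh (hypothetical example)
set -euo pipefail

# The common workflow passes the target environment (dev/stag/prod) as the
# first argument; linting doesn't need it, but scripts such as deploy.sh do.
npm ci
npm run lint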
The shell scripts are run within the component’s directory by setting the working-directory: parameter.
workflow_call:
inputs:
ci_path:
required: true
type: string
jobs:
lint:
steps:
# ...
- name: Lint
run: ./scripts/lint.sh
working-directory: ${{ inputs.ci_path }}
Specifying the working directory is the key trick that makes this work in a monorepo.
The full reusable workflow looks like this:
name: common-deploy
# This workflow defines setup steps, as well as a common steps for all deployment jobs
on:
# Run on workflow call
workflow_call:
inputs:
# CI Context
ci_path:
description: 'Working directory without trailing slash, eg. ./my-component'
required: true
type: string
ci_environment:
description: 'GitHub deployment environment, eg. development-env'
required: true
type: string
ci_image:
description: 'Container image, eg. node:23'
required: true
type: string
# CI Jobs to run
run_lint:
required: true
type: boolean
run_test:
required: true
type: boolean
run_deploy:
required: true
type: boolean
permissions:
contents: read # for checkout
packages: read # for ghcr.io
id-token: write # for AWS OIDC
env:
# Environment variables based on inputs
ENV: ${{ (contains(inputs.ci_environment,'development') && 'dev') || (contains(inputs.ci_environment, 'staging') && 'stag') || (contains(inputs.ci_environment, 'production') && 'prod') }}
ENVIRONMENT: ${{ (contains(inputs.ci_environment,'development') && 'development') || (contains(inputs.ci_environment, 'staging') && 'staging') || (contains(inputs.ci_environment, 'production') && 'production') }}
# Environment variables based on GitHub environment
AWS_ROLE: ${{ vars.AWS_ROLE }}
AWS_REGION: ${{ vars.AWS_REGION }}
jobs:
lint:
if: ${{ inputs.run_lint }}
runs-on: ubuntu-latest
environment: ${{ inputs.ci_environment }}
container:
image: ${{ inputs.ci_image }}
steps:
- uses: actions/checkout@v4
- name: Assume AWS role
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ env.AWS_ROLE }}
role-session-name: github-actions-oidc-session
aws-region: ${{ env.AWS_REGION }}
- name: Lint
run: ./scripts/lint.sh ${{ env.ENV }}
working-directory: ${{ inputs.ci_path }}
test:
if: ${{ inputs.run_test }}
runs-on: ubuntu-latest
environment: ${{ inputs.ci_environment }}
container:
image: ${{ inputs.ci_image }}
steps:
- uses: actions/checkout@v4
- name: Assume AWS role
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ env.AWS_ROLE }}
role-session-name: github-actions-oidc-session
aws-region: ${{ env.AWS_REGION }}
- name: Run unit tests
run: ./scripts/test.sh ${{ env.ENV }}
working-directory: ${{ inputs.ci_path }}
deploy:
# Only run if previous non-skipped jobs passed
if: ${{ !failure() && !cancelled() && inputs.run_deploy }}
needs: [lint, test]
runs-on: ubuntu-latest
environment: ${{ inputs.ci_environment }}
container:
image: ${{ inputs.ci_image }}
steps:
- uses: actions/checkout@v4
- name: Assume AWS role
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ env.AWS_ROLE }}
role-session-name: github-actions-oidc-session
aws-region: ${{ env.AWS_REGION }}
- name: Run deploy
run: ./scripts/deploy.sh ${{ env.ENV }}
working-directory: ${{ inputs.ci_path }}
The GitHub Actions runner-images6 come with a predetermined set of common tools. In many cases it’s fine to rely on the versions provided by default. We, however, want to ensure that the versions are pinned and controlled by us.
For this, we’ll use the container: property to set a base image for the jobs.
A set of container images for development and CI
A common way to install the language runtimes in CI is to use actions, such as setup-node and setup-python. Using setup actions has drawbacks, because it’s difficult to align the development environment with the CI. We’ll discuss these drawbacks in the next section.
Our approach is to use containers instead of setup actions.
Using the container: property, it’s possible to set a base image for a job in CI.
There are two Dockerfiles in the images/ directory. They define the versions and tools for the Node and Python environments respectively.
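As a rough sketch of what such a file contains (the base image tag and the tool list are illustrative, not the real file):
# images/node.Dockerfile (illustrative sketch)
FROM node:22-bookworm-slim

# Pin the shared CI/dev tooling so every component and developer gets the same versions
RUN apt-get update && apt-get install -y --no-install-recommends \
        git \
        shellcheck \
    && rm -rf /var/lib/apt/lists/*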
To build the images, we use another reusable workflow, common-deploy-images.yaml. The most convenient way to make the images available to other workflows is to push them to ghcr.io. Using other external registries is nontrivial if the repo is private.
The reusable common-deploy-images.yaml workflow looks like this:
# .github/workflows/common-deploy-images.yaml
name: common-deploy-images
# This workflow defines common steps for all image jobs
on:
# Run on workflow call
workflow_call:
inputs:
# CI Context
ci_path:
description: 'Working directory without trailing slash, eg. ./images'
required: true
type: string
# Image parameters
image_file:
description: 'Dockerfile relative to working directory, eg. node.Dockerfile'
required: true
type: string
image_name:
description: 'Image name'
required: true
type: string
image_tag:
description: 'Image tag'
required: true
type: string
permissions:
contents: read # for checkout
packages: write # for ghcr.io
jobs:
lint:
runs-on: ubuntu-latest
steps:
- run: echo "no-op"
test:
runs-on: ubuntu-latest
steps:
- run: echo "no-op"
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Log in to registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }} # automatically generated
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Build and push image
uses: docker/build-push-action@v6
with:
context: ./${{ inputs.ci_path }}
file: ${{ inputs.ci_path }}/${{ inputs.image_file }}
push: true
tags: ghcr.io/${{ github.repository }}/${{ inputs.image_name }}:${{ inputs.image_tag }}
We call this workflow in the same way as we call common-deploy.yaml: one trigger file per image. The images are built and pushed to the private GitHub container registry, in the repo’s namespace. We’re able to build and push the images with different tags when making a PR.
For example, the .github/workflows/images-node-trigger.yaml file is defined as:
name: images-node
on:
# Run on push to main branch
push:
branches:
- main
paths:
- ".github/workflows/common-deploy-images.yaml"
- ".github/workflows/images-node-trigger.yaml"
- "images/node.Dockerfile"
# Run on pull request
pull_request:
paths:
- ".github/workflows/common-deploy-images.yaml"
- ".github/workflows/images-node-trigger.yaml"
- "images/node.Dockerfile"
concurrency:
# Make sure every job on main has unique group id (run_id), so cancel-in-progress only affects PR's
# https://stackoverflow.com/questions/74117321/if-condition-in-concurrency-in-gha
group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
cancel-in-progress: true
permissions:
contents: read # for checkout
packages: write # for ghcr.io
jobs:
development:
# Only run on pull request
if: |
(github.event_name == 'pull_request' )
uses: ./.github/workflows/common-deploy-images.yaml
secrets: inherit
with:
ci_path: ./images
image_file: node.Dockerfile
image_name: node
image_tag: dev
# the test job is defined separately, since we use the common-deploy workflow ...
development-test:
# ... so we use workaround to ensure job is named 'development / test'
name: 'development'
# Only run if previous non-skipped jobs passed
needs: [development]
uses: ./.github/workflows/common-deploy.yaml
secrets: inherit
with:
# we run the common-deploy workflow for a component to test the image built above
ci_path: ./component-a
ci_environment: development-env
ci_image: ghcr.io/${{ github.repository }}/node:dev
run_lint: true
run_test: true
run_deploy: false
production:
# Only run on push to main branch
if: |
(github.event_name == 'push' && github.ref_name == 'main')
uses: ./.github/workflows/common-deploy-images.yaml
secrets: inherit
with:
ci_path: ./images
image_file: node.Dockerfile
image_name: node
image_tag: latest
For pull requests, we run the development: job and build images with the :dev tag. When merging (or pushing to main) we use the :latest tag.
A really cool benefit of the modular approach is that we’re able to run the full CI workflow for any other component as a test step after pushing the image.
We chose to use component-a as the guinea pig for testing changes to the Node image. Whenever node.Dockerfile changes, we’ll run the lint and test jobs for component-a too.
This testing approach is completely optional, however, and your requirements for testing may not be as strict.
If the development-test: job looks confusing, you can do well without it too. Just be sure to add a no-op test: job to the common-deploy-images workflow.
A development environment aligned with the CI
We are left with one more thing to consider: How does it all fare in terms of the developer experience?
The Node and Python versions could be managed per component7,8, but in a monorepo it’s common to unify the language versions.
We want to ensure that the versions for Node, Python and the provided tools are aligned across the components. In addition to that, the developers’ local versions should also match those.
How do we go about providing a developer environment with the correct versions?
In the previous section, we mentioned that a common approach is to install the language runtimes in CI using actions such as setup-node and setup-python.
One would then point those to common .node-version and .python-version files in order to align the versions between CI and the local development environment.
This approach works for most languages, but breaks down at the tool level. For example, there’s no tool-version file for ensuring that all developers are running the same version of shellcheck.
Our solution is to not use setup actions, and instead base the development environment on the container images used in the CI. Thus, the development environment and the CI environment are the same, and there can be no discrepancy.
To keep this article at a reasonable length, we won’t expand on the implementation here, but a detailed writeup of using devcontainers in a monorepo is available in this article.
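To give a flavour of the idea, a minimal devcontainer can simply point at the CI image. This is a sketch with placeholder repository names; the setup described in the article linked above uses Docker Compose files:
// .devcontainer/node/devcontainer.json (minimal sketch)
{
  "name": "node",
  "image": "ghcr.io/example-org/example-monorepo/node:latest",
  "customizations": {
    "vscode": {
      "extensions": ["dbaeumer.vscode-eslint"]
    }
  }
}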
Summary
We now have a monorepo running GitHub Actions, deployments to multiple environments and a developer environment aligned with all that.
As mentioned at the beginning of the article, this approach has been tested in a real-world case within a large enterprise. Each organization is unique, and your needs may differ, but the general patterns used here should be of use for many kinds of organizations.
One goal and requirement was to be able to pull off the setup using vanilla GitHub Actions. Based on our experience, it is doable, and even a large monorepo of tens of components can be made to work with the same approach.
It’s also worth mentioning that the third-party actions we use are from trusted parties (AWS and Docker) and that they are purely a convenience. The full setup can also be accomplished without any third-party actions.
We have now pushed vanilla GitHub Actions as far as they go.
FAQ
Q: Do GitHub Actions support a monorepo structure?
A: Yes, but the real question is “how many additional tools do I need to do that?”.
The setup outlined in this article provides a complete solution, with no build system and no required third-party actions.
Q: Won’t the number of YAML files grow with the number of components?
A: Yes, but we find that to be the most understandable setup. The number of YAML files matches the number of components. It’s possible to reduce the number further using a third-party paths-filter action, though.
Q: How can I make required checks1 work with optional jobs?
A: The current setup is compatible with required checks. Declare development / lint, development / test and development / deploy as required. Note that we have defined no-op jobs in the common-deploy-images.yaml and common-deploy-skip.yaml files.
Q: What about dependencies between components?
A: If you need to consider deployment order, you probably need a build system. Read up on Bazel, Pantsbuild and the like.
Q: Do you have additional material or examples?
A: There’s a section on additional reading here.
Q: Is this setup compatible with GitHub merge queue?
A: No.