A writeup of the learnings from setting up vanilla GitHub Actions for a large enterprise monorepo.

Example workflow run after a merge, deploying to multiple environments

We found very few examples of how to use GitHub Actions in a monorepo, especially in an enterprise setting with multiple cloud deployment environments, programming languages and collaborators.

These learnings were obtained by empirical work – migrating a large enterprise monorepo from Jenkins to GitHub Actions.

We wanted a setup that doesn’t rely on third-party actions or hacks. This article outlines one way to achieve that. The patterns we uncovered should be widely usable. Hopefully.

There’s a high-level TL;DR.

Background

An enterprise decision had been made to move from Jenkins to GitHub Actions.

The main requirements were:

  • Vanilla GitHub Actions
    • No third-party actions (besides the ones by AWS and Docker)
    • No external build system (no Bazel, Pantsbuild, Turborepo, nx, etc.)
    • No hacks, such as the paths-filter and alls-green actions
  • Monorepo with multiple programming languages (JavaScript, Python)
  • Cloud-native deployment to AWS using CloudFormation or CDK (no Kubernetes)
  • Development environment that shares tool versions with CI
  • Required checks must pass before merging[1]
  • Minimal changes to codebase

Using vanilla GitHub Actions is a strategy to limit dependencies. It is a practical approach for ensuring maintainability, as well as decreasing exposure to supply-chain-related security threats[2].

When setting up or altering a CI/CD pipeline, we found that it is important to consider how it affects the development environment. In a large organization, managing common development tools and their versions is a struggle – especially in the Python ecosystem. To solve this, we use devcontainers, but your team’s needs and requirements may differ.

The rest of the listed requirements are a mix of existing practices within the enterprise.

Solution - High-level overview

The codebase is organized into components on the root level. Each component can be deployed individually.

We split the GitHub Actions into a common deployment workflow, two additional supporting workflows, and individual component-specific workflows.

The end result looks like this:

./
├── .devcontainer/
│   ├── node/
│   │   └── devcontainer.json
│   └── python/
│       └── devcontainer.json
│
├── .github/
│   └── workflows/
│       ├── common-deploy.yaml
│       ├── common-deploy-images.yaml
│       ├── common-deploy-skip.yaml
│       │
│       ├── component-a-trigger.yaml
│       ├── component-b-trigger.yaml
│       ├── component-c-trigger.yaml
│       │
│       ├── images-python-trigger.yaml
│       └── images-node-trigger.yaml
│
├── component-a/
│   ├── scripts/
│   ├── src/
│   └── package.json
│
├── component-b/
│   ├── scripts/
│   ├── src/
│   └── pyproject.toml
│
├── component-c/
│   ├── scripts/
│   ├── src/
│   └── pyproject.toml
│
└── images/
    ├── node.Dockerfile
    └── python.Dockerfile

The design has four main parts:

  1. The component-specific workflows, such as component-a-trigger.yaml, define the paths and triggers for the workflow. These determine when a workflow should run.
  2. The common workflow, common-deploy.yaml, defines setup steps, as well as common steps for all jobs. It initializes environment variables, activates a GitHub environment (more on that below), logs in to AWS and runs the predefined set of shell scripts.
  3. The shell scripts for each job, such as component-a/scripts/lint.sh, perform the desired actions. Each component defines its own set of shell scripts. The contents of, say, lint.sh could be npm run lint or pylint, depending on the component (see the sketch after this list).
  4. A set of container images for the CI ensures that most components’ workflows use the same tool versions. They are also used as devcontainers.
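As an illustration, a minimal component-a/scripts/lint.sh could look like the following. This is a sketch rather than the exact file; the set -euo pipefail preamble and the unused environment argument are our assumptions:

#!/usr/bin/env bash
# component-a/scripts/lint.sh
# Runs identically in CI and on a developer machine
set -euo pipefail

# common-deploy.yaml passes the short environment name (dev/stag/prod)
# as the first argument; linting doesn't depend on it, so it goes unused.

npm ci
npm run lint

Any tooling the script needs (npm in this case) is provided by the container images discussed below.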

Using shell scripts is contrary to what the GitHub Actions documentation suggests. It’s common to use pre-built actions and inline bash in the YAML files. We found that using shell scripts is crucial for keeping the steps locally runnable.

The high-level relationships between the parts in this setup are seen in the following diagram:

Component level diagram of GitHub Actions for a monorepo

The implementation of each part is discussed in detail below.

Additional material

To prevent this already lengthy article from becoming unreadable, we won’t discuss the implementation details at length.

The applied patterns are discussed in-depth in the following articles:

Consider jumping to those if the article feels overwhelming.

Solution - The actual implementation

Component specific trigger workflows

Each component has a workflow file in the workflows directory.

For example, component-a-trigger.yaml acts as the entrypoint for all workflows of component a. We call this a trigger file. It defines what triggers the workflow, using the on: clause and paths: filters[3].

# component-a-trigger.yaml
name: component-a
# This workflow is the entry-point for all workflows of component a

on:
  # Run on push to main branch
  push:
    branches:
      - main
    paths:
      - "component-a/**"
      - ".github/workflows/component-a-trigger.yaml"
      - '!**/*.md'
  # Run on pull request
  pull_request:
    paths:
      - "component-a/**"
      - ".github/workflows/component-a-trigger.yaml"
      - '!**/*.md'

concurrency:
  # Make sure every job on main has unique group id (run_id), so cancel-in-progress only affects PRs
  # https://stackoverflow.com/questions/74117321/if-condition-in-concurrency-in-gha
  group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
  cancel-in-progress: true

permissions:
  contents: read # for checkout
  packages: write # for ghcr.io
  id-token: write # for AWS OIDC

jobs:
  development:
    # Only run on pull request
    if: |
      (github.event_name == 'pull_request')
    uses: ./.github/workflows/common-deploy.yaml
    secrets: inherit
    with:
      ci_path: ./component-a
      ci_environment: development-env
      ci_image: ghcr.io/${{ github.repository }}/node:latest

      run_lint: true
      run_test: true
      run_deploy: true

  staging:
    # Only run on push to main branch
    if: |
      (github.event_name == 'push' && github.ref_name == 'main')
    uses: ./.github/workflows/common-deploy.yaml
    secrets: inherit
    with:
      ci_path: ./component-a
      ci_environment: staging-env
      ci_image: ghcr.io/${{ github.repository }}/node:latest

      run_lint: false
      run_test: true
      run_deploy: true

  production:
    # Only run on push to main branch
    if: |
      (github.event_name == 'push' && github.ref_name == 'main')
    needs: [staging]
    uses: ./.github/workflows/common-deploy.yaml
    secrets: inherit
    with:
      ci_path: ./component-a
      ci_environment: production-env
      ci_image: ghcr.io/${{ github.repository }}/node:latest

      run_lint: false
      run_test: false
      run_deploy: true

This workflow runs only when the workflow file itself, or files under the component-a/ path, are changed. Changes to Markdown files, such as READMEs, don’t trigger the workflow.

Notice that the trigger file has three top-level jobs: development:, staging: and production:. These top-level jobs call the reusable[4] workflow common-deploy.yaml with a set of parameters. For example, the ci_path parameter specifies the working directory for running the workflow.

Note also the ci_environment parameter. We’ll use that to activate a GitHub environment. More on that next.

Populate variables using GitHub environments

In GitHub, it’s possible to configure environment variables and secrets for a repo. That’s useful for shared values, but we want to populate some environment variables based on the current environment we are deploying to.

For example, in order to interact with the staging environment, the CI should assume an AWS role that is only permitted to interact with staging resources. Therefore, we’ll need the AWS_ROLE to take on different values. We also want to be absolutely sure that we don’t deploy using the wrong AWS_ROLE. To decrease the risk, we’ll use GitHub environments.

We created three GitHub environments in the repo. The environments are configured under repo settings:

GitHub environment configuration for a repository

We have added two variables for each of the environments: AWS_ROLE and AWS_REGION. The vars context exposes the variables for the currently active environment.
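If you prefer scripting the configuration over clicking through the settings UI, the same variables can be set with the GitHub CLI. A sketch, assuming the environments already exist; the role ARN is a placeholder:

# Set the per-environment variables for staging-env
gh variable set AWS_ROLE --env staging-env --body "arn:aws:iam::123456789012:role/ci-staging"
gh variable set AWS_REGION --env staging-env --body "eu-west-1"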

The CI jobs in the common workflow that need access to AWS assume the correct role for the environment by using OIDC and the configure-aws-credentials action:

- name: Assume AWS role
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: ${{ vars.AWS_ROLE }}
    role-session-name: github-actions-oidc-session
    aws-region: ${{ vars.AWS_REGION }}

The environments are activated by common-deploy.yaml using the environment: clause[5].
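Concretely, each job in common-deploy.yaml activates the environment it receives from the trigger file, which is what brings the matching AWS_ROLE and AWS_REGION into the vars context (the full workflow is listed further below):

jobs:
  lint:
    runs-on: ubuntu-latest
    environment: ${{ inputs.ci_environment }} # e.g. staging-env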

Unfortunately, there are some confusing UI patterns related to GitHub environments that are worth mentioning:

When an environment is activated, the GitHub user interface shows an animated deployment icon in the workflow summary. GitHub also adds a message to the pull request timeline, stating that the PR is being deployed. This happens regardless of whether any actual deployment is ongoing.

We discuss tips on how to get rid of the message in this article.

A shared reusable deployment workflow

The .github/workflows/common-deploy.yaml defines a reusable workflow. It contains a set of common jobs, such as lint:, test: and deploy:. Each top-level job, such as the jobs in component-a-trigger.yaml, calls this single reusable workflow.

The result is that the common jobs are included under the top-level job!

The name of each included job is automatically prefixed with the top-level job name, e.g. the lint job running under the development top-level job is named development / lint. This is best described with a picture.

Here’s how the development: top-level job is rendered on GitHub:

Example workflow run in development

The setup for the jobs, as well as which of them to skip, is controlled by the top-level job by passing in appropriate inputs using the with: clause. For example, to skip linting on staging, we set run_lint: false.

We found that using one shared file, common-deploy.yaml, to define a predefined set of jobs prevents the staging and production workflows from drifting apart.

Shell scripts for each job

We chose to put the “implementation” for each job in a shell script. For example, lint.sh is for running linters, and deploy.sh is for deploying.

Each component can define its own set of shell scripts, which contain the necessary steps for an action. We found that using shell scripts gives enough flexibility to permit common-deploy.yaml to be shared between components of different programming languages. Using shell scripts also keeps the steps locally runnable.
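For example, a CDK-based component could have a deploy.sh along these lines. A sketch under our own assumptions, not a prescribed layout; the context flag in particular depends on how the CDK app reads its configuration:

#!/usr/bin/env bash
# component-a/scripts/deploy.sh
# Deploys to the environment given as the first argument (dev/stag/prod)
set -euo pipefail

ENV="$1"

# AWS credentials come from the role assumed earlier in the workflow
npx cdk deploy --require-approval never --context env="$ENV"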

The shell scripts are run within the component’s directory by setting the working-directory: parameter.

workflow_call:
  inputs:
    ci_path:
      required: true
      type: string

jobs:
  lint:
    steps:
      # ...
      - name: Lint
        run: ./scripts/lint.sh
        working-directory: ${{ inputs.ci_path }}

Specifying the working directory is the key trick that makes this setup work in a monorepo.

The full reusable workflow looks like this:

name: common-deploy
# This workflow defines setup steps, as well as common steps for all deployment jobs

on:
  # Run on workflow call
  workflow_call:
    inputs:
      # CI Context
      ci_path:
        description: 'Working directory without trailing slash, eg. ./my-component'
        required: true
        type: string
      ci_environment:
        description: 'GitHub deployment environment, eg. development-env'
        required: true
        type: string
      ci_image:
        description: 'Container image, eg. node:23'
        required: true
        type: string

      # CI Jobs to run
      run_lint:
        required: true
        type: boolean
      run_test:
        required: true
        type: boolean
      run_deploy:
        required: true
        type: boolean

permissions:
  contents: read # for checkout
  packages: read # for ghcr.io
  id-token: write # for AWS OIDC

env:
  # Environment variables based on inputs
  ENV: ${{ (contains(inputs.ci_environment,'development') && 'dev') || (contains(inputs.ci_environment, 'staging') && 'stag') || (contains(inputs.ci_environment, 'production') && 'prod') }}
  ENVIRONMENT: ${{ (contains(inputs.ci_environment,'development') && 'development') || (contains(inputs.ci_environment, 'staging') && 'staging') || (contains(inputs.ci_environment, 'production') && 'production') }}

  # Environment variables based on GitHub environment
  AWS_ROLE: ${{ vars.AWS_ROLE }}
  AWS_REGION: ${{ vars.AWS_REGION }}

jobs:
  lint:
    if: ${{ inputs.run_lint }}
    runs-on: ubuntu-latest
    environment: ${{ inputs.ci_environment }}
    container:
      image: ${{ inputs.ci_image }}

    steps:
      - uses: actions/checkout@v4
      - name: Assume AWS role
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ env.AWS_ROLE }}
          role-session-name: github-actions-oidc-session
          aws-region: ${{ env.AWS_REGION }}
      - name: Lint
        run: ./scripts/lint.sh ${{ env.ENV }}
        working-directory: ${{ inputs.ci_path }}

  test:
    if: ${{ inputs.run_test }}
    runs-on: ubuntu-latest
    environment: ${{ inputs.ci_environment }}
    container:
      image: ${{ inputs.ci_image }}

    steps:
      - uses: actions/checkout@v4
      - name: Assume AWS role
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ env.AWS_ROLE }}
          role-session-name: github-actions-oidc-session
          aws-region: ${{ env.AWS_REGION }}
      - name: Run unit tests
        run: ./scripts/test.sh ${{ env.ENV }}
        working-directory: ${{ inputs.ci_path }}

  deploy:
    # Only run if previous non-skipped jobs passed
    if: ${{ !failure() && !cancelled() && inputs.run_deploy }}
    needs: [lint, test]
    runs-on: ubuntu-latest
    environment: ${{ inputs.ci_environment }}
    container:
      image: ${{ inputs.ci_image }}

    steps:
      - uses: actions/checkout@v4
      - name: Assume AWS role
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ env.AWS_ROLE }}
          role-session-name: github-actions-oidc-session
          aws-region: ${{ env.AWS_REGION }}
      - name: Run deploy
        run: ./scripts/deploy.sh ${{ env.ENV }}
        working-directory: ${{ inputs.ci_path }}

The GitHub Actions runner-images[6] come with a predetermined set of common tools. In many cases it’s fine to rely on the versions provided by default. We, however, want to ensure that the versions are pinned and controlled by us.

For this, we’ll use the container: property to set a base image for the jobs.

A set of container images for development and CI

A common way to install the language runtimes in CI is to use actions, such as setup-node and setup-python. Using setup actions has drawbacks, because it’s difficult to align the development environment with the CI. We’ll discuss these drawbacks in the next section.

Our approach is to use containers instead of setup actions. Using the container: property, it’s possible to set a base image for a job in CI.

There are two Dockerfiles in the images/ directory. They define the versions and tools for the Node and Python environments, respectively.
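As a sketch, images/node.Dockerfile could look like the following. The base image tag and the tool list are assumptions for illustration; the point is that all versions are pinned in one place:

# images/node.Dockerfile
# One pinned toolchain, shared by CI jobs and devcontainers
FROM node:22.14.0-bookworm-slim

# Shared tools, such as shellcheck, so every developer and every CI job
# runs the same versions
RUN apt-get update \
    && apt-get install -y --no-install-recommends git shellcheck \
    && rm -rf /var/lib/apt/lists/*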

To build the images, we use another reusable workflow, common-deploy-images.yaml. The most convenient way to make the images available to other workflows is to push them to ghcr.io. Using other external registries is nontrivial if the repo is private.

The reusable common-deploy-images.yaml workflow looks like this:

# .github/workflows/common-deploy-images.yaml
name: common-deploy-images
# This workflow defines common steps for all image jobs

on:
  # Run on workflow call
  workflow_call:
    inputs:
      # CI Context
      ci_path:
        description: 'Working directory without trailing slash, eg. ./images'
        required: true
        type: string

      # Image parameters
      image_file:
        description: 'Dockerfile relative to working directory, eg. node.Dockerfile'
        required: true
        type: string
      image_name:
        description: 'Image name'
        required: true
        type: string
      image_tag:
        description: 'Image tag'
        required: true
        type: string

permissions:
  contents: read # for checkout
  packages: write # for ghcr.io

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - run: echo "no-op"

  test:
    runs-on: ubuntu-latest
    steps:
      - run: echo "no-op"

  deploy:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
      - name: Log in to registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }} # automatically generated
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build and push image
        uses: docker/build-push-action@v6
        with:
          context: ./${{ inputs.ci_path }}
          file: ${{ inputs.ci_path }}/${{ inputs.image_file }}
          push: true
          tags: ghcr.io/${{ github.repository }}/${{ inputs.image_name }}:${{ inputs.image_tag }}

We call this workflow in the same way as we call common-deploy.yaml. One trigger file per image. The images are built and pushed to the private GitHub container registry, in the repo’s namespace.

We’re able to build and push the images with different tags when making a PR. For example, the .github/workflows/images-node-trigger.yaml file is defined as:

name: images-node

on:
  # Run on push to main branch
  push:
    branches:
      - main
    paths:
      - ".github/workflows/common-deploy-images.yaml"
      - ".github/workflows/images-node-trigger.yaml"
      - "images/node.Dockerfile"
  # Run on pull request
  pull_request:
    paths:
      - ".github/workflows/common-deploy-images.yaml"
      - ".github/workflows/images-node-trigger.yaml"
      - "images/node.Dockerfile"

concurrency:
  # Make sure every job on main has unique group id (run_id), so cancel-in-progress only affects PRs
  # https://stackoverflow.com/questions/74117321/if-condition-in-concurrency-in-gha
  group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
  cancel-in-progress: true

permissions:
  contents: read # for checkout
  packages: write # for ghcr.io

jobs:
  development:
    # Only run on pull request
    if: |
      (github.event_name == 'pull_request')
    uses: ./.github/workflows/common-deploy-images.yaml
    secrets: inherit
    with:
      ci_path: ./images
      image_file: node.Dockerfile
      image_name: node
      image_tag: dev

  # the test job is defined separately, since we use the common-deploy workflow ...
  development-test:
    # ... so we use a workaround to ensure the job is named 'development / test'
    name: 'development'

    # Only run if previous non-skipped jobs passed
    needs: [development]

    uses: ./.github/workflows/common-deploy.yaml
    secrets: inherit
    with:
      # we run the common-deploy workflow for a component to test the image built above
      ci_path: ./component-a
      ci_environment: development-env
      ci_image: ghcr.io/${{ github.repository }}/node:dev

      run_lint: true
      run_test: true
      run_deploy: false


  production:
    # Only run on push to main branch
    if: |
      (github.event_name == 'push' && github.ref_name == 'main')
    uses: ./.github/workflows/common-deploy-images.yaml
    secrets: inherit
    with:
      ci_path: ./images
      image_file: node.Dockerfile
      image_name: node
      image_tag: latest

For pull requests, we run the development: job and build images with the :dev tag. When merging (or pushing to main), we use the :latest tag.

A really cool benefit of the modular approach is that we’re able to run the full CI workflow for any other component as a test step after pushing the image.

We chose to use component-a as the guinea pig for testing changes to the Node image. Whenever the node.Dockerfile changes, we’ll run the lint and test jobs for component-a too.

This testing approach is completely optional, however, and your requirements for testing may not be as strict. If the development-test: job looks confusing, you can do just as well without it. Just be sure to add a no-op test: job to the common-deploy-images workflow.

A development environment aligned with the CI

We are left with one more thing to consider: How does it all fare in terms of the developer experience?

The Node and Python versions could be managed per component[7,8], but in a monorepo it’s common to unify the language versions.

We want to ensure that the versions of Node, Python and the provided tools are aligned across the components. In addition, the developers’ local versions should match those.

How do we go about providing a developer environment with the correct versions?

In the previous section, we mentioned that a common approach is to install the language runtimes in CI using actions such as setup-node and setup-python. One would then point those to common .node-version and .python-version files in order to align the versions between CI and the local development environment. This approach works for most languages, but breaks down at the tool level.

For example, there’s no tool version file for ensuring that all developers are running the same version of shellcheck.

Our solution is to not use setup actions, and instead base the development environment on the container images used in the CI. Thus, the development environment and the CI environment are the same, and there can be no discrepancy.
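In its simplest form, .devcontainer/node/devcontainer.json only needs to point at the image built in CI. A minimal sketch, with a placeholder repository path:

// .devcontainer/node/devcontainer.json
{
  "name": "node",
  "image": "ghcr.io/your-org/your-repo/node:latest"
}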

To keep this article at a reasonable length, we won’t expand on the implementation here, but a detailed writeup of using devcontainers in a monorepo is available in this article.

Summary

We now have a monorepo running GitHub Actions, deployments to multiple environments and a developer environment aligned with all that.

As mentioned at the beginning of the article, this approach has been tested in a real-world case within a large enterprise. Each organization is unique, and your needs may differ, but the general patterns used here should be of use for many kinds of organizations.

One goal and requirement was to be able to pull off the setup using vanilla GitHub Actions. Based on our experience, it is doable, and even a large monorepo of tens of components can be made to work with the same approach.

It’s also worth mentioning that the third-party actions we use are from trusted parties (AWS and Docker) and that they are purely a convenience. The full setup can also be accomplished without any third-party actions.

We have now pushed vanilla GitHub Actions as far as they go.

FAQ

Q: Do GitHub Actions support a monorepo structure?
A: Yes, but the real question is “how many additional tools do I need to do that?”. The setup outlined in this article provides a complete solution, with no build system and no required third-party actions.

Q: Won’t the number of YAML files grow with the number of components?
A: Yes, but we find that to be the most understandable setup. The number of YAML files matches the number of components. It’s possible to cut the amount further using a third-party paths-filter, though.

Q: How can I make required checks[1] work with optional jobs?
A: The current setup is compatible with required checks. Declare development / lint, development / test and development / deploy as required. Note that we have defined no-op jobs in the common-deploy-images.yaml and common-deploy-skip.yaml files.
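For reference, the no-op jobs follow the same shape as those in common-deploy-images.yaml. A sketch of what a skip workflow can contain, not the exact file: the same job names as common-deploy.yaml, reduced to no-ops, so the required checks still report success:

# A sketch along the lines of common-deploy-skip.yaml
name: common-deploy-skip

on:
  workflow_call:

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - run: echo "no-op"
  test:
    runs-on: ubuntu-latest
    steps:
      - run: echo "no-op"
  deploy:
    runs-on: ubuntu-latest
    steps:
      - run: echo "no-op"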

Q: What about dependencies between components?
A: If you need to consider deployment order, you probably need a build system. Read up on Bazel, Pantsbuild and the like.

Q: Do you have additional material or examples?
A: There’s a section on additional reading here.

Q: Is this setup compatible with GitHub merge queue?
A: No.