Documentation
Deployment
Kestra

Orchestrating CloudQuery Syncs with Kestra

In this tutorial, we will show you how to run CloudQuery as a Kestra (opens in a new tab) flow, using the AWS source- and Postgresql destination plugins as an example. Kestra is an open source orchestration tool that allows you to schedule and monitor CloudQuery syncs.

Step 1: Install Kestra

Follow the Kestra Deployment with Docker guide (opens in a new tab) to run Kestra locally inside Docker containers.

When it's running, open http://localhost:8080 (opens in a new tab) in your browser.

Step 2: Set up a PostgreSQL database

We will use a PostgreSQL database as a destination for our CloudQuery syncs. You can use any PostgreSQL database, but for this tutorial we will use a Docker container.

docker run --name cloudquery-postgres -e POSTGRES_PASSWORD=pass -p 5432:5432 -d postgres

Step 2: Create a Kestra flow

Inside the Kestra UI, go to the Flows tab and click on Create. You can now create a new flow with the following content:

id: "cloudquery"
namespace: "io.kestra"
tasks:
- id: "bash"
  type: "io.kestra.core.tasks.scripts.Bash"
  runner: DOCKER
  inputFiles:
    config.yml: |
      kind: source
      spec:
        name: aws
        path: cloudquery/aws
        version: "v22.15.2"
        tables: ["aws_ec2*"]
        destinations: ["postgresql"]
        spec:
      ---
      kind: destination
      spec:
        name: "postgresql"
        version: "v6.1.2"
        path: "cloudquery/postgresql"
        write_mode: "overwrite-delete-stale"
        spec:
          connection_string: ${PG_CONNECTION_STRING}
  dockerOptions:
    image: ghcr.io/cloudquery/cloudquery:latest
    entryPoint: [""]
  warningOnStdErr: false
  env:
    PG_CONNECTION_STRING: "postgresql://postgres:pass@host.docker.internal:5432/cloudquery?sslmode=disable"
  commands:
  - '/app/cloudquery sync {{ workingDir }}/config.yml --log-console'

We are using the Docker runner with a Bash task to run the cloudquery sync command. The inputFiles section allows us to pass a configuration file to the task. It is also possible to read this configuration file from disk or a remote location, but we will keep it simple for now.

The example places the Postgres connection string in an environment variable. In production, you should use a secret manager like Vault (opens in a new tab) to load the connection string into environment variables.

The AWS config is just an example. It is configured to sync all EC2 tables and their relations.

Kestra Flow Editor Screenshot

In our example config we are using the Postgres host host.docker.internal to connect to the database. This is a special hostname that Docker resolves to the host machine. Make sure to replace this with the hostname of your database if you are not running Postgres via Docker.

Step 3: Run the flow

With the config entered, click Save, then click New Execution. Click OK on the confirmation.

If everything was set up correctly, you should now see the sync running in the Executions tab. You can click on the execution to see the logs for any errors.

Kestra Flow Execution Screenshot

If you get an error related to config.yml not being found, try making the following change to the Kestra Docker-compose file to give the volume write access:

- /tmp/kestra-wd:/tmp/kestra-wd:rw

Step 4: Schedule the flow

To run the flow periodically, we can add a trigger to run it on a schedule. Back in the Flow editor, add the following section:

triggers:
  - id: schedule
    type: io.kestra.core.models.triggers.types.Schedule
    cron: "0 6 * * *"

This cron expression will run the flow every day at 06:00. You can use crontab.guru (opens in a new tab) to generate cron expressions for the destination you need and replace the one in the example above. Kestra also supports these special values for cron:

@yearly
@annually
@monthly
@weekly
@daily
@midnight
@hourly

With this in place, remember to click Save again. Your CloudQuery sync will now be run on a regular schedule.

Next steps

This tutorial was just a quick introduction to help you get started with a CloudQuery deployment on Kestra. You can now create additional Kestra tasks to perform transformations, send notifications and more. For more information, check out the CloudQuery docs and the Kestra docs (opens in a new tab). To productionize your Kestra deployment, you will likely need to deploy it to a cloud container environment, such as Kubernetes. For more information, see the Kestra Deployment with Kubernetes guide (opens in a new tab).