Orchestrating CloudQuery Syncs with Kestra
In this tutorial, we will show you how to run CloudQuery as a Kestra flow, using the AWS source and PostgreSQL destination plugins as an example. Kestra is an open source orchestration tool that allows you to schedule and monitor CloudQuery syncs.
Step 1: Install Kestra
Follow the Kestra Deployment with Docker guide to run Kestra locally inside Docker containers.
When it's running, open http://localhost:8080 in your browser.
Step 2: Set up a PostgreSQL database
We will use a PostgreSQL database as a destination for our CloudQuery syncs. You can use any PostgreSQL database, but for this tutorial we will use a Docker container.
docker run --name cloudquery-postgres -e POSTGRES_PASSWORD=pass -e POSTGRES_DB=cloudquery -p 5432:5432 -d postgres
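If you would rather manage the database with Docker Compose, a roughly equivalent service definition might look like the sketch below. The file layout and service name are assumptions for illustration. POSTGRES_DB is set because the connection string used later in this tutorial points at a database named cloudquery.

```yaml
# Hypothetical docker-compose.yml for the destination database (a sketch,
# not part of the Kestra setup itself).
services:
  cloudquery-postgres:
    image: postgres
    container_name: cloudquery-postgres
    environment:
      POSTGRES_PASSWORD: pass
      # Create the database that the flow's connection string targets.
      POSTGRES_DB: cloudquery
    ports:
      # Publish Postgres on the host so other containers can reach it
      # through the host machine.
      - "5432:5432"
```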
Step 3: Create a Kestra flow
Inside the Kestra UI, go to the Flows tab and click Create. You can now create a new flow with the following content:
id: "cloudquery"
namespace: "io.kestra"
tasks:
  - id: "bash"
    type: "io.kestra.core.tasks.scripts.Bash"
    runner: DOCKER
    inputFiles:
      config.yml: |
        kind: source
        spec:
          name: aws
          path: cloudquery/aws
          version: "v22.15.2"
          tables: ["aws_ec2*"]
          destinations: ["postgresql"]
          spec:
        ---
        kind: destination
        spec:
          name: "postgresql"
          version: "v6.1.2"
          path: "cloudquery/postgresql"
          write_mode: "overwrite-delete-stale"
          spec:
            connection_string: ${PG_CONNECTION_STRING}
    dockerOptions:
      image: ghcr.io/cloudquery/cloudquery:latest
      entryPoint: [""]
    warningOnStdErr: false
    env:
      PG_CONNECTION_STRING: "postgresql://postgres:pass@host.docker.internal:5432/cloudquery?sslmode=disable"
    commands:
      - '/app/cloudquery sync {{ workingDir }}/config.yml --log-console'
We are using the Docker runner with a Bash task to run the cloudquery sync command. The inputFiles section allows us to pass a configuration file to the task. It is also possible to read this configuration file from disk or a remote location, but we will keep it simple for now.
The example places the Postgres connection string in an environment variable. In production, you should use a secret manager like Vault to load the connection string into environment variables.
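As a rough sketch of what that could look like, the fragment below replaces the hard-coded value with a templated lookup. It assumes a Kestra version that provides the secret() templating function and that a secret named PG_CONNECTION_STRING (a hypothetical name) has already been configured in whichever secrets backend your deployment uses. Check the Kestra documentation for your version before relying on it.

```yaml
tasks:
  - id: "bash"
    type: "io.kestra.core.tasks.scripts.Bash"
    # ... other task properties unchanged from the flow above ...
    env:
      # Assumption: secret() is available and a secret with this name exists.
      # The value is resolved at run time instead of being stored in the flow.
      PG_CONNECTION_STRING: "{{ secret('PG_CONNECTION_STRING') }}"
```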
The AWS config is just an example. It is configured to sync all EC2 tables and their relations.
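To adapt the example, you would typically narrow the table list and make AWS credentials available inside the container. The fragment below is a sketch under two assumptions: that the tables shown are the ones you want, and that the AWS plugin picks up credentials from the standard AWS SDK environment variables. The placeholder values are illustrative and must be replaced with your own.

```yaml
tasks:
  - id: "bash"
    type: "io.kestra.core.tasks.scripts.Bash"
    # ... other task properties unchanged from the flow above ...
    inputFiles:
      config.yml: |
        kind: source
        spec:
          name: aws
          path: cloudquery/aws
          version: "v22.15.2"
          # Sync two specific tables instead of every table matching aws_ec2*.
          tables: ["aws_ec2_instances", "aws_s3_buckets"]
          destinations: ["postgresql"]
        # ... destination section unchanged ...
    env:
      PG_CONNECTION_STRING: "postgresql://postgres:pass@host.docker.internal:5432/cloudquery?sslmode=disable"
      # Placeholders: standard AWS SDK credential variables (assumed to be
      # read by the plugin). Prefer a secret manager over literal values.
      AWS_ACCESS_KEY_ID: "<your-access-key-id>"
      AWS_SECRET_ACCESS_KEY: "<your-secret-access-key>"
```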
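In our example config we are using the Postgres host host.docker.internal to connect to the database. This is a special hostname that Docker Desktop resolves to the host machine, which is where the Postgres container publishes port 5432. On Linux without Docker Desktop, this hostname may not be defined by default. Make sure to replace it with the hostname of your database if Postgres is not running on the same host via Docker.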
Step 4: Run the flow
With the config entered, click Save, then click New Execution. Click OK on the confirmation.
If everything was set up correctly, you should now see the sync running in the Executions tab. You can click on the execution to check the logs for any errors.
If you get an error related to config.yml not being found, try making the following change to the volumes section of the Kestra Docker Compose file to give the volume write access:
- /tmp/kestra-wd:/tmp/kestra-wd:rw
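For context, the sketch below shows roughly where that line sits in the Compose file. The service name kestra and the surrounding layout are assumptions based on a typical Kestra Compose setup, so match them against your own file rather than copying this fragment verbatim.

```yaml
services:
  kestra:
    # ... image, command, and other settings unchanged ...
    volumes:
      # ... keep any existing volume entries ...
      # Working directory shared with task containers, mounted read-write
      # so input files such as config.yml can be written to it.
      - /tmp/kestra-wd:/tmp/kestra-wd:rw
```

After changing the Compose file, recreate the Kestra container so the new mount options take effect.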
Step 5: Schedule the flow
To run the flow periodically, we can add a trigger to run it on a schedule. Back in the Flow editor, add the following section:
triggers:
  - id: schedule
    type: io.kestra.core.models.triggers.types.Schedule
    cron: "0 6 * * *"
This cron expression runs the flow every day at 06:00. You can use crontab.guru to build a cron expression for the schedule you need and replace the one in the example above. Kestra also supports these special values for cron:
@yearly
@annually
@monthly
@weekly
@daily
@midnight
@hourly
With this in place, remember to click Save again. Your CloudQuery sync will now run on a regular schedule.
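As an illustration of the special values listed above, the same trigger could be written with @daily instead of a five-field expression. This is a sketch, not a recommendation. Note that @daily conventionally means once a day at midnight, so it is not equivalent to the 06:00 schedule in the original example.

```yaml
triggers:
  - id: schedule
    type: io.kestra.core.models.triggers.types.Schedule
    # Shorthand for "0 0 * * *": once a day at 00:00.
    cron: "@daily"
```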
Next steps
This tutorial was just a quick introduction to help you get started with a CloudQuery deployment on Kestra. You can now create additional Kestra tasks to perform transformations, send notifications and more. For more information, check out the CloudQuery docs and the Kestra docs. To productionize your Kestra deployment, you will likely need to deploy it to a cloud container environment, such as Kubernetes. For more information, see the Kestra Deployment with Kubernetes guide.