Announcing Closed Beta of CloudQuery AWS Plugin with Event-based Sync

September 6, 2023

Michal Brutvan (Twitter: pilvikala)

We are excited to announce the closed beta of event-based sync for our AWS Plugin!

What is it?

CloudQuery was initially designed to sync all data on demand, giving the user full control over what to sync, when, and how often. This approach makes CloudQuery easy to set up and use.

However, a scheduled sync, even an hourly one, is sometimes not enough to get an accurate picture of what is happening in your environment. Cloud resources are ephemeral: they can come and go within minutes, which makes them hard to track and makes it hard to get your costs right. An IP address gets hit by bots the moment you make it public. A user account created with broad permissions can be misused within moments.

This is where our new event-based sync comes to the rescue.

How it works

AWS CloudTrail records the API activity in your AWS accounts. You can configure a trail to send management events to CloudWatch Logs and forward them from there to a Kinesis Data Stream with a CloudWatch Logs subscription filter. By subscribing to this stream of CloudTrail events, CloudQuery can then trigger selective syncs that update just the resource whose configuration changed.
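To make the pipeline more concrete, here is a minimal Python sketch of what a consumer sees on the Kinesis side. It assumes the documented CloudWatch Logs subscription format, in which each Kinesis record is gzip-compressed JSON whose `logEvents` entries each carry one CloudTrail event serialized as a JSON string. The names `ChangeEvent` and `decode_kinesis_record` are hypothetical, reading records from the stream is left out, and this is not CloudQuery's actual implementation.

```python
import gzip
import json
from typing import NamedTuple


class ChangeEvent(NamedTuple):
    """Hypothetical container for the CloudTrail fields needed to pick a resource to re-sync."""
    source: str      # CloudTrail eventSource, e.g. "ec2.amazonaws.com"
    name: str        # CloudTrail eventName, e.g. "CreateVpc"
    region: str      # CloudTrail awsRegion
    account_id: str  # CloudTrail recipientAccountId


def decode_kinesis_record(data: bytes) -> list[ChangeEvent]:
    """Decode one Kinesis record written by a CloudWatch Logs subscription filter.

    The record payload is gzip-compressed JSON; every entry in "logEvents"
    holds one CloudTrail event as a JSON string in its "message" field.
    """
    payload = json.loads(gzip.decompress(data))
    # CONTROL_MESSAGE records only check that the destination is reachable.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    events = []
    for log_event in payload.get("logEvents", []):
        trail_event = json.loads(log_event["message"])
        events.append(ChangeEvent(
            source=trail_event.get("eventSource", ""),
            name=trail_event.get("eventName", ""),
            region=trail_event.get("awsRegion", ""),
            account_id=trail_event.get("recipientAccountId", ""),
        ))
    return events
```

A consumer would call this on the data of each record it reads and use the event source and name to decide which resource to refresh.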

Figure: Configuring CloudQuery AWS Plugin for event-based sync

With this setup, you get fresh data within a few seconds of it becoming available in CloudTrail.

Here are the services and events supported at the moment:

Service              Event
ec2.amazonaws.com    AssociateRouteTable
ec2.amazonaws.com    AttachInternetGateway
ec2.amazonaws.com    AuthorizeSecurityGroupEgress
ec2.amazonaws.com    AuthorizeSecurityGroupIngress
ec2.amazonaws.com    CreateImage
ec2.amazonaws.com    CreateInternetGateway
ec2.amazonaws.com    CreateNetworkInterface
ec2.amazonaws.com    CreateSecurityGroup
ec2.amazonaws.com    CreateSubnet
ec2.amazonaws.com    CreateTags
ec2.amazonaws.com    CreateVpc
ec2.amazonaws.com    DeleteTags
ec2.amazonaws.com    DetachInternetGateway
ec2.amazonaws.com    ModifySubnetAttribute
ec2.amazonaws.com    RevokeSecurityGroupEgress
ec2.amazonaws.com    RevokeSecurityGroupIngress
ec2.amazonaws.com    RunInstances
iam.amazonaws.com    CreateGroup
iam.amazonaws.com    CreateRole
iam.amazonaws.com    CreateUser
rds.amazonaws.com    CreateDBCluster
rds.amazonaws.com    CreateDBInstance
rds.amazonaws.com    ModifyDBCluster
rds.amazonaws.com    ModifyDBInstance
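An event consumer only needs to act on the pairs listed above. The Python sketch below encodes the table as a set of (eventSource, eventName) pairs and drops everything else, such as an `s3.amazonaws.com` event. `SUPPORTED_EVENTS` and `group_supported` are hypothetical names used for illustration; the plugin's own event handling may differ.

```python
from collections import defaultdict
from typing import Iterable

# Event names copied from the table above (hypothetical constants, for illustration).
_EC2 = (
    "AssociateRouteTable", "AttachInternetGateway",
    "AuthorizeSecurityGroupEgress", "AuthorizeSecurityGroupIngress",
    "CreateImage", "CreateInternetGateway", "CreateNetworkInterface",
    "CreateSecurityGroup", "CreateSubnet", "CreateTags", "CreateVpc",
    "DeleteTags", "DetachInternetGateway", "ModifySubnetAttribute",
    "RevokeSecurityGroupEgress", "RevokeSecurityGroupIngress", "RunInstances",
)
_IAM = ("CreateGroup", "CreateRole", "CreateUser")
_RDS = ("CreateDBCluster", "CreateDBInstance", "ModifyDBCluster", "ModifyDBInstance")

SUPPORTED_EVENTS = frozenset(
    [("ec2.amazonaws.com", n) for n in _EC2]
    + [("iam.amazonaws.com", n) for n in _IAM]
    + [("rds.amazonaws.com", n) for n in _RDS]
)


def group_supported(events: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Group supported (eventSource, eventName) pairs by service, dropping the rest."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for source, name in events:
        if (source, name) in SUPPORTED_EVENTS:
            grouped[source].append(name)
    return dict(grouped)
```

Each event is kept individually rather than deduplicated, because two events with the same name usually refer to different resources.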

Getting Started

This feature is available in closed beta; sign up for early access.

  1. Configure an AWS CloudTrail trail to send management events to a Kinesis Data Stream via CloudWatch Logs. The most straightforward way to do this is to use the CloudFormation template provided by CloudQuery.
aws cloudformation deploy --template-file ./streaming-deployment.yml --stack-name <STACK-NAME> --capabilities CAPABILITY_IAM --disable-rollback --region <DESIRED-REGION>
  2. Copy the ARN of the Kinesis stream. If you used the CloudFormation template, you can find it in the stack outputs with the following command:
aws cloudformation describe-stacks --stack-name <STACK-NAME> --query "Stacks[].Outputs" --region <DESIRED-REGION>
  3. Define a config.yml file like the one below:
kind: source
spec:
  name: "aws-event-based"
  registry: "local"
  path: <PATH/TO/BINARY>
  tables:
    - aws_ec2_instances
    - aws_ec2_internet_gateways
    - aws_ec2_security_groups
    - aws_ec2_subnets
    - aws_ec2_vpcs
    - aws_ecs_cluster_tasks
    - aws_iam_groups
    - aws_iam_roles
    - aws_iam_users
    - aws_rds_instances
  destinations: ["postgresql"]
  skip_tables:
    - aws_iam_group_last_accessed_details
    - aws_iam_role_last_accessed_details
    - aws_iam_user_last_accessed_details
  spec:
    event_based_sync:
      - account:
          local_profile: "<ROLE-NAME>"
        kinesis_stream_arn: "<OUTPUT-FROM-CLOUDFORMATION-STACK>"
  4. Sync the data!
cloudquery sync config.yml

This starts a long-lived process that runs until it hits an error or you stop it.
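Steps 2 and 3 can be glued together with a small script. The Python sketch below assumes the `describe-stacks` output is requested as JSON (`--output json`). Because the output key name used by the CloudQuery template is not given here, it picks the first output value shaped like a Kinesis stream ARN instead of relying on a key name. `find_kinesis_arn` and `fill_stream_arn` are hypothetical helpers, not part of the CloudQuery or AWS CLIs.

```python
import json
import re

# Kinesis stream ARN shape: arn:<partition>:kinesis:<region>:<account-id>:stream/<name>
KINESIS_STREAM_ARN = re.compile(
    r"^arn:aws[a-z-]*:kinesis:[a-z0-9-]+:\d{12}:stream/[A-Za-z0-9_.-]+$"
)
# Placeholder used in the config.yml example above.
PLACEHOLDER = "<OUTPUT-FROM-CLOUDFORMATION-STACK>"


def find_kinesis_arn(describe_stacks_json: str) -> str:
    """Return the first stack output value that looks like a Kinesis stream ARN.

    Expects the JSON printed by
    `aws cloudformation describe-stacks --query "Stacks[].Outputs" --output json`,
    i.e. a list (one entry per stack) of lists of output objects.
    """
    for outputs in json.loads(describe_stacks_json):
        for output in outputs or []:
            value = output.get("OutputValue", "")
            if KINESIS_STREAM_ARN.match(value):
                return value
    raise ValueError("no Kinesis stream ARN found in the stack outputs")


def fill_stream_arn(config_text: str, stream_arn: str) -> str:
    """Substitute the stream ARN for the placeholder in the config text."""
    if PLACEHOLDER not in config_text:
        raise ValueError(f"{PLACEHOLDER} not found in config")
    return config_text.replace(PLACEHOLDER, stream_arn)
```

For example, you could pass the JSON printed by the step 2 command to `find_kinesis_arn`, then write the result of `fill_stream_arn` back to config.yml before running the sync.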

Deploying in production

CloudQuery needs to run in listening mode as a long-running service. In this mode, it does not support the overwrite-delete-stale write mode, so to delete stale data you need to set up a recurring task that runs full table syncs. You may also need another task in which CloudQuery runs a regular sync for tables that event-based sync does not support yet. See the AWS Plugin documentation for the list of supported tables.
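Deciding which tables go to which task is simple bookkeeping. The Python sketch below assumes you already have the set of event-supported tables from the AWS Plugin documentation; `split_tables` is a hypothetical helper, and the table names used with it are only examples.

```python
from typing import Iterable


def split_tables(
    wanted: Iterable[str],
    event_supported: set[str],
    skip: Iterable[str] = (),
) -> tuple[list[str], list[str]]:
    """Split tables between the event-based listener and a scheduled regular sync.

    Returns (event_based, scheduled), preserving the input order.
    Tables listed in `skip` are left out of both lists.
    """
    skipped = set(skip)
    event_based: list[str] = []
    scheduled: list[str] = []
    for table in wanted:
        if table in skipped:
            continue
        if table in event_supported:
            event_based.append(table)
        else:
            scheduled.append(table)
    return event_based, scheduled
```

The first list would go into the `tables` of the event-based config and the second into the config of the scheduled regular sync. The recurring full sync that removes stale data still needs to cover the event-based tables as well.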

Note that these are the limitations of the current beta version of the event-based sync for our AWS plugin. We plan to make configuration and management easier in the future based on user feedback.
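Since the event-based process only stops on an error or when you stop it, a production setup usually runs it under something that restarts it, such as systemd, a Kubernetes Deployment, or an ECS service. For illustration only, here is a minimal Python sketch of that restart-with-backoff idea. `supervise` is a hypothetical helper, not a CloudQuery feature, and treating exit code 0 as a deliberate stop is an assumption.

```python
import subprocess
import sys
import time
from typing import Sequence


def supervise(cmd: Sequence[str], max_restarts: int = 5, base_delay: float = 1.0) -> int:
    """Run a long-lived command, restarting it with exponential backoff when it fails.

    Hypothetical helper; in production prefer systemd, Kubernetes, or ECS.
    A clean exit (code 0) is treated as an intentional stop and is not restarted.
    Returns the exit code of the last run.
    """
    restarts = 0
    while True:
        code = subprocess.run(list(cmd)).returncode
        if code == 0 or restarts >= max_restarts:
            return code
        delay = base_delay * (2 ** restarts)  # 1s, 2s, 4s, ... with the defaults
        print(f"process exited with {code}; restarting in {delay:.1f}s", file=sys.stderr)
        time.sleep(delay)
        restarts += 1
```

For example, `supervise(["cloudquery", "sync", "config.yml"])` keeps the listener running and gives up only after it has been restarted five times and still fails.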

Availability

This feature is currently available in early access to everyone who signs up. Once it is publicly released, it will be a premium (paid) feature; the pricing model is yet to be defined.

Future work

At the moment, only one Kinesis stream is supported by a running instance of CloudQuery. We will consider adding support for multiple streams based on the feedback we receive.

The tables currently covered were chosen to span a selection of different services. We will add more resources based on your feedback.

Read the docs