Tuesday, May 4, 2021

Using AWS Config, Lambda and Splunk to build detective controls for AWS.

The AWS Cloud Adoption Framework, the key AWS document that helps clients succeed in cloud adoption, defines detective controls in its Security Perspective section as follows:

  • Detective Control provides guidance to help identify potential security incidents within your AWS environment.

The AWS Well-Architected Framework adds to that:

  • You can use detective controls to identify a potential security threat or incident. They are an essential part of governance frameworks and can be used to support a quality process, a legal or compliance obligation, and for threat identification and response efforts.
  • In AWS, you can implement detective controls by processing logs, events, and monitoring that allows for auditing, automated analysis, and alarming.

Below I describe an implementation of AWS detective controls using native AWS and third-party services:
  • AWS Config and Config Rules (Managed and Custom)
  • Lambda
  • CloudWatch Events, a.k.a. EventBridge
  • Kinesis Data Firehose
  • Splunk Cloud

Our goal: build a set of automated detective controls for a multi-account, distributed AWS environment, along with automatic remediation, compliance dashboards, a single pane of glass for security events, and notifications.

Let's start by collecting our requirements:
  • All components of the solution must be represented as code
  • Serverless Application Model
  • Maximum usage of native AWS services
  • Third-party components should be pluggable
  • Fully distributed architecture with no critical central components
  • Event-driven architecture
  • Near-real-time event processing and ingestion
  • Resource whitelisting support (via ARN and/or resource tag)
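
To illustrate the whitelisting requirement, here is a minimal Python sketch of matching a resource by ARN and/or tag. The entry layout, the `WhitelistId` tag key, and the function name are assumptions made for this example only; an in-memory list stands in for wherever the whitelist is actually stored.

```python
from typing import Optional

# Hypothetical tag key that marks a resource as whitelisted (an assumption).
WHITELIST_TAG_KEY = "WhitelistId"


def find_whitelist_entry(arn: str, tags: dict, entries: list) -> Optional[dict]:
    """Return the first whitelist entry matching the resource, or None.

    An entry matches when its "arn" equals the resource ARN, or when its
    "tag_value" equals the value of the resource's whitelist tag.
    """
    tag_value = tags.get(WHITELIST_TAG_KEY)
    for entry in entries:
        if entry.get("arn") == arn:
            return entry
        if tag_value is not None and entry.get("tag_value") == tag_value:
            return entry
    return None


# Illustrative whitelist entries; all values are made up for the example.
entries = [
    {"arn": "arn:aws:s3:::public-website-assets", "reason": "static site",
     "approved_by": "security-team", "approved_on": "2021-04-01"},
    {"tag_value": "WL-0042", "reason": "legacy workload",
     "approved_by": "security-team", "approved_on": "2021-03-15"},
]

by_arn = find_whitelist_entry("arn:aws:s3:::public-website-assets", {}, entries)
by_tag = find_whitelist_entry("arn:aws:s3:::other-bucket",
                              {"WhitelistId": "WL-0042"}, entries)
no_match = find_whitelist_entry("arn:aws:s3:::other-bucket", {}, entries)
```

Returning the whole entry rather than a boolean keeps the reason, approver, and approval date available for later enrichment of the event.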

Major components:
  • AWS Config Service
  • Managed (by AWS) config rules 
  • Custom (Lambdas, created by customer) config rules
  • EventBridge event rule
  • Processing Lambda
  • Firehose (delivery2Splunk)
  • S3 Buckets
  • IAM Roles
  • DynamoDB tables
  • Splunk with HEC configured
  • AWS Security Hub Service 
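
The EventBridge event rule in the list above needs a pattern that selects AWS Config compliance changes. A minimal Python sketch follows. The `source`, `detail-type`, and `messageType` values are the ones AWS Config emits for rule compliance changes; the `matches` helper is a deliberately simplified illustration of how such a pattern filters events, not EventBridge's actual matching engine, and the sample events are made up.

```python
import json

# Pattern selecting Config rule compliance change notifications.
EVENT_PATTERN = {
    "source": ["aws.config"],
    "detail-type": ["Config Rules Compliance Change"],
    "detail": {"messageType": ["ComplianceChangeNotification"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must be present in the event.

    A list means "the value is one of these"; a dict is matched recursively.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Trimmed-down sample events (illustrative values).
config_event = {
    "source": "aws.config",
    "detail-type": "Config Rules Compliance Change",
    "detail": {"messageType": "ComplianceChangeNotification",
               "configRuleName": "s3-bucket-public-read-prohibited"},
}
other_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"state": "running"},
}

pattern_json = json.dumps(EVENT_PATTERN)
is_config_match = matches(EVENT_PATTERN, config_event)
is_other_match = matches(EVENT_PATTERN, other_event)
```

The JSON string is what would be supplied as the rule's event pattern, for example through the `EventPattern` parameter of the EventBridge `PutRule` API or the equivalent property in an infrastructure-as-code template.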

Event flow and processing
  1. AWS Config rule evaluation starts because:
    • an AWS resource in the scope of the config rule has been created, modified, or deleted
    • a new config rule has been deployed
    • a schedule-driven rule evaluation has started
    • an on-demand evaluation of the rules has been triggered via the web UI, API, or CLI
  2. Rule evaluation completes, and the resource compliance status changes to COMPLIANT | NON_COMPLIANT | NOT_APPLICABLE.
  3. A ComplianceChangeNotification event is generated. This happens whenever the compliance type of a resource that AWS Config evaluates changes.
  4. The EventBridge event rule invokes the processing Lambda.
  5. The processing Lambda will:
    • Extract all required fields from the EventBridge event.
    • Create a data structure suitable for Splunk ingestion via the Splunk HEC.
    • Add custom Splunk fields, which can be defined at index time, with AWS account metadata: AWS account ID, account name, AWS organization, environment, customer, etc.
    • Enrich this data structure with information from the central config rule metadata DynamoDB table: rule severity, description, associated compliance framework name and section, etc.
    • To fetch information from DynamoDB, assume a role in the "Security/Log Archive" account that grants only "Read" access to the required DynamoDB tables.
    • Call the AWS Config service to retrieve additional information about the AWS resource whose compliance status has changed, such as resource name, ARN, all available tags, etc.
    • Enrich the existing event data structure with the information obtained from AWS Config.
    • Fetch the resource whitelisting status (whitelisted or not, the reason for whitelisting, whitelisted by whom and when) from the central DynamoDB table of whitelisted resources, using a dedicated resource tag and/or the resource ARN.
    • Enrich the existing event data structure with the whitelisting information.
    • Send the event, enriched in the previous steps, to the Kinesis Data Firehose configured with Splunk HEC as the destination and a central S3 bucket as backup storage.
    • Build a new data structure that conforms to the AWS Security Finding Format (ASFF).
    • Adjust the event severity based on the whitelisting status.
    • Fetch additional security context from different AWS services that might affect the security event severity, and incorporate it into the ASFF data structure.
    • Send the ASFF event to Security Hub.
  6. Kinesis Data Firehose delivers the event to the Splunk HTTP Event Collector (HEC) endpoint for indexing.
  7. Security Hub in the administrator (master) account serves as a single pane of glass for the Global Security team, along with Splunk.
  8. Security Hub in the member account can help the customer (the account owner) monitor and address security findings in their account.
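
The first part of step 5 (extracting fields and shaping them for the Splunk HEC) can be sketched in Python as below. This is a minimal sketch under stated assumptions: the sample event is trimmed and its values are made up, the `sourcetype` and `index` names are hypothetical, the account and rule metadata are passed in as plain dicts rather than looked up, and the timestamp parser assumes the millisecond `...Z` format seen in Config sample events.

```python
import json
from datetime import datetime, timezone


def extract_fields(event: dict) -> dict:
    """Pull the fields of interest out of a Config compliance change event."""
    detail = event["detail"]
    new_result = detail["newEvaluationResult"]
    old_result = detail.get("oldEvaluationResult") or {}
    return {
        "account_id": detail["awsAccountId"],
        "region": detail["awsRegion"],
        "rule_name": detail["configRuleName"],
        "resource_type": detail["resourceType"],
        "resource_id": detail["resourceId"],
        "compliance": new_result["complianceType"],
        "previous_compliance": old_result.get("complianceType"),
        "recorded_at": new_result["resultRecordedTime"],
    }


def to_epoch(iso_timestamp: str) -> float:
    """Convert a UTC timestamp like 2021-05-04T12:00:00.000Z to epoch seconds."""
    parsed = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def build_hec_event(fields: dict, account_meta: dict, rule_meta: dict,
                    index: str = "aws_config") -> dict:
    """Wrap the extracted fields in the Splunk HEC event envelope."""
    return {
        "time": to_epoch(fields["recorded_at"]),
        "source": "aws.config",
        "sourcetype": "aws:config:compliance",  # hypothetical sourcetype name
        "index": index,                          # hypothetical index name
        "event": {**fields, "rule": rule_meta},
        "fields": account_meta,  # flat key/value pairs indexed at index time
    }


# Trimmed sample event with illustrative values.
sample_event = {
    "source": "aws.config",
    "detail-type": "Config Rules Compliance Change",
    "detail": {
        "awsAccountId": "123456789012",
        "awsRegion": "us-east-1",
        "configRuleName": "s3-bucket-public-read-prohibited",
        "resourceType": "AWS::S3::Bucket",
        "resourceId": "example-bucket",
        "messageType": "ComplianceChangeNotification",
        "newEvaluationResult": {
            "complianceType": "NON_COMPLIANT",
            "resultRecordedTime": "2021-05-04T12:00:00.000Z",
        },
        "oldEvaluationResult": {"complianceType": "COMPLIANT"},
    },
}

fields = extract_fields(sample_event)
hec_event = build_hec_event(
    fields,
    account_meta={"account_name": "dev-sandbox", "environment": "dev"},
    rule_meta={"severity": "HIGH", "framework": "CIS"},
)
firehose_data = json.dumps(hec_event).encode("utf-8")
```

The serialized bytes are what the Lambda would hand to Firehose, for example as `Record={"Data": firehose_data}` in a boto3 `put_record` call on the Firehose client.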

Key features of the solution:
  • event-driven via EventBridge
  • uses a compliance status change as the trigger to process an event
  • distributed processing Lambda (per AWS account and region)
  • processing leverages the AWS Config service to extract additional information about the resource itself (ARN, all tags, resource name, etc.)
  • verifies the whitelisting status of the resource (by ARN or dedicated tag) using read-only access to a secure centralized database
  • obtains additional details about the config rule that triggered the resource compliance status change from the centralized DynamoDB table: AWS config rule enabled/disabled status, severity, event routing information (2Splunk, 2SecurityHub, 2PagerDuty), rule details, etc.
  • enriches the event data with information obtained from the whitelisting and config rule metadata tables
  • automatically adjusts event severity based on the whitelisting status
  • sends the enriched event to Splunk via Firehose
  • sends the enriched event to Security Hub using the AWS Security Finding Format (ASFF)
  • the architecture can be extended to accommodate an auto-remediation flow
  • can serve as an integration point for any third-party logging, alerting, or ticketing tool
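
To make the severity adjustment and the ASFF output more concrete, here is a minimal Python sketch. The adjustment policy (downgrade to INFORMATIONAL when the resource is whitelisted or no longer non-compliant), the finding `Id` layout, the generic `Other` resource type, and all sample values are assumptions for illustration; only the ASFF field names, the schema version, and the allowed severity and compliance labels come from the format itself.

```python
# Mapping from AWS Config compliance types to ASFF compliance statuses.
COMPLIANCE_TO_ASFF = {
    "COMPLIANT": "PASSED",
    "NON_COMPLIANT": "FAILED",
    "NOT_APPLICABLE": "NOT_AVAILABLE",
}


def adjust_severity(rule_severity: str, compliance: str, whitelisted: bool) -> str:
    """Assumed policy: only non-whitelisted, non-compliant resources keep
    the severity defined for the rule; everything else is informational."""
    if compliance != "NON_COMPLIANT" or whitelisted:
        return "INFORMATIONAL"
    return rule_severity


def build_asff_finding(fields: dict, rule_meta: dict, whitelisted: bool,
                       resource_arn: str) -> dict:
    """Build a finding dict with the attributes ASFF requires."""
    account_id = fields["account_id"]
    region = fields["region"]
    return {
        "SchemaVersion": "2018-10-08",
        # Hypothetical Id layout: stable per rule and resource.
        "Id": f"{region}/{account_id}/{fields['rule_name']}/{fields['resource_id']}",
        # Default product ARN used for custom findings in the same account.
        "ProductArn": (f"arn:aws:securityhub:{region}:{account_id}:"
                       f"product/{account_id}/default"),
        "GeneratorId": fields["rule_name"],
        "AwsAccountId": account_id,
        "Types": ["Software and Configuration Checks"],
        "CreatedAt": fields["recorded_at"],
        "UpdatedAt": fields["recorded_at"],
        "Severity": {"Label": adjust_severity(
            rule_meta["severity"], fields["compliance"], whitelisted)},
        "Title": f"{fields['rule_name']}: {fields['compliance']}",
        "Description": rule_meta["description"],
        "Resources": [{"Type": "Other", "Id": resource_arn, "Region": region}],
        "Compliance": {"Status": COMPLIANCE_TO_ASFF[fields["compliance"]]},
    }


# Illustrative inputs, standing in for the enriched event data.
fields = {
    "account_id": "123456789012",
    "region": "us-east-1",
    "rule_name": "s3-bucket-public-read-prohibited",
    "resource_id": "example-bucket",
    "compliance": "NON_COMPLIANT",
    "recorded_at": "2021-05-04T12:00:00.000Z",
}
rule_meta = {"severity": "HIGH", "description": "S3 buckets must not allow public read."}
arn = "arn:aws:s3:::example-bucket"

finding = build_asff_finding(fields, rule_meta, whitelisted=False, resource_arn=arn)
whitelisted_finding = build_asff_finding(fields, rule_meta, whitelisted=True,
                                         resource_arn=arn)
```

A list of such findings is what the `Findings` parameter of the Security Hub `BatchImportFindings` API expects.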