An important part of security that we (infosec guys) often delegate :-) to the operations teams (NOC) is availability.
The IaaS provider (Amazon AWS) is responsible for infrastructure availability, but we must design all the layers above it (Availability Zones, VPCs, networks, instances and load balancers) for high availability, or at least fault tolerance. One of the most important steps in this process is actually detecting IaaS failures.
From AWS:
"With instance status monitoring, you can quickly determine whether Amazon EC2 has detected any problems that might prevent your instances from running applications. Amazon EC2 performs automated checks on every running EC2 instance to identify hardware and software issues. You can view the results of these status checks to identify specific and detectable problems."
Below is a simple Python script that will help you configure status check alarms for all your running instances:
#!/usr/bin/env python3
import boto3
import pprint

boto3.setup_default_session(profile_name='staging', region_name='eu-west-1')
ec2 = boto3.resource('ec2')
cloudwatch = boto3.resource('cloudwatch')

# Walk over all instances; alarms are created only for the running ones
instance_iterator = ec2.instances.all()
for instance in instance_iterator:
    instance_name = "unnamed"
    # instance.tags is None when the instance has no tags at all
    for tag in instance.tags or []:
        if tag['Key'] == "Name":
            instance_name = tag['Value']
    print(instance_name, instance.id)
    if instance.state["Name"] == "running":
        metric = cloudwatch.Metric("AWS/EC2", "StatusCheckFailed")
        # Alarm when the status check fails for 2 consecutive 60-second periods
        response = metric.put_alarm(
            AlarmName=instance.id + "/" + instance_name + "-status-alarm",
            AlarmDescription='status check for %s %s' % (instance.id, instance_name),
            ActionsEnabled=True,
            # Replace the placeholders with your account ID and SNS topic name
            OKActions=["arn:aws:sns:eu-west-1:your_account_id:YOUR_SNS-EmailSMS-Notification"],
            AlarmActions=["arn:aws:sns:eu-west-1:your_account_id:YOUR_SNS-EmailSMS-Notification"],
            Statistic="Maximum",
            Dimensions=[{'Name': 'InstanceId', 'Value': instance.id}],
            Period=60,
            EvaluationPeriods=2,
            Threshold=1.0,
            ComparisonOperator="GreaterThanOrEqualToThreshold"
        )
        pprint.pprint(response)
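
If you want to see what these alarms are built on, the status checks quoted from AWS above can also be read directly with the EC2 `describe_instance_status` call. The following is only a minimal sketch, not part of the script above: the function names `impaired_instances` and `fetch_status` are made up for illustration, the region is assumed to be the same `eu-west-1`, and pagination of the response is ignored for brevity.

```python
def impaired_instances(response):
    """Return the IDs of instances whose system or instance check is 'impaired'.

    `response` is expected to look like the dict returned by the EC2
    describe_instance_status API call.
    """
    impaired = []
    for status in response.get("InstanceStatuses", []):
        system = status.get("SystemStatus", {}).get("Status")
        instance = status.get("InstanceStatus", {}).get("Status")
        if "impaired" in (system, instance):
            impaired.append(status["InstanceId"])
    return impaired


def fetch_status(region_name="eu-west-1"):
    """Hypothetical helper: fetch the first page of status checks from AWS."""
    import boto3  # imported here so the parsing function works without boto3

    client = boto3.client("ec2", region_name=region_name)
    # By default only running instances are reported
    return client.describe_instance_status()


if __name__ == "__main__":
    print(impaired_instances(fetch_status()))
```

Keeping the parsing separate from the AWS call means you can try it on a saved response without any credentials.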