Do Math with AWS CloudWatch

We all know AWS CloudWatch: a very good monitoring service that we use to check the metrics (or logs) of AWS resources. Personally, I had always used CloudWatch in its simplest way: choosing a namespace, a dimension, and then the desired metric (an AWS one or a custom one), and playing with the time frame.

Sometimes a single metric is not enough and you need a more complex operation to monitor a situation, especially when you want to create an alert. CloudWatch can do math on metrics; the AWS documentation on metric math has more information, but I will explain a use case I faced recently. It’s related to the long-awaited RabbitMQ broker of Amazon MQ.

Not long ago, AWS announced the availability of RabbitMQ as a broker engine in the Amazon MQ service. RabbitMQ is very popular in distributed systems, so a managed service from AWS will help DevOps teams a lot 🙂

Those who know RabbitMQ will find this familiar: in our use case, we wanted to receive an alert when the rate of acknowledging messages is considerably lower than the rate of publishing messages to queues. We came up with the following CloudFormation code to implement this alarm:

  AckRateAlarm:
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmDescription: Rate of Ack is considerably less than rate of publishing
      AlarmName: RabbitMQAckRateAlarm
      Metrics:
        # Emits 1 when PublishRate (ma1) exceeds AckRate (ma2) by more than 1000, else 0.
        - Id: a1
          Expression: "IF(ma1 > ma2 + 1000, 1, 0)"
          Label: "Ack rate vs Publish rate"
        - Id: ma1
          MetricStat:
            Metric: 
              MetricName: PublishRate
              Namespace: AWS/AmazonMQ
              Dimensions:
                - Name: Broker
                  Value: Foo
            Period: 300
            Stat: Average
          ReturnData: false
        - Id: ma2
          MetricStat:
            Metric: 
              MetricName: AckRate
              Namespace: AWS/AmazonMQ
              Dimensions:
                - Name: Broker
                  Value: Foo
            Period: 300
            Stat: Average
          ReturnData: false
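      # The expression is 1 while breaching, so alarm when it stays above 0
      # for 2 consecutive 5-minute periods.
      EvaluationPeriods: 2
      Threshold: 0
      ComparisonOperator: GreaterThanThreshold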
      AlarmActions:
        - SNS_ARN
      OKActions:
        - SNS_ARN
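
To make the alarm's logic easier to follow, here is a minimal Python sketch of what the expression and the evaluation settings do with a few sample datapoints. The function names and the sample numbers are made up for illustration, and this is a simplification: it assumes every period has a datapoint and ignores how CloudWatch treats missing data.

```python
def breach_series(publish_rates, ack_rates, margin=1000):
    """Mirror IF(ma1 > ma2 + margin, 1, 0) for each period."""
    return [
        1 if publish > ack + margin else 0
        for publish, ack in zip(publish_rates, ack_rates)
    ]


def alarm_fires(series, evaluation_periods=2, threshold=0):
    """True when the most recent `evaluation_periods` datapoints all exceed the threshold."""
    if len(series) < evaluation_periods:
        return False
    return all(value > threshold for value in series[-evaluation_periods:])


# Hypothetical 5-minute averages, oldest first.
publish = [5000, 5200, 6000]
ack = [4800, 4000, 4500]

series = breach_series(publish, ack)
print(series)               # [0, 1, 1]
print(alarm_fires(series))  # True
```

Returning 0 or 1 from the expression keeps the alarm definition simple: the threshold stays at 0 and the whole comparison lives in one place.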

This use case sounds simple, but I think it shows the benefit of metric math in CloudWatch well.

AWS ECS CloudFormation Timeout

This is more of an informational post that may help others feel less miserable when they end up in the same situation I was in! The scenario is this:

You are updating an ECS cluster via AWS CloudFormation, but for whatever reason the cluster doesn’t stabilize. You see the stack in the UPDATE_IN_PROGRESS state, and no new messages appear on the CloudFormation Events page. If you can’t troubleshoot the issue with ECS and take no action, it will take 3 hours before CloudFormation times out and displays a message! At this point, as you can guess, CloudFormation will roll back. The situation can be even worse if the rollback cannot complete successfully (in our case, a lack of resources prevented both the update and the rollback). Again, CloudFormation will be stuck, this time in the UPDATE_ROLLBACK_IN_PROGRESS state, and will time out after another 3 hours! In a conversation I had with AWS support, they said this timeout is hard-coded and can’t be changed at the moment!

So, in such a situation: Keep Calm And Troubleshoot!
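
One place to start troubleshooting while CloudFormation stays silent is the ECS service's own event log, which usually says why tasks are not being placed. Below is a rough Python sketch, not a definitive tool: `recent_service_events` is a hypothetical helper, and it assumes you pass in a boto3 ECS client and already know the cluster and service names.

```python
def recent_service_events(ecs_client, cluster, service, limit=5):
    """Return the newest ECS service event messages, newest first."""
    response = ecs_client.describe_services(cluster=cluster, services=[service])
    services = response.get("services", [])
    if not services:
        return []
    # Sort explicitly by timestamp instead of relying on the response order.
    events = sorted(
        services[0].get("events", []),
        key=lambda event: event["createdAt"],
        reverse=True,
    )
    return [event["message"] for event in events[:limit]]


# Usage (requires boto3 and AWS credentials; the names are placeholders):
#   import boto3
#   ecs = boto3.client("ecs")
#   for message in recent_service_events(ecs, "my-cluster", "my-service"):
#       print(message)
```

Passing the client in as an argument keeps the helper easy to try against a stub before pointing it at a real cluster.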