Alarm Threshold Evaluation

Alarm threshold evaluation logic capable of wide scaling

Upstream blueprint

Prerequisites

Install OpenStack with packstack --allinone on the controller, and also deploy an additional compute node.
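
For example, the extra compute node can be added after the initial all-in-one run by re-using the generated answer file (a sketch; the answer-file name is a placeholder and CONFIG_COMPUTE_HOSTS takes a comma-separated list of host IPs):

   # all-in-one install on the controller
   sudo packstack --allinone
   # append the second node's IP to CONFIG_COMPUTE_HOSTS in the generated
   # answer file, then re-run packstack against it
   sudo packstack --answer-file=packstack-answers-<timestamp>.txt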

Ensure the ceilometer compute agent is gathering metrics at a reasonable cadence (for example every 60 seconds, instead of the default 10 minutes):

   sudo sed -i '/^ *name: cpu_pipeline$/ { n ; s/interval: 600$/interval: 60/ }' /etc/ceilometer/pipeline.yaml
   sudo service openstack-ceilometer-compute restart
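
To sanity-check the new cadence once samples start flowing, the sample timestamps should be roughly 60 seconds apart (a quick check, assuming the cpu_util meter is being collected):

   ceilometer sample-list -m cpu_util | head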

Step 1.

Ensure the ceilometer-alarm-evaluator and ceilometer-alarm-notifier services are running on the controller node:

   sudo yum install -y openstack-ceilometer-alarm
   sudo openstack-config --set /etc/ceilometer/ceilometer.conf alarm evaluation_service ceilometer.alarm.service.PartitionedAlarmService 
   export CEILO_ALARM_SVCS='evaluator notifier'
   for svc in $CEILO_ALARM_SVCS; do sudo service openstack-ceilometer-alarm-$svc restart; done
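
A quick status check, reusing the same variable, confirms both services came up (sketch):

   for svc in $CEILO_ALARM_SVCS; do sudo service openstack-ceilometer-alarm-$svc status; done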

Step 2.

Ensure a second ceilometer-alarm-evaluator service is running on the compute node:

   sudo yum install -y openstack-ceilometer-alarm
   sudo openstack-config --set /etc/ceilometer/ceilometer.conf alarm evaluation_service ceilometer.alarm.service.PartitionedAlarmService
   export CEILO_ALARM_SVCS='evaluator'
   for svc in $CEILO_ALARM_SVCS; do sudo service openstack-ceilometer-alarm-$svc start; done
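
Before moving on, it may be worth verifying the compute node really is configured for partitioned evaluation (a sketch using openstack-config's --get action):

   sudo openstack-config --get /etc/ceilometer/ceilometer.conf alarm evaluation_service
   sudo service openstack-ceilometer-alarm-evaluator status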

Step 3.

Spin up an instance in the usual way:

   nova boot --image $IMAGE_ID --flavor 1 test_instance
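
The alarm queries in step 4 assume $INSTANCE_ID is set, so capture the instance UUID once the boot completes (a sketch that parses the nova list table):

   INSTANCE_ID=$(nova list | grep test_instance | awk -F\| '{print $2}' | tr -d ' ')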

Step 4.

Create multiple alarms with thresholds sufficiently low that they are guaranteed to go into alarm:

   for i in $(seq 10)
   do
     ceilometer alarm-threshold-create --name high_cpu_alarm_${i} --description 'instance running hot'  \
      --meter-name cpu_util  --threshold 0.01 --comparison-operator gt  --statistic avg \
      --period 60 --evaluation-periods 1 \
      --alarm-action 'log://' \
      --query resource_id=$INSTANCE_ID
   done
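
A simple count confirms all ten alarms were created (sketch; the pattern matches the alarm names used above):

   ceilometer alarm-list | grep -c high_cpu_alarm_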

Step 5.

Ensure that the alarms are partitioned over the multiple evaluators:

   sudo tail -f /var/log/ceilometer/alarm-evaluator.log | grep 'initiating evaluation cycle'

On each host, expect approximately half the alarms to be evaluated, i.e.

   '... initiating evaluation cycle on 5 alarms'
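
If the live tail is noisy, the most recent cycle on each host can be pulled straight from the log instead; the two counts should sum to the total number of alarms (sketch, assuming the log path above):

   sudo grep 'initiating evaluation cycle' /var/log/ceilometer/alarm-evaluator.log | tail -1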

Step 6.

Ensure all alarms have transitioned to the 'alarm' state:

   ceilometer alarm-list
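
For an at-a-glance check, count the rows whose state column reads 'alarm' (sketch; expect 10):

   ceilometer alarm-list | grep high_cpu_alarm_ | grep -c ' alarm '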

Step 7.

Create some more alarms:

   for i in $(seq 10)
   do
     ceilometer alarm-threshold-create --name low_cpu_alarm_${i} --description 'instance running cold'  \
      --meter-name cpu_util  --threshold 99.9 --comparison-operator le  --statistic avg \
      --period 60 --evaluation-periods 1 \
      --alarm-action 'log://' \
      --query resource_id=$INSTANCE_ID
   done

and also delete a few alarms:

   ceilometer alarm-delete -a $ALARM_ID
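
where $ALARM_ID is any ID reported by ceilometer alarm-list; for instance, to drop three of the high_cpu alarms in one go (sketch):

   for ALARM_ID in $(ceilometer alarm-list | grep high_cpu_alarm_ | awk -F\| '{print $2}' | head -3)
   do
     ceilometer alarm-delete -a $ALARM_ID
   done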

and ensure that the alarm allocation is still roughly even between the evaluation services:

   sudo tail -f /var/log/ceilometer/alarm-evaluator.log | grep 'initiating evaluation cycle'

Step 8.

Shut down the partitioned ceilometer alarm evaluator service on each host:

   sudo service openstack-ceilometer-alarm-evaluator stop

then restart the evaluator on the controller host *only*, reconfigured with the singleton evaluation service:

   sudo openstack-config --set /etc/ceilometer/ceilometer.conf alarm evaluation_service ceilometer.alarm.service.SingletonAlarmService 
   sudo service openstack-ceilometer-alarm-evaluator start
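
Check on both hosts that only the controller-side evaluator is now running (sketch; expect 'running' on the controller and 'stopped' on the compute node):

   sudo service openstack-ceilometer-alarm-evaluator status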

Step 9.

Reset all alarms to the 'ok' state and ensure that they flip back to 'alarm':

   for a in $(ceilometer alarm-list | grep _cpu_alarm_ | awk -F\| '{print $2}')
   do
     ceilometer alarm-update --state ok -a $a
   done
   
   sleep 60 ; ceilometer alarm-list
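
After the minute's grace, the number of alarms back in the 'alarm' state should match the number left over from the deletions in step 7 (sketch):

   ceilometer alarm-list | grep _cpu_alarm_ | grep -c ' alarm '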