OpenVZ script to periodically check & suspend/restart overloaded VMs (VPSs)



Fli
08-31-2014, 11:01 AM
Sometimes an OpenVZ VPS (virtual machine / container) becomes overloaded because of an abusive script or a denial-of-service attack. This can also hurt the other VPSs/containers on the same node, so the script on this page checks the load average of each VPS. If a VPS's load average is too high, the script disconnects it by removing its IP (nullrouting?). If the load average still doesn't go down within a set time, the VPS is restarted and an email is sent to the admin. Once the load has dropped, or after the restart, the IP is added back to the VPS. The script can be run as a cronjob to keep the OpenVZ server from being overloaded.

The script should be saved as /root/vmsuspender. If you want another name or location, change it in the script code.

Version 1.0: OUTDATED. It checks endlessly until the VM load goes under the set value; if that doesn't happen, the script won't STOP.


echo "Script to check VMs (VPSs) load averages and do some action with VM"


# set maximum VM (VPS) load average in whole numbers (20,55,99), if reached an action will be triggered
maxload=30


# load average when re-adding IP is done
maxload2=15


# set sleep time between removing and re-adding VM IP address
sleeptime=40


for ctid in $(vzlist -Ho ctid);do


# echo $ctid


# get load average for a CTID
vmload=$(vzlist -Ho ctid,laverage | grep $ctid | awk '{print $2}' | cut -c-5 | tr -d /)
# round load average to whole number
vmload=$(printf "%.0f" $vmload)


# if vm load is higher than $maxload, do action
if [ "$vmload" -gt "$maxload" ];then


echo "$ctid load is higher than $maxload, its BAD.. lets do some action with that VM.."


# action on vm high load
vmip=$(vzlist -Ho ctid,ip | grep $ctid | awk '{print $2}')


echo "Deleting VM IP $vmip, and sleeping $sleeptime seconds..."
vzctl set $ctid --ipdel $vmip --save
#vzctl exec $ctid restart httpd
#vzctl exec $ctid service httpd restart




# lets wait until VM load goes down after removing IP
while [ "$vmload" -gt "$maxload2" ];do


sleep 3
vmload=$(vzlist -Ho ctid,laverage | grep $ctid | awk '{print $2}' | cut -c-5 | tr -d /)
vmload=$(printf "%.0f" $vmload)
echo "Sleeping until VM$ctid load goes under $maxload2. Current: $vmload"


done


echo "$ctid load is probably at acceptable value, lets add IP again"


echo "Adding VM IP $vmip"
vzctl set $ctid --ipadd $vmip --save


fi


done
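
One weak spot in the load check above is "grep $ctid", which also matches other CTIDs containing the same digits (10 matches 101). Below is a minimal sketch of an exact-match alternative. The function name get_load and the sample text are hypothetical, and the "ctid load1/load5/load15" layout is assumed from the way the script cuts the vzlist output; it reads the text on stdin so it can be tried without vzlist.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): print the rounded
# 1-minute load average for exactly one CTID.
# Input on stdin is assumed to look like "vzlist -Ho ctid,laverage" output,
# i.e. "<ctid> <load1>/<load5>/<load15>" per line.
get_load() {
    awk -v id="$1" '$1 == id { split($2, la, "/"); printf "%.0f\n", la[1] }'
}

# Made-up sample data, for illustration only.
sample='       101 0.52/0.40/0.31
        10 31.70/20.11/9.05'

# Only the line whose first column is exactly 10 is used.
printf '%s\n' "$sample" | get_load 10
```

In the real script the input would come from "vzlist -Ho ctid,laverage" instead of the sample variable.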




Version 1.1: Added the possibility to restart a VM if its load average doesn't go down within a set time after removing the VM IP, plus an email to the OpenVZ admin.


echo "Script to check VMs (VPSs) load averages and remove VM IP if load average too high, then if load average wont go down VM will be restarted"

# set email address where reports will be sent
adminmail="admin@example.com"  # placeholder, put your own address here


# set maximum VM (VPS) load average in whole numbers (20,55,99), if reached, VM IP will be removed
maxload=30


# load average acceptable for re-adding VM IP back
maxload2=15


# Number of VMs load checks. One check per 5 seconds. If check number exceeded and load is still high after removing IP, we restart the VPS. 24x5=120seconds
looptimes=24


# number of seconds delay between load average checks (multiply this value by $looptimes and we have time in seconds to give Vm to decrease its load before we restart it)
loopsec=5


# --------------------------------------


for ctid in $(vzlist -Ho ctid);do


# get load average for a CTID
vmload=$(vzlist -Ho ctid,laverage | grep $ctid | awk '{print $2}' | cut -c-5 | tr -d /)
# round load average to whole number
vmload=$(printf "%.0f" $vmload)


# if vm load is higher than $maxload, do action
if [ "$vmload" -gt "$maxload" ];then


echo "$ctid load is higher than $maxload, its BAD.. lets do some action with that VM.."


# action on vm high load
vmip=$(vzlist -Ho ctid,ip | grep $ctid | awk '{print $2}')


echo "Deleting VM IP $vmip ..."
vzctl set $ctid --ipdel $vmip --save
#vzctl exec $ctid restart httpd
#vzctl exec $ctid service httpd restart




# After removing VM IP, lets wait until its load average goes down
loopnumber=1
while [ "$vmload" -gt "$maxload2" ];do


loopnumber=$((loopnumber+1))
sleep $loopsec


# getting current load average
vmload=$(vzlist -Ho ctid,laverage | grep $ctid | awk '{print $2}' | cut -c-5 | tr -d /)
vmload=$(printf "%.0f" $vmload)
echo "Sleeping until VM$ctid load goes under $maxload2. Currently: $vmload. This is check number $(($loopnumber - 1))/$looptimes"


# exit this loop if looped maximum allowed times
if [ "$loopnumber" -gt "$looptimes" ];then
echo "Load average $vmload is still too high (above maxload2: $maxload2), even $((looptimes * loopsec)) seconds after removing VM IP $vmip. Restarting the VM to clear its processes."
vzctl restart $ctid
echo "/root/vmsuspender script runs as a cronjob and it had to restart VPS $ctid.


This VPS load average was higher than $maxload and even VMSuspender removed VPS IP $vmip for $(($looptimes * $loopsec)) seconds, load average did not decreased under $maxload2" | mail -s "$(hostname): VPS was auto restarted" $adminmail
break
fi


# Load checking loop before re-adding IP finished
done


echo "$ctid load is probably at acceptable value, lets add IP again"


echo "Adding VM IP $vmip"
vzctl set $ctid --ipadd $vmip --save


# end of VMs high load issue
fi


# end of one VMs load check
done
BUGs: I hit an issue where a VM IP was not re-added and the VPS was left without an IP assigned. vzctl log shown

Version #1.2: if a VM has high load but no IP, we assume another instance of the vmsuspender script is working on that VM, so we skip to checking the next VM's load in the "for" loop.


echo "Script to check VMs (VPSs) load averages and remove VM IP if load average too high, then if load average wont go down VM will be restarted"

# set email address where reports will be sent
adminmail="admin@example.com"  # placeholder, put your own address here


# set maximum VM (VPS) load average in whole numbers (20,55,99), if reached, VM IP will be removed
maxload=30


# load average acceptable for re-adding VM IP back
maxload2=15


# Number of VMs load checks. One check per 5 seconds. If check number exceeded and load is still high after removing IP, we restart the VPS. 24x5=120seconds
looptimes=24


# number of seconds delay between load average checks (multiply this value by $looptimes and we have time in seconds to give Vm to decrease its load before we restart it)
loopsec=5


# --------------------------------------


for ctid in $(vzlist -Ho ctid);do


# get load average for a CTID
vmload=$(vzlist -Ho ctid,laverage | grep $ctid | awk '{print $2}' | cut -c-5 | tr -d /)
# round load average to whole number
vmload=$(printf "%.0f" $vmload)


# if vm load is higher than $maxload, do action
if [ "$vmload" -gt "$maxload" ];then


echo "$ctid load is higher than $maxload, its BAD.. lets do some action with that VM.."


# action on vm high load
vmip=$(vzlist -Ho ctid,ip | grep $ctid | awk '{print $2}')


echo "Continue to next VM load check if VM IP is empty, assuming another vmsuspender script is working with VM"
if [ "$vmip" == "" ];then
result="VM $ctid dont have IP, but its load is high: $vmload. Maybe some other vmsuspender script removed its IP and working on VM now. Lets continue to check load average of next VM.."
echo "$result"
echo "$result" | mail -s "$(hostname): $ctid has high load, no IP" $adminmail
echo "Ending this FOR loop and continuing with next VM.."
continue
fi


echo "Deleting VM IP $vmip ..."
vzctl set $ctid --ipdel $vmip --save
#vzctl exec $ctid restart httpd
#vzctl exec $ctid service httpd restart




# After removing VM IP, lets wait until its load average goes down
loopnumber=1
while [ "$vmload" -gt "$maxload2" ];do


loopnumber=$((loopnumber+1))
sleep $loopsec


# getting current load average
vmload=$(vzlist -Ho ctid,laverage | grep $ctid | awk '{print $2}' | cut -c-5 | tr -d /)
vmload=$(printf "%.0f" $vmload)
echo "Sleeping until VM$ctid load goes under $maxload2. Currently: $vmload. This is check number $(($loopnumber - 1))/$looptimes"


# exit this loop if looped maximum allowed times
if [ "$loopnumber" -gt "$looptimes" ];then
echo "Load average $vmload is still too high (above maxload2: $maxload2), even $((looptimes * loopsec)) seconds after removing VM IP $vmip. Restarting the VM to clear its processes."
vzctl restart $ctid
echo "/root/vmsuspender script runs as a cronjob and it had to restart VPS $ctid.


This VPS load average was higher than $maxload and even VMSuspender removed VPS IP $vmip for $(($looptimes * $loopsec)) seconds, load average did not decreased under $maxload2" | mail -s "$(hostname): VPS was auto restarted" $adminmail
break
fi


# Load checking loop before re-adding IP finished
done


echo "$ctid load is probably at acceptable value, lets add IP again"


echo "Adding VM IP $vmip"
vzctl set $ctid --ipadd $vmip --save


# end of VMs high load issue
fi


# end of one VMs load check
done
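
Instead of guessing from a missing IP that another vmsuspender instance is running, a lock could stop overlapping cron runs from starting at all. This is only a sketch under assumptions: the lock directory path and the function names acquire_lock and release_lock are hypothetical. It relies on mkdir being atomic, so only one instance can create the directory.

```shell
#!/bin/bash
# Hypothetical lock directory (not from the original script).
lockdir="${TMPDIR:-/tmp}/vmsuspender.lock.d"

# mkdir fails if the directory already exists, so it doubles as a lock.
acquire_lock() {
    mkdir "$1" 2>/dev/null
}

release_lock() {
    rmdir "$1" 2>/dev/null
}

if acquire_lock "$lockdir"; then
    # Remove the lock again when the script exits, even on an early exit.
    trap 'release_lock "$lockdir"' EXIT
    echo "Lock acquired, running load checks..."
    # The "for ctid in ..." loop of the script would go here.
else
    echo "Another vmsuspender instance holds $lockdir, skipping this run."
fi
```

If the script is killed hard (kill -9) the directory stays behind and has to be removed by hand, so the no-IP check from version 1.2 is still useful as a second line of defence.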

Version # 1.3: An email with the processes running on the overloaded VM is sent to the admin; particular VMs can be excluded from suspending/restarting.


echo "Script to check VMs (VPSs) load averages and remove VM IP if load average too high, then if load average wont go down VM will be restarted"

# set email address where reports will be sent
adminmail="admin@example.com"  # placeholder, put your own address here


# set maximum VM (VPS) load average in whole numbers (20,55,99), if reached, VM IP will be removed
maxload=30


# load average acceptable for re-adding VM IP back
maxload2=10


# Number of VM load checks, one check every $loopsec seconds. If the number of checks is exceeded and the load is still high after removing the IP, we restart the VPS. 36x5=180 seconds
looptimes=36


# number of seconds delay between load average checks (multiply this value by $looptimes and we have time in seconds to give Vm to decrease its load before we restart it)
loopsec=5


# --------------------------------------


for ctid in $(vzlist -Ho ctid);do
# whitelist some VM to never be suspended
if [ "$ctid" == "860" ];then
continue
fi


# get load average for a CTID
vmload=$(vzlist -Ho ctid,laverage | grep $ctid | awk '{print $2}' | cut -c-5 | tr -d /)
# round load average to whole number
vmload=$(printf "%.0f" $vmload)


# if vm load is higher than $maxload, do action
if [ "$vmload" -gt "$maxload" ];then


echo "$ctid load is higher than $maxload, its BAD.. lets do some action with that VM.."


# get that VM IP
vmip=$(vzlist -Ho ctid,ip | grep $ctid | awk '{print $2}')


echo "Continue to next VM load check if VM IP is empty, assuming another vmsuspender script is working with VM"
if [ "$vmip" == "" ];then
result="VM $ctid dont have IP, but its load is high: $vmload. Maybe some other vmsuspender script removed its IP and working on VM now. Lets continue to check load average of next VM.."
echo "$result"
echo "$result" | mail -s "$(hostname): $ctid has high load, no IP" $adminmail
echo "Ending this FOR loop and continuing with next VM.."
continue
fi


# mail overloaded VM process list to admin to see what is going on
vzctl exec $ctid ps auxf | mail -s "$(hostname) VPS $ctid process list during high load" $adminmail


echo "Deleting VM IP $vmip ..."
vzctl set $ctid --ipdel $vmip --save
#vzctl exec $ctid restart httpd
#vzctl exec $ctid service httpd restart




# After removing VM IP, lets wait until its load average goes down
loopnumber=1
while [ "$vmload" -gt "$maxload2" ];do


loopnumber=$((loopnumber+1))
sleep $loopsec


# getting current load average
vmload=$(vzlist -Ho ctid,laverage | grep $ctid | awk '{print $2}' | cut -c-5 | tr -d /)
vmload=$(printf "%.0f" $vmload)
echo "Sleeping until VM$ctid load goes under $maxload2. Currently: $vmload. This is check number $(($loopnumber - 1))/$looptimes"


# restart VM if looped maximum allowed times
if [ "$loopnumber" -gt "$looptimes" ];then
echo "Load average $vmload is still too high (above maxload2: $maxload2), even $((looptimes * loopsec)) seconds after removing VM IP $vmip. Restarting the VM to clear its processes."
vzctl restart $ctid
echo "/root/vmsuspender script runs as a cronjob and it had to restart VPS $ctid.


This VPS load average was higher than $maxload and even VMSuspender removed VPS IP $vmip for $(($looptimes * $loopsec)) seconds, load average did not decreased under $maxload2 . $(ps auxf)" | mail -s "$(hostname): VPS was auto restarted" $adminmail
break
fi


# Load checking loop before re-adding IP finished
done


echo "$ctid load is probably at acceptable value, lets add IP again"


echo "Adding VM IP $vmip"
vzctl set $ctid --ipadd $vmip --save


# end of VMs high load issue
fi


# end of one VMs load check
done
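
If more than one VM needs to be excluded, the hard-coded comparison gets long quickly. A small sketch of a list-based whitelist follows. The variable whitelist and the function is_whitelisted are hypothetical names; 860 is the CTID used above, and 861 and 101 are made-up examples.

```shell
#!/bin/bash
# Hypothetical space-separated list of CTIDs that must never be suspended.
whitelist="860 861"

# Return 0 if the given CTID is in the whitelist, 1 otherwise.
# Whole-word comparison, so 86 does not match 860.
is_whitelisted() {
    local ctid="$1" id
    for id in $whitelist; do
        if [ "$id" = "$ctid" ]; then
            return 0
        fi
    done
    return 1
}

# Example run over made-up CTIDs.
for ctid in 101 860 861; do
    if is_whitelisted "$ctid"; then
        echo "$ctid: whitelisted, skipping"
    else
        echo "$ctid: will be checked"
    fi
done
```

Inside the script's "for" loop the check would simply be "if is_whitelisted "$ctid"; then continue; fi".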

Version # 1.4: if a VPS can't be restarted because of the "Child * exited with status 7 (http://internetlifeforum.com/reseller-hosting/1868-openvz-issues-fixes-after-node-reboot/)" error, the VPS checkpoint is also killed to allow the restart.


echo "Script to check VMs (VPSs) load averages and remove VM IP if load average too high, then if load average wont go down VM will be restarted"

# set email address where reports will be sent
adminmail="admin@example.com"  # placeholder, put your own address here

# set maximum VM (VPS) load average in whole numbers (20,55,99), if reached, VM IP will be removed
maxload=10

# load average acceptable for re-adding VM IP back
maxload2=3

# Number of VM load checks, one check every $loopsec seconds. If the number of checks is exceeded and the load is still high after removing the IP, we restart the VPS. 30x5=150 seconds
looptimes=30

# number of seconds delay between load average checks (multiply this value by $looptimes and we have time in seconds to give Vm to decrease its load before we restart it)
loopsec=5

# --------------------------------------

for ctid in $(vzlist -Ho ctid);do
# whitelist some VM to never be suspended
if [ "$ctid" == "860" ] || [ "$ctid" == "9999" ];then
continue
fi

# get load average for a CTID
vmload=$(vzlist -Ho ctid,laverage | grep $ctid | awk '{print $2}' | cut -c-5 | tr -d /)
# round load average to whole number
vmload=$(printf "%.0f" $vmload)

# if vm load is higher than $maxload, do action
if [ "$vmload" -gt "$maxload" ];then

echo "$ctid load is higher than $maxload, its BAD.. lets do some action with that VM.."

# get that VM IP
vmip=$(vzlist -Ho ctid,ip | grep $ctid | awk '{print $2}')

# If the VM IP is empty, we assume another vmsuspender script is working with this VM, so we skip it
if [ "$vmip" == "" ];then
result="VM $ctid dont have IP, but its load is high: $vmload. Maybe some other vmsuspender script removed its IP and working on VM now. Lets continue to check load average of next VM.."
echo "$result"
echo "$result" | mail -s "$(hostname): $ctid has high load, no IP" $adminmail
echo "Ending this FOR loop and continuing with next VM.."
continue
fi

# save the overloaded VM's process list, so it can be mailed to the admin if a restart is needed
#vzctl exec $ctid ps auxf | mail -s "$(hostname) VPS $ctid process list during high load" $adminmail
# -p creates missing parent directories and does not complain if the directory already exists
mkdir -p /root/vmsloadprocesses/$ctid
vzctl exec $ctid ps auxf > /root/vmsloadprocesses/$ctid/high_load_processes

echo "Deleting VM IP $vmip ..."
vzctl set $ctid --ipdel $vmip
#vzctl exec $ctid restart httpd
#vzctl exec $ctid service httpd restart


# After removing VM IP, lets wait until its load average goes down
loopnumber=1
while [ "$vmload" -gt "$maxload2" ];do

loopnumber=$((loopnumber+1))
sleep $loopsec

# getting current load average
vmload=$(vzlist -Ho ctid,laverage | grep $ctid | awk '{print $2}' | cut -c-5 | tr -d /)
vmload=$(printf "%.0f" $vmload)
echo "Sleeping until VM$ctid load goes under $maxload2. Currently: $vmload. This is check number $(($loopnumber - 1))/$looptimes"

# restart VM if looped maximum allowed times
if [ "$loopnumber" -gt "$looptimes" ];then
echo "Load average $vmload is still too high (above maxload2: $maxload2), even $((looptimes * loopsec)) seconds after removing VM IP $vmip. Restarting the VM to clear its processes."
echo "Killing any process containing vmid, so restart succeed (malicious finder can be hanging vps):"
pkill -f "/vz/root/$ctid/"
echo "Check if VM restart command outputted exited with status 7, if yes, kill vm checkpoint also:"
vmrestartt=$(vzctl restart $ctid)
if [[ "$vmrestartt" == *"exited with status 7"* ]];then
vzctl chkpnt $ctid --kill
sleep 10
vzctl --verbose restart $ctid
fi
echo "/root/vmsuspender script runs as a cronjob and it had to restart VPS $ctid.

This VPS load average was higher than $maxload and even VMSuspender removed VPS IP $vmip for $(($looptimes * $loopsec)) seconds, load average did not decreased under $maxload2 . $(cat /root/vmsloadprocesses/$ctid/high_load_processes)" | mail -s "$(hostname): VPS was auto restarted" $adminmail
break
fi

# Load checking loop before re-adding IP finished
done

echo "$ctid load is probably at acceptable value, lets add IP again"

echo "Adding VM IP $vmip"
vzctl set $ctid --ipadd $vmip --save

# end of VMs high load issue
fi

# end of one VMs load check
done
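
The "exited with status 7" check can be pulled out into a small function, which makes it easy to try without touching a real VM. This is a sketch only: the function name needs_checkpoint_kill and the sample messages are hypothetical, and just the substring "exited with status 7" comes from the script above.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if the captured restart output contains
# the "exited with status 7" error, 1 otherwise.
needs_checkpoint_kill() {
    case "$1" in
        *"exited with status 7"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Made-up sample outputs, for illustration only.
bad_output="Child 1234 exited with status 7"
good_output="Container start in progress..."

for out in "$bad_output" "$good_output"; do
    if needs_checkpoint_kill "$out"; then
        echo "checkpoint kill needed: $out"
    else
        echo "restart looks fine: $out"
    fi
done
```

Note that the function only sees what was captured. If vzctl writes this error to stderr rather than stdout (not verified here), the capture in the script would need "2>&1" for the check to ever match.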

Version # 1.4.1: vzlist and vzctl changed to absolute paths (example: /usr/sbin/vzlist).


# Script to check VMs (VPSs) load averages and remove VM IP if load average too high, then if load average wont go down VM will be restarted

# set email address where reports will be sent
adminmail="admin@example.com"  # placeholder, put your own address here

# set maximum VM (VPS) load average in whole numbers (20,55,99), if reached, VM IP will be removed
maxload=10

# load average acceptable for re-adding VM IP back
maxload2=3

# Number of VM load checks, one check every $loopsec seconds. If the number of checks is exceeded and the load is still high after removing the IP, we restart the VPS. 40x5=200 seconds
looptimes=40

# number of seconds delay between load average checks (multiply this value by $looptimes and we have time in seconds to give Vm to decrease its load before we restart it)
loopsec=5

# --------------------------------------

for ctid in $(/usr/sbin/vzlist -Ho ctid);do
# whitelist some VM to never be suspended
if [ "$ctid" == "860" ] || [ "$ctid" == "9999" ];then
continue
fi

# get load average for a CTID (exact match on the CTID column, so 10 does not also match 101)
vmload=$(/usr/sbin/vzlist -Ho ctid,laverage | awk -v id="$ctid" '$1 == id {print $2}' | cut -d/ -f1)
# round load average to whole number
vmload=$(printf "%.0f" "$vmload")

# if vm load is higher than $maxload, do action
if [ "$vmload" -gt "$maxload" ];then

echo "$ctid load is higher than $maxload, its BAD.. lets do some action with that VM.."

# get that VM IP (exact match on the CTID column)
vmip=$(/usr/sbin/vzlist -Ho ctid,ip | awk -v id="$ctid" '$1 == id {print $2}')

# If the VM has no IP (vzlist may print "-" for an empty field), we assume another vmsuspender script is working with this VM, so we skip it
if [ -z "$vmip" ] || [ "$vmip" == "-" ];then
result="VM $ctid doesn't have an IP, but its load is high: $vmload. Maybe another vmsuspender script removed its IP and is working on the VM now. Let's continue to check the load average of the next VM.."
echo "$result"
echo "$result" | mail -s "$(hostname): $ctid has high load, no IP" $adminmail
echo "Ending this FOR loop and continuing with next VM.."
continue
fi

# save the overloaded VM's process list, so it can be mailed to the admin if a restart is needed
#vzctl exec $ctid ps auxf | mail -s "$(hostname) VPS $ctid process list during high load" $adminmail
# -p creates missing parent directories and does not complain if the directory already exists
mkdir -p /root/vmsloadprocesses/$ctid
/usr/sbin/vzctl exec $ctid ps auxf > /root/vmsloadprocesses/$ctid/high_load_processes

echo "Deleting VM IP $vmip ..."
/usr/sbin/vzctl set $ctid --ipdel $vmip
#vzctl exec $ctid restart httpd
#vzctl exec $ctid service httpd restart


# After removing VM IP, lets wait until its load average goes down
loopnumber=1
while [ "$vmload" -gt "$maxload2" ];do

loopnumber=$((loopnumber+1))
sleep $loopsec

# getting current load average (exact match on the CTID column)
vmload=$(/usr/sbin/vzlist -Ho ctid,laverage | awk -v id="$ctid" '$1 == id {print $2}' | cut -d/ -f1)
vmload=$(printf "%.0f" "$vmload")
echo "Sleeping until VM$ctid load goes under $maxload2. Currently: $vmload. This is check number $(($loopnumber - 1))/$looptimes"

# restart VM if looped maximum allowed times
if [ "$loopnumber" -gt "$looptimes" ];then
echo "Load average $vmload is still too high (above maxload2: $maxload2), even $((looptimes * loopsec)) seconds after removing VM IP $vmip. Restarting the VM to clear its processes."
echo "Killing any host process whose command line contains /vz/root/$ctid/, so the restart can succeed (a malicious finder can be hanging the VPS):"
pkill -f "/vz/root/$ctid/"
echo "Checking whether the VM restart output contains 'exited with status 7'; if yes, the VM checkpoint is killed too:"
# 2>&1 so that an error message printed to stderr is captured as well
vmrestartt=$(/usr/sbin/vzctl restart $ctid 2>&1)
if [[ "$vmrestartt" == *"exited with status 7"* ]];then
/usr/sbin/vzctl chkpnt $ctid --kill
sleep 10
/usr/sbin/vzctl --verbose restart $ctid
fi
echo "The /root/vmsuspender script runs as a cronjob and had to restart VPS $ctid.

The load average of this VPS was higher than $maxload, and even after VMSuspender removed the VPS IP $vmip for $((looptimes * loopsec)) seconds, the load average did not drop below $maxload2.

$(cat /root/vmsloadprocesses/$ctid/high_load_processes)" | mail -s "$(hostname): VPS was auto restarted" $adminmail
break
fi

# Load checking loop before re-adding IP finished
done

echo "$ctid load is probably at acceptable value, lets add IP again"

echo "Adding VM IP $vmip"
/usr/sbin/vzctl set $ctid --ipadd $vmip --save

# end of VMs high load issue
fi

# end of one VMs load check
done
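
The "wait until the load drops, but give up after looptimes checks" part is the heart of the script and can be written as a reusable function. Below is a minimal sketch, not the author's code. The function name wait_until_below is hypothetical, and the command that reports the load is passed in as an argument, so the example can run with a plain echo instead of vzlist.

```shell
#!/bin/bash
# Hypothetical helper.
# Usage: wait_until_below THRESHOLD MAX_CHECKS DELAY CMD [ARGS...]
# Runs CMD (which must print a whole-number load) up to MAX_CHECKS times,
# sleeping DELAY seconds after each failed check.
# Returns 0 as soon as the load is <= THRESHOLD, 1 if the limit is reached.
wait_until_below() {
    local threshold="$1" max_checks="$2" delay="$3" check=1 load
    shift 3
    while [ "$check" -le "$max_checks" ]; do
        load=$("$@")
        if [ "$load" -le "$threshold" ]; then
            return 0
        fi
        sleep "$delay"
        check=$((check + 1))
    done
    return 1
}

# Example with fake load values instead of a real vzlist call.
if wait_until_below 3 2 0 echo 1; then
    echo "load dropped, the IP can be re-added"
fi
if ! wait_until_below 3 2 0 echo 9; then
    echo "load still high, the VM would be restarted"
fi
```

In the script, a return value of 1 would correspond to the branch that restarts the VM and mails the admin.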

The above script can be run by cron, for example every 2 minutes, to check the load averages.
How to set up such a cron? For example, by creating the file /etc/cron.d/mycrons

and adding into it:


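SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=admin@example.com
HOME=/


*/2 * * * * root /bin/bash /root/vmsuspender >/dev/null 2>&1

Replace admin@example.com with your own address. The script uses bash syntax ("[[ ]]"), so it is run with /bin/bash here. After you see that the cronjobs are being executed, change the MAILTO address to "root".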

If you have ideas on how to improve this script, or you make other versions of it, please share them in this topic for future use. Thank you.