
OpenVZ issues & fixes after node reboot



Fli
04-29-2014, 10:23 AM
This post shows workarounds for starting or restarting an OpenVZ VM when it cannot be started, stopped or restarted. Use Ctrl+F to search this page for the error message you got.
I don't guarantee any of this will work for you; it just worked for me.

What happened: websites stopped loading, and the load of the OpenVZ VPS (VM, virtual machine) was around 100.00 when I ran the following command on the OpenVZ host server:


vzlist -o ctid,laverage,ip

So I tried several commands and increased the RAM, but nothing worked.
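To spot the overloaded container quickly on a node with many VMs, the vzlist output can be filtered by load. This is only a minimal sketch: the helper name high_load and the threshold are made up for illustration, and it assumes the laverage column prints three values separated by slashes (1, 5 and 15 minutes).

```shell
#!/bin/sh
# Hypothetical helper: print containers whose 1-minute load is above a threshold.
# Reads "CTID LAVERAGE" lines on stdin; LAVERAGE is assumed to look like 1.23/0.98/0.75.
high_load() {
    threshold="$1"
    awk -v t="$threshold" '
        NF >= 2 {
            split($2, la, "/")                       # la[1] = 1-minute load
            if (la[1] + 0 > t + 0) print $1, la[1]   # CTID and its load
        }'
}

# Usage on the host node (-H drops the header line):
#   vzlist -H -o ctid,laverage | high_load 10
```

Stopped containers show no load value, so they are simply skipped by the numeric comparison.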

Show the beancounters to check for any failcnt values (they were OK):


[root@vznode ~]# vzubc -w 860
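The same counters can also be read from /proc/user_beancounters on the host. The sketch below prints only rows whose failcnt is above zero. The function name failed_counters is hypothetical, and it assumes the usual layout of that file: failcnt is the last column and the container ID appears as "860:" on the first row of each block.

```shell
#!/bin/sh
# Hypothetical helper: print "CTID RESOURCE FAILCNT" for every non-zero failcnt.
# Reads text in /proc/user_beancounters format on stdin.
failed_counters() {
    awk '
        # First row of a container block starts with "CTID:"
        $1 ~ /^[0-9]+:$/ { uid = substr($1, 1, length($1) - 1) }
        # Data rows end with held, maxheld, barrier, limit, failcnt
        NF >= 6 && $NF ~ /^[0-9]+$/ && $NF + 0 > 0 { print uid, $(NF - 5), $NF }
    '
}

# Usage on the host node:
#   failed_counters < /proc/user_beancounters
```

No output means no resource limit was hit, which matches what I saw here.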

Restarting the overloaded VM:


[root@vznode ~]# vzctl restart 860
...
Child 61485 exited with status 7
...

When the restart command is repeated, the PID of the child process changes. I tried various kill commands, such as vzctl exec 860 kill -9 61485, but they didn't work. "vzctl enter 860" didn't work either. What got me around the "Child * exited with status 7" issue was opening the HyperVM control panel of the OpenVZ host node, going to the VM and clicking "Recover Corrupted Vps". Within about 3 minutes my websites were working and the VM was OK. But I don't advise doing this unless you have a working backup of the VM (I'm not sure exactly what it does).

Before recovering the VPS as described above, you may try:
1) vzctl chkpnt CTIDHERE --kill
2) vzctl --verbose restart CTIDHERE
The two commands above killed the VPS and then restarted it successfully!
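The two steps above can be wrapped in a small function so the CTID is typed only once. This is just a sketch: force_restart, run_cmd and the DRY_RUN switch are names I made up, and only the two vzctl commands themselves come from this post.

```shell
#!/bin/sh
# Hypothetical wrapper around the two commands above.
# With DRY_RUN=1 the commands are printed instead of executed.
run_cmd() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

force_restart() {
    ctid="$1"
    case "$ctid" in
        ''|*[!0-9]*) echo "usage: force_restart CTID" >&2; return 2 ;;
    esac
    # The first step may fail if the container is not running, so the
    # restart is attempted regardless of its exit status.
    run_cmd vzctl chkpnt "$ctid" --kill
    run_cmd vzctl --verbose restart "$ctid"
}

# Usage:
#   DRY_RUN=1 force_restart 860   # show what would run
#   force_restart 860             # actually run it
```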


Next are various errors & actions related to a stuck VM.

Show all VM processes:


vzps axf 860

returns processes like: http://pastebin.com/xNf5JpV8

Summary: I can't start/stop/restart the VM, and I'm not sure the issue can be fixed without restarting the OpenVZ host server.

So I rebooted the OpenVZ host node.


After the reboot (around 5 minutes), run:


tail -f /var/log/vzctl.log (or vzlist)

to watch the VMs slowly starting one by one. But in my case the start of the bad, overloaded VM (CT 860) hung for many minutes, so I decided to stop the VM and start it again.


[root@vznode ~]# vzctl stop 860 --fast
Locked by: pid 22573, cmdline /usr/sbin/vzctl start 860 --skip-fsck
Container already locked

So it's locked: still in use by process 22573, which has hung.


[root@vznode ~]# ps aux | grep 22573
root 22573 0.0 0.0 25052 1028 ? S 05:16 0:00 /usr/sbin/vzctl start 860 --skip-fsck
root 33267 0.0 0.0 61276 788 pts/0 R+ 05:24 0:00 grep 22573

kill it:


[root@vznode ~]# kill -9 22573

and delete lock file:


[root@vznode ~]# rm /vz/lock/860.lck
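The PID of the lock holder can be pulled straight out of the "Locked by" message instead of being copied by hand. A minimal sketch, assuming the message has exactly the format shown above; lock_holder_pid is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: extract the PID from a
# "Locked by: pid N, cmdline ..." line read on stdin.
lock_holder_pid() {
    sed -n 's/^Locked by: pid \([0-9][0-9]*\),.*/\1/p'
}

# Usage on the host node (inspect the process before killing anything):
#   pid=$(vzctl stop 860 --fast 2>&1 | lock_holder_pid)
#   [ -n "$pid" ] && ps -o pid,stat,args -p "$pid"
```

Check that the process really is the hung vzctl start before running kill -9 and removing the lock file, because the same steps would also interrupt a start that is still making progress.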

kill some checkpoint:


[root@vznode ~]# vzctl chkpnt 860 --kill
Container is not running

start the container:


[root@vznode ~]# vzctl start 860
Starting container...
vzquota : (error) can't lock quota file, some quota operations are performing for id 860
vzquota on failed [7]



Let's look at the vzquota processes to find the operation mentioned:


[root@vznode ~]# ps ax | grep vzquota
22575 ? D 0:02 /usr/sbin/vzquota on 860 -b 163840100 -B 163840100 -i 81920100 -I 81920100 -e 0 -n 0 -s 1 -u 10 000
35540 pts/0 S+ 0:00 grep vzquota


kill the vzquota process for 860 (PID 22575):


[root@vznode ~]# kill -9 22575
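To pick out only the vzquota process that belongs to one container, together with its state, the ps output can be filtered as in the sketch below. It assumes the column order of "ps ax" shown above (PID TTY STAT TIME COMMAND); vzquota_procs is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print "PID STATE" for vzquota processes of one container.
# Reads `ps ax` output on stdin; $5 is the command, $6 the action, $7 the CTID.
vzquota_procs() {
    awk -v ctid="$1" '$5 ~ /vzquota$/ && $7 == ctid { print $1, $3 }'
}

# Usage on the host node:
#   ps ax | vzquota_procs 860
```

A state of D means uninterruptible sleep: the process is blocked inside the kernel (usually on disk I/O), and a signal sent to it is only acted on once that call returns, so kill -9 may appear to do nothing for a while.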

start the VM, but it gets stuck waiting on:


[root@vznode ~]# vzctl start 860
Starting container...
vzquota : (warning) Incorrect quota shutdown for id 860, recalculating disk usage

So I stopped the above command with Ctrl+C and ran:



[root@vznode ~]# vzquota off 860
vzquota : (error) Quota is not running for id 860
vzquota : (warning) Repairing quota: it was incorrectly marked as running for id 860

start the VM:


[root@vznode ~]# vzctl start 860
Starting container...
Container is mounted
Adding IP address(es): mysecondip mymainip
Setting CPU limit: 570
Setting CPU units: 1000
Setting CPUs: 6
Container start in progress...


RUNNING now... good. Enter the VM:



[root@vznode ~]# vzctl enter 860

works.. but the websites running on 860 are down


Flush the ConfigServer Firewall (csf) rules:


[root@vznode ~]# csf -F

websites slowly started loading.. good

test whether ConfigServer Firewall has the required iptables modules on the VM:


[root@vznode ~]# /etc/csf/csftest.pl

if there are many errors, you may need to run a list of "modprobe" commands on the host server to enable the modules; read here (http://internetlifeforum.com/security-protection/1711-csftest-pl-failed-%5Bfatal-error-iptables-unknown-error-required-csf-funct/) or here (http://www.webhostrepo.com/blog/enable-iptables-modules-for-a-vps)


Then in my case the WHM hosting control panel showed a Trial license (http://internetlifeforum.com/reseller-hosting/1854-whm-shows-trial-license-fix/), so I needed to switch the IP addresses in:

vi /etc/sysconfig/network-scripts/ifcfg-venet0:0
vi /etc/sysconfig/network-scripts/ifcfg-venet0:1


and run:

service network restart
/usr/local/cpanel/cpkeyclt


OK, the VM is temporarily running until the next crash..