Friday, December 12, 2014

ESXi - hp-ams memory leak (Fix critical AMS issue)

A couple of months ago we had a strange, critical problem with the hosts (ESXi 5.1) in one of our vCenters (v5.1).

All hosts (around 20) started behaving strangely. VMs could not be started, stopped or vMotioned.
On the hosts we could not start or stop any service.
When we tried to open the ESXi console/shell we got "can't fork".

Some of the symptoms that can happen:

SYMPTOM:ESXi host hangs during virtual machine vMotion. VMware termed this the HP-AMS 9.6 memory leak issue.
SYMPTOM:Virtual machines keep working fine.
SYMPTOM:ESXi host is hung and does not accept any command, either through PuTTY or the iLO IRC.
SYMPTOM:Unable to fetch vm-support logs, as the server is hung.
SYMPTOM:Cannot perform a vMotion to or from the ESXi host.
SYMPTOM:Cannot enable services on the ESXi host.
SYMPTOM:Attempting to enable services or start a vMotion on the ESXi host fails.
SYMPTOM:When logging in to the ESXi shell, message seen: can't fork

SYMPTOM:When pressing Alt+F12 at the DCUI, error seen:
WARNING: Heap: 2677: Heap globalCartel-1 already at its maximum size. Cannot expand
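
Once a shell is usable again (for example after a reboot, or on a remote syslog target), the same warning can be searched for in the logs. This is only a minimal sketch: the function name is made up for illustration, and the log location (something like /var/log/vmkernel.log) is an assumption that you should adapt to your setup.

```shell
# Hypothetical helper: count heap-exhaustion warnings like the one above.
# $1 is the log file to scan; the path is an assumption, pass your own.
count_heap_warnings() {
    # grep -c prints the number of matching lines (0 if none match)
    grep -c "Heap globalCartel-1 already at its maximum size" "$1"
}
```

Example call: count_heap_warnings /var/log/vmkernel.log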

After a lot of troubleshooting, and some help from VMware support (this happened before it was a known and reported issue), we found that the problem was in hp-ams (HP Agentless Management Service).

CAUSE:The hp-ams service creates lots of zombie processes in the background, which take up all the memory and make the host hang (host services or VM actions).

This is a known issue that can be found in hp-ams versions 500.9.6.0-12.434156 and 550.9.6.0-12.1198610.
HP states that the problem can occur on ESXi 5.0, 5.1 and 5.5, with all hp-ams v9.6.x versions and some 10.0.1.x versions.
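
To compare an installed version against that list, something like the sketch below could be used. The function name and the output labels are hypothetical. The only rules encoded are the ones stated above: every 9.6.x version is affected, and a 10.0.1.x version needs a manual check because only some builds are affected.

```shell
# Hypothetical helper: classify an hp-ams VIB version string.
# $1 is a version such as 500.9.6.0-12.434156
hp_ams_status() {
    case "$1" in
        *.9.6.*)   echo "affected" ;;    # all 9.6.x versions leak memory
        *.10.0.1*) echo "review" ;;      # only some 10.0.1.x builds are affected
        *)         echo "not-listed" ;;  # version is not covered by the list above
    esac
}
```

On a host, the version string could be taken from the second column of the esxcli software vib list output for hp-ams (assuming that column layout) and passed to the function.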

So the fix is to upgrade hp-ams to the latest version (10.0.1-2.x).

First we need to download the proper version for our HP server and for our ESXi version.

You can check both on the HP support site:

Latest versions:

For ESXi 5.0/5.1 AMS Offline Bundle: Here
Full Bundle: Here
For ESXi 5.5 AMS Offline Bundle: Here
Full Bundle: Here

Note: I recommend applying the full bundle so that you update all your HP VIBs.

More details can be found in the related VMware KB article.

After downloading the proper HP offline bundle and copying it to your host, you need to remove the old version and install the new one.

Here is how:

On your ESXi console, run these commands:

##check hp-ams version
esxcli software vib list | grep hp-ams
##stop hp-ams service
/etc/init.d/ stop

##Remove old hp-ams
esxcli software vib remove -n hp-ams

Note: Even though it is not mandatory, I always reboot the host before installing the new one.

##Install the new hp-ams
example for ESXi 5.0/5.1 full bundle: esxcli software vib install -d /fullpath/ -f

If it is just the AMS file, you can run the same command, just changing the file path.
*fullpath is the VMFS datastore and folder where you copied the file on the ESXi host.

After the installation you need to reboot your host.
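
The whole sequence can be put together as a dry run that only prints the commands, so they can be reviewed before being typed on the host. This is a sketch, not a tested upgrade script: the function name is hypothetical and the bundle path is whatever you pass in. The service stop step is left out because it depends on the init script on your host.

```shell
# Hypothetical dry-run helper: prints the update steps, runs nothing.
# $1 is the full path to the offline bundle copied to the host.
plan_hp_ams_update() {
    bundle="$1"
    echo "esxcli software vib list | grep hp-ams"     # check current version
    echo "esxcli software vib remove -n hp-ams"       # remove old hp-ams
    echo "reboot"                                     # optional, not mandatory
    echo "esxcli software vib install -d $bundle -f"  # install new bundle
    echo "reboot"                                     # required after install
}
```

Calling plan_hp_ams_update with your bundle path prints the five commands in order.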

This should fix your AMS issues.

Hope this can help.