In this post, I detail how to set up Nagios to monitor a Windows machine via NSClient++.
- A working Nagios install, configured on a server with a fixed IP
- The latest NSClient++ MSI Installer
NOTE: At the time of writing, the latest version of NSClient++ is 0.4.2, which is recommended because 0.4.1 has issues with binding to IPv4 addresses.
Run the installer
Choose your preferred installation type
(Typical should suffice. Custom and Complete allow for additional functionality such as client-side Lua and Python scripting, but that functionality is outside the scope of this post)
Choose to install a sample configuration
(The default is functional and works well for our purposes)
Enter the IP of your Nagios server under Allowed Hosts
Enter an NSClient password if desired
Enable common check plugins and nsclient server
Finish the install
Next, open the Services console. How you get there depends on your version of Windows:
- Right click on My Computer and select Manage, then go to Services and Applications -> Services
- Open the start menu, type 'Service' and select the 'Services' item
- Open the start screen, type 'Service' and select 'View local services'
From the services menu, locate NSClient++
Right click on it and select Properties
Open the Log On tab and check the 'Allow service to interact with desktop' box. This gives NSClient++ access to the data it will be monitoring
As part of the installation, NSClient++ should have added the appropriate exceptions to the Windows firewall. If you're using a third-party firewall on the client side, you'll need to open port 12489.
First, ensure that NSClient++ runs in test mode: open it from the newly added start menu shortcut
A command prompt window should appear with NSClient++'s output.
NOTE: On our test machine, version 0.4.2 had an issue with character encoding which caused some garbage characters to be displayed in the command line, but the service itself ran with no issues.
Check the output for any errors pertaining to binding to an IP or listening on a port. If you find any, ensure that no other program is listening on port 12489 and try again. Missing file warnings can be safely ignored.
From a terminal session on your Nagios server, find the check_nt binary. Common locations are /usr/local/nagios and /usr/local/nagios/libexec
Run the check_nt binary as follows:
check_nt -H (Client IP) -p 12489 -v MEMUSE
If this outputs memory usage stats, move on to the next section
If it outputs 'Could not fetch information from server', you'll need to double-check that port 12489 is open. This can be done with telnet:
telnet (Client IP) 12489
Nagios 3 comes with some pre-configured example settings for Windows servers, located (by default) in /usr/local/nagios/etc/objects/windows.cfg
These will work fine for the purposes of this tutorial, so edit your nagios.cfg (located at /usr/local/nagios/etc/nagios.cfg by default) and uncomment the following line:
#cfg_file=/usr/local/nagios/etc/objects/windows.cfg
Next, open windows.cfg
Modify the sample server definition (windows-server) with your server's host_name, alias and address.
Modify the sample service definitions to use your server's new host_name
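As a rough illustration of the two edits above, a modified host entry and one of the service entries in windows.cfg might look something like the following. The host_name, alias and address values are placeholders made up for this example (192.0.2.10 is a documentation-only address), and the 'use' templates are assumed to be the ones the sample configuration already refers to, so check them against your own copy.

```cfg
# Sketch only: the host_name, alias and address below are hypothetical values
define host {
    use         windows-server      ; template assumed to come from the sample config
    host_name   win-client-01       ; hypothetical name for the monitored machine
    alias       Windows Test Box    ; hypothetical descriptive name
    address     192.0.2.10          ; placeholder, use your client's real IP
}

# Each sample service needs its host_name changed to match the host above
define service {
    use                  generic-service
    host_name            win-client-01   ; must match the host_name defined above
    service_description  NSClient++ Version
    check_command        check_nt!CLIENTVERSION
}
```

Every service definition in the file needs the same host_name change, otherwise Nagios will report a reference to an unknown host when it checks the configuration.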
In theory, Nagios should now be set up to monitor your Windows machine's NSClient++ version, uptime, CPU load, memory usage, drive space, explorer.exe status and W3SVC service status. However, we found that this was not the case.
When setting up on our test machine, all of our services would return 'Could not fetch information from server' despite the (successful) testing detailed above. This turned out to be an issue with the command definition for check_nt.
If you open /etc/nagios-plugins/config/nt.cfg, you'll notice that there are two command definitions: check_nt and check_nscp. The reason for the above errors is that the check_nt command definition doesn't pass in a port parameter. Not only that, but the check_nt binary uses port 1248 by default! check_nscp, on the other hand, has the correct definition, so there are a few possible solutions:
- Modify the check_nt command definition to use port 12489
- Modify the service definitions in windows.cfg to use check_nscp
- Explicitly define a custom command for each NSClient++ service
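To make the first two options a little more concrete, here is a minimal sketch of what each might look like. I'm assuming that $USER1$ points at your plugin directory (your nt.cfg may use a hard-coded path instead) and that check_nscp accepts the check name as its first argument. The host name is a made-up placeholder.

```cfg
# Option 1 (sketch): pass the NSClient++ port explicitly in the check_nt command
define command {
    command_name  check_nt
    command_line  $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
}

# Option 2 (sketch): leave check_nt alone and point the services at check_nscp
define service {
    use                  generic-service
    host_name            win-client-01   ; hypothetical host name
    service_description  NSClient++ Version
    check_command        check_nscp!CLIENTVERSION
}
```

Either change only addresses the port problem. Checks that need extra parameters can still fail, depending on how the command definition passes its arguments through.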
In our test machine's case, we went with option 3. Some of the check_nt commands (MEMUSE, CPULOAD, etc.) take upwards of two parameters, and the definitions for check_nt and check_nscp only pass in one and two parameters respectively. This was causing invalid parameter errors using the existing windows.cfg setup, and adding more $ARG$ entries wasn't working (presumably due to the presence of -w, -c, etc. parameter prefixes).
So to fix this, we ended up using something along the lines of the following:
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v USEDDISKSPACE -l c -w 80 -c 90
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v CPULOAD -l 5,80,90
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v UPTIME
service_description C:\ Disk Space
service_description CPU Load
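For reference, this is roughly how those lines fit into complete definitions. The command names (check_win_disk_c, check_win_cpu) and the host name are invented for this sketch and are not necessarily what we used at the time; the command_line and service_description values are the ones shown above.

```cfg
# Sketch only: command_name and host_name values are hypothetical
define command {
    command_name  check_win_disk_c
    command_line  $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v USEDDISKSPACE -l c -w 80 -c 90
}

define command {
    command_name  check_win_cpu
    command_line  $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v CPULOAD -l 5,80,90
}

define service {
    use                  generic-service
    host_name            win-client-01   ; hypothetical host name
    service_description  C:\ Disk Space
    check_command        check_win_disk_c
}

define service {
    use                  generic-service
    host_name            win-client-01   ; hypothetical host name
    service_description  CPU Load
    check_command        check_win_cpu
}
```

Because every threshold is baked into the command_line, each service's check_command is just the bare command name with no arguments.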
It is worth noting that you do lose the ability to parametrise your Nagios settings per-server with this setup. In hindsight, you could get around this by hardcoding the parameter prefixes into the new command definitions and using Nagios' $ARG1$, $ARG2$ macros to pass through the actual values. Like so:
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v USEDDISKSPACE -l $ARG1$ -w $ARG2$ -c $ARG3$
service_description C:\ Disk Space
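Spelled out in full, the parametrised version might look like the sketch below. The command name check_win_disk and the host name are made up for illustration. Nagios separates the arguments in check_command with '!' and maps them to $ARG1$, $ARG2$ and $ARG3$ in order.

```cfg
# Sketch only: command_name and host_name values are hypothetical
define command {
    command_name  check_win_disk
    command_line  $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v USEDDISKSPACE -l $ARG1$ -w $ARG2$ -c $ARG3$
}

define service {
    use                  generic-service
    host_name            win-client-01   ; hypothetical host name
    service_description  C:\ Disk Space
    check_command        check_win_disk!c!80!90   ; drive letter, warning %, critical %
}
```

A second server with different thresholds could then reuse the same command with, say, check_win_disk!d!70!85 in its own service definition.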
After modifying the command and service definitions accordingly, you should be ready to go. Run the following from your server's terminal session:
service nagios restart
Fix any config file errors and then check the web interface. Give it a few minutes for Nagios to query the machine for the first time, and your services should start showing up green.
Oh that's brilliant!! I'd been bashing my head against the desk for quite awhile (using a 0.4.1xx client) before I came across this post. Client 0.4.2xxx worked a treat as did the explanation you've given here. Thanks for clarifying my understanding of what's going on with the windows client (and for saving my forehead from further damage!)