RoboMon: System management made easy?

Heroix’ RoboMon is a systems management software package that resides on a Windows NT
machine and monitors domains or workgroups for errors, and in many cases,
corrects them transparently. RoboMon has
been around since 1989 and has now reached version 7. Part of the reason RoboMon
works well is that you establish the rules that govern its behavior, so when
an error such as an application or sharing problem is encountered, RoboMon
already has your instructions for correcting it. RoboMon actually does
more than correct commonly encountered errors and monitor an enterprise network,
but those are the uses that most administrators will purchase this package for. RoboMon
runs under Windows NT, UNIX, and OpenVMS. We
tested under Windows NT 4.0 Server and Beta 3 of Windows 2000 Professional (nee
Windows NT 5 Server). The test server was an ALR Revolution 2XL sporting two
266MHz Pentium-IIs, 128 MB RAM, and a DPT SmartRaid IV SCSI controller handling
four 9GB hard drives. A Fast
Ethernet network containing eight Windows NT 4.0 Server application servers, one
Windows NT 4.0 Workstation, and four UNIX workstations (a mix of HP-UX, Solaris,
and AIX) was the test bed. Thirty Windows 95 and Windows 98 clients used the NT Servers
during the test period. (Our version of RoboMon was licensed only for ten servers, hence the rather small
test load.) The Windows and Windows NT machines were configured first as a
workgroup then as a domain with equal results as far as RoboMon was concerned.
The UNIX NFS drives were mounted on four different Windows NT Application
Servers using WRQ’s Reflection NFS Gateway. We used RoboMon on the network for
four weeks, simulating over 8,000 failures of different types using Mercury
Interactive’s WinRunner. We
also had over 350 user-generated errors in that period. The
RoboMon package contains the software on a CD-ROM and a thin, spiral-bound User
Guide that provides enough information to use the package.
RoboMon requires Windows NT 3.51 or 4.0. If
you are running Windows NT 3.51, you must have Service Pack 5 installed. RoboMon
needs at least a 90MHz Pentium and 32MB RAM, but faster CPUs and more
memory are highly recommended. It also needs 25MB
of disk space and 50MB of free page file space (we had to create extra
page space for RoboMon because we had frequent virtual memory warnings with our
default setting of 114MB). RoboMon works only with TCP/IP. It also requires
ODBC 3.5 or higher, and will install it if your system lacks it. The software
must be installed from the Administrator account. A license key is required to install RoboMon unless the
CD-ROM has been hard-coded with an expiry date for an evaluation, as was our
version. During the installation
RoboMon asks for an e-mail account to which all event notices are sent.
We started using our Administrator account for this purpose, but because
of the large number of errors we were forcing on the system to test RoboMon our
mailbox quickly became unwieldy. We
moved to a separate mailbox just for the RoboMon event messages and found that
approach much more workable. Of
course, on a stable network that generates only a few events, any administrator
e-mail account may suffice. The
installation proceeds quickly through the Autorun procedure, with an HTML page
appearing in your default browser to step you through the process.
Five minutes later, you’re done and after a reboot, RoboMon is active. Before
diving into the rules-based capabilities of RoboMon, a few of the ancillary
features are worth noting. First,
the event viewer included with RoboMon shows not only events detected by
RoboMon but also those from the Windows NT Event Viewer. These can come from both local and remote machines.
The reporting of events is in real time, which can help you head off
trouble before it escalates. Also noteworthy is the system performance
measurement subsystem, which shows tables and graphs of system performance and
network events. These are useful not only for your own analysis but also when this
information has to be presented to non-technical people (when you’re trying
to justify server upgrades, for example!). RoboMon doesn’t have to be managed
from the server. It allows you to
manage all aspects of the software from any machine on the network, which is
handy for placing RoboMon on a server in a secure location then managing it from
your desktop. RoboMon
can monitor a number of sources of information gathered from the server and
machines attached to the network, as well as the network itself.
Monitoring of machines includes their performance information (the
CPU’s usage, available memory, and amount of I/O) and disk usage.
File and directory usage and bottlenecks can be reported.
On the services side, RoboMon watches which services are used and where
any slow-downs occur. Database
usage can be reported for both Oracle and SQL Server. Access to the Internet Information Server and any pass-throughs
to the Internet itself are monitored, as are printer usage and conditions.
All of this information is gathered in the background from a number of
RoboMon-specific monitoring routines, Windows NT’s event logs and
performance counters, and ODBC-compliant database reporting tools.
The
administrator’s interface to RoboMon is through the Enterprise Manager window.
This window is similar to the NT Explorer.
The Enterprise Manager starts and stops any of the rules engines,
reports process status, and allows changes to a process. Using drag-and-drop
actions, rules on one machine can be propagated to other machines. Built into
the Enterprise Manager are the Rule Designer (used to define rules logic and
actions) and the Solutions Manager (used to modify rule trigger conditions and
actions quickly). The
Enterprise Manager window uses the concept of an enterprise as the lowest common
denominator for the systems and networks to be monitored.
An enterprise may consist of multiple domains or workgroups. The
left-hand pane of the Enterprise Manager shows all the machines in the
enterprise, with machines groupable by domain or workgroup, or by any other
logical groupings you develop – you are not restricted to using the actual
domain or workgroup setups. (The Enterprise can be renamed to suit your
networks, as can any of the machine groupings.)
Under each particular machine name there are four branches for processes,
used to hold the pre-built rules. (Supplied
default rulesets include Automation, which handles many common NT errors;
Exchange, used for Exchange problems; Performance, which works with the
Performance Monitor; and SQL Server, which works with SQL Server, surprisingly
enough. Beneath each ruleset are further specific rules for each condition to be
monitored.) The contents of one or more of the branches can be modified at will
without affecting other machines on the network. This approach of showing
Enterprise, then domains, machines (which have RoboMon clients running),
processes, and then rules is logical and quickly becomes familiar to
administrators. For
RoboMon to monitor a machine, it has to be sending data to the RoboMon engine.
This is accomplished by using the Enterprise Manager to locate the target
machine, and then using a single button click to add a Statistics Builder process for
that machine. Once statistics are
being recorded for the client, rules can be added.
When RoboMon is first installed, only the machine on which
RoboMon is installed shows up in the Enterprise Manager. By adding machines one at a time under real or virtual
domains, the Enterprise can be populated. You
can instruct RoboMon to browse the network for computers running RoboMon clients
already, but this is handy only if the RoboMon server is being changed. The
second RoboMon interface is the Event Monitor, which displays real-time
network-wide events from both RoboMon and Windows NT. A set of filtering and
tailoring tools allows you to modify the events that are displayed, as well as
suppress repetitive events and those you know are occurring for which you have
no actions. If you don’t want to use a GUI, RoboMon can be controlled through
a character-based command-line interface instead. This may be handy for rapid
actions conducted remotely over modems, for example. Supporting these two
interfaces is a set of utilities, including graphing and reporting tools, a
statistics routine, and a configuration manager for the event server. The
heart of RoboMon, as mentioned, is the rules engine. Actually, there are several autonomous rules
engines at work in the background, all tied together by the front-end GUI, which runs in the foreground
when the administrator calls it up. The
rules engines are inference engines that can execute commands and processes when
the situation demands. Because of the design of these engines, each engine can
operate separately on different target computers. As far as Windows NT is
concerned, each of the rules engines is a separate process.
Rules set up triggers for particular actions.
These triggers can be anything that Windows NT’s Performance Monitor
(or a client’s) records. Essentially,
if it is measurable on a Performance Monitor, it can be used as a basis for a
rule. Alternatively, rules can be based on file, directory, or disk volume
activity. What
type of thing makes up a rule? A simple
example is monitoring a printer’s queue.
If it hits a threshold level (which can be computed based on the
printer’s history, not necessarily hard-coded), alerts can be generated for
the administrator or redirection of print requests can occur.
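
To make the idea of a history-based threshold concrete, here is a minimal Python sketch. It is not RoboMon's own rule logic, which we did not see as code; the function names, the "mean plus two standard deviations" formula, the floor of 5 jobs, and the choice to redirect only at twice the threshold are all assumptions made for illustration.

```python
from statistics import mean, stdev


def queue_threshold(history, k=2.0, floor=5):
    """Derive an alert threshold from past queue lengths.

    Assumed formula: mean + k sample standard deviations, never below `floor`.
    """
    if len(history) < 2:
        # Too little history to compute a spread, so fall back to the floor.
        return floor
    return max(floor, mean(history) + k * stdev(history))


def check_printer_queue(current_length, history):
    """Return "ok", "alert", or "redirect" for the current queue length."""
    threshold = queue_threshold(history)
    if current_length <= threshold:
        return "ok"
    # Hypothetical policy: far beyond the threshold, redirect print requests
    # instead of only alerting the administrator.
    if current_length > 2 * threshold:
        return "redirect"
    return "alert"
```

With a quiet history such as [2, 3, 4, 3, 2, 4, 3], the computed threshold stays at the floor of 5, so a queue of 8 jobs raises an alert and a queue of 12 triggers redirection.
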
Simple bottleneck reports are easy to generate.
For example, if access to a CD-ROM jukebox becomes a bottleneck, RoboMon
can report the condition. The same applies to database volumes, Internet
gateways, and Remote Access Servers. Rules
can be programmed for near-continuous activity, or instructed to sleep between
checks. For example, when checking printer queues, checking every
five or ten minutes should be enough, instead of chewing up resources
continually checking the queue. RoboMon
installs a set of default rule engines, which are sufficient
for many application problems. Of
course, the real strength of RoboMon is the ability to customize the ruleset to
your requirements. Customizations do not
require programming. Instead, the
Rule Designer graphical interface can be used to develop remarkably
sophisticated action sequences when an engine encounters a trigger condition.
The
easiest way to create new rules is to modify an existing rule.
Use the standard Windows menu choices to copy or edit directly from the
Enterprise window and open the Rule Designer.
There are five page tabs in the Rule Designer that allow for
considerable detail about the rules. The
Documentation tab is used to describe the rule; the Schedule tab determines when
the rule should be executed; the Selections tab specifies what is to be
monitored; the Condition tab sets up a true-false test for the rule to trigger;
and the Actions tab dictates the actions to be taken when the condition
evaluates to true. Several
different actions can be specified when an event triggers the rule.
Notification is the simplest, sending you a message through e-mail,
pager, or SNMP traps. A Corrective
action has RoboMon execute your programmed steps to alleviate the problem.
A Rule Interaction activity can disable, enable, or reschedule other
rules based on one particular event, particularly useful when you have a
cascading series of events that apply more radical steps each time to an
increasing problem. Finally, a
Variable Manipulation allows you to change the value of a RoboMon variable,
affecting other rules dynamically. Rules are set up by specifying each
component, one by one, building up the conditions or actions as needed.
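
For readers who think in code, the five parts of a rule can be pictured as one small data structure. The Python sketch below is only an analogy for what the Rule Designer assembles graphically: the `Rule` class, its field names, the `notify` and `corrective` helpers, and the sample counter name are hypothetical and are not part of RoboMon.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Rule:
    documentation: str                  # Documentation tab: what the rule is for
    interval_seconds: int               # Schedule tab: pause between checks
    selection: str                      # Selections tab: which counter to watch
    condition: Callable[[float], bool]  # Condition tab: true/false trigger test
    actions: List[Callable[[str, float], str]] = field(default_factory=list)  # Actions tab

    def evaluate(self, sample: Dict[str, float]) -> List[str]:
        """Run every action, in order, if the condition holds for the sample."""
        value = sample[self.selection]
        if not self.condition(value):
            return []
        return [action(self.selection, value) for action in self.actions]


def notify(name, value):
    # Stand-in for a Notification action (e-mail, pager, or SNMP trap).
    return f"NOTIFY: {name} = {value}"


def corrective(name, value):
    # Stand-in for a Corrective action (your programmed recovery steps).
    return f"CORRECT: run recovery step for {name}"


rule = Rule(
    documentation="Warn when the print queue backs up",
    interval_seconds=300,  # check every five minutes instead of continuously
    selection="print_queue_length",
    condition=lambda v: v > 20,
    actions=[notify, corrective],
)
```

Calling `rule.evaluate({"print_queue_length": 25})` returns both action messages in order, while a queue length of 5 returns an empty list because the condition is false.
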
Through the Rule Designer most tasks that you would want RoboMon to
handle can be built without knowing how to program a line of code. The
rules engine in RoboMon is hierarchical, in that you can determine the order of
precedence of actions that the engine uses. This can be used simply to alert you
to potential problems before taking any corrective action, or to follow a number
of more drastic recovery actions in cases of severe problems.
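
One way to picture an order of precedence is as a ladder of actions that the engine climbs each time the same problem recurs. The following Python sketch assumes that model; the `escalate` function and the three action names are invented for illustration and do not come from RoboMon.

```python
# Hypothetical ladder, ordered from mildest to most drastic action.
RECOVERY_LADDER = ["notify administrator", "restart service", "reboot server"]


def escalate(actions, consecutive_triggers):
    """Pick the action for the Nth consecutive trigger of a rule.

    The first trigger gets the mildest action; once the ladder is
    exhausted, the most drastic action is repeated.
    """
    if consecutive_triggers < 1:
        return None  # the rule has not fired, so nothing to do
    index = min(consecutive_triggers, len(actions)) - 1
    return actions[index]
```

Here the first trigger only notifies the administrator, the second restarts the service, and the third and any later triggers reboot the server.
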
All of the data RoboMon gathers is stored in a Microsoft Access database
unless you instruct it to use another system like SQL Server. While
developing rules, managing the Enterprise window, and setting up the
Enterprise itself may sound like a time-consuming and involved task, it isn’t.
In fact, RoboMon is remarkably easy to set up and configure, as well as
manage. On our test network, it took less than half an hour to
populate the Enterprise and set up the default rule sets. The statistics gathering process has no noticeable effect on
the servers, although you will see network traffic increase a little when you
are running real-time condition tests. Since
most rules are set to operate at intervals instead of continually, this isn’t
much of a real problem unless your network is near capacity. Configuring the
SNMP components was the most difficult part of setting up RoboMon, but even so,
a knowledgeable administrator will spend only a few minutes at this task. When
we first installed RoboMon we didn’t tamper with the default rule sets,
finding them to be more than enough for our smallish network’s requirements.
Playing around with some trigger conditions and the rule schedules was
simple and even fun. Daily administration tasks really involved no more than
checking the e-mail account for event notifications, and occasionally browsing
the Event Monitor. We ended up
leaving the Event Monitor running continually in a corner of our network
console. The symbols and colors
used in the Event Monitor quickly draw your eye to alerts and potential
problems, making on-the-fly administration quite easy.
Messages
in the Event Monitor are simple and descriptive, such as “The number of
processes on the system is high”. With most of these types of informative
messages, the rules don’t do anything more than inform you, but that’s
usually enough to let you see how the systems are performing. If this type of
warning persists over time, you know to upgrade the problem system or take some
load off it. Other events can trigger corrective action. For example, when a network has more than one gateway to the
Internet, you can have a rule redirect network users through the second gateway
when the first is loaded to capacity. Statistics
are easily gathered from the Event Monitor. We used the statistics to summarize
performance results on our servers, as well as the ODBC accesses.
After stats were gathered, we graphed the results using the supplied
RoboMon Graph routine, providing clear indications of network loads and server
performance. If
you get the impression we liked RoboMon, you’re right.
Network administration monitoring was easier with RoboMon than with any other
package we’ve looked at. The rules engine provides so many options for
administrators that corrective action for most critical problems can be applied
without you getting involved. RoboMon’s been around for many years already,
starting from its VMS roots. This release continues to show the strengths of the package,
and should ensure RoboMon will be around for many more years.

RoboMon V7.0 Summary:
Network-wide monitoring for Windows-based platforms coupled with a clever
rules-based corrective action capability makes RoboMon one of the best
administration tools for Windows NT we’ve seen.