<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
<book>
  <title>Netsaint Plug-in Developer Guidelines</title>

  <bookinfo>
    <authorgroup>
      <author>
	<firstname>Karl</firstname>
	<surname>DeBisschop</surname>
	<affiliation>
	  <address><email>karl@debisschop.net</email></address>
	</affiliation>
      </author>

      <author>
	<firstname>Ethan</firstname>
	<surname>Galstad</surname>
	<authorblurb>
	  <para>Author of Netsaint</para>
	  <para><ulink url="http://www.netsaint.org"></ulink></para>
	</authorblurb>
	<affiliation>
	  <address><email>netsaint@linuxbox.com</email></address>
	</affiliation>
      </author>

      <author>
	<firstname>Hugo</firstname>
	<surname>Gayosso</surname>
	<affiliation>
	  <address><email>hgayosso@gnu.org</email></address>
	</affiliation>
      </author>
    </authorgroup>

    <pubdate>2000</pubdate>
    <title>Netsaint plug-in development guidelines</title>
    <revhistory>
       <revision>
          <revnumber>0.3</revnumber>
          <date>22 Jan 2001</date>
       </revision>
    </revhistory>
  </bookinfo>

  <preface>
    <title>About the guidelines</title>

    <para>The purpose of these guidelines is to provide a reference for
    plug-in developers and to encourage the standardization of the
    different kinds of plug-ins: C, shell, Perl, Python, etc.</para>
  </preface>


  <article>
    <simplesect>

      <title>Copyright</title>

        <para>Netsaint Plug-in Development Guidelines Copyright (C) 2000 2001
        Karl DeBisschop, Ethan Galstad, Hugo Gayosso</para>

        <para>Permission is granted to make and distribute verbatim
        copies of this manual provided the copyright notice and this
        permission notice are preserved on all copies.</para>

    </simplesect>

    <simplesect>

      <title>Print something, but keep it short</title>

      <para>You should always print something to STDOUT that tells
      whether the service is working or why it is failing. Try to keep
      the output short, preferably less than 80 characters. Remember
      that ideally the entire output should fit in a pager message,
      which will be cut off after a certain length.</para>
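
      <para>A minimal sketch, assuming the plugin builds its status line
      in a fixed-size buffer: snprintf() never writes more than the size
      it is given, so it can enforce the 80-character limit directly.
      The names format_status and MAX_STATUS_LEN are hypothetical and
      not part of the plugin distribution.</para>

      <programlisting><![CDATA[
```c
#include <stdio.h>
#include <string.h>

/* Hypothetical limit taken from the guideline above. */
#define MAX_STATUS_LEN 80

/* Format "LABEL - message" into buf, which must hold MAX_STATUS_LEN + 1
 * bytes. snprintf() writes at most MAX_STATUS_LEN characters plus the
 * terminating NUL, so longer messages are cut off at 80 characters.
 * Returns the number of characters actually stored. */
size_t format_status(char *buf, const char *label, const char *message)
{
    snprintf(buf, MAX_STATUS_LEN + 1, "%s - %s", label, message);
    return strlen(buf);
}
```
]]></programlisting>

      <para>The caller would then print buf, followed by a single
      newline, on STDOUT. Truncating at the source is more predictable
      than relying on the pager to cut the message gracefully.</para>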

    </simplesect>

    <simplesect>

      <title>Print only one line of text</title>

      <para>NetSaint will only grab the first line of text from STDOUT
      when it notifies contacts about potential problems. If you print
      multiple lines, you're out of luck. Remember, keep it short and
      to the point.</para>
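
      <para>When the message text comes from somewhere else (for
      example, the output of an external command), it may contain
      embedded newlines. A minimal sketch of one way to keep only the
      first line before printing; first_line_only is a hypothetical
      helper name.</para>

      <programlisting><![CDATA[
```c
#include <string.h>

/* Hypothetical helper: cut msg in place at the first newline or
 * carriage return, so that only one line reaches NetSaint.
 * Returns msg for convenience. */
char *first_line_only(char *msg)
{
    /* strcspn() returns the length of the prefix containing neither
     * '\r' nor '\n'; that is the whole string if neither is present. */
    msg[strcspn(msg, "\r\n")] = '\0';
    return msg;
}
```
]]></programlisting>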

    </simplesect>

    <simplesect>

      <title>Return the proper status code</title>

      <para>See the <ulink url="t36.html#AEN112">table</ulink> below
      for the numeric values of status codes and their
      description. Remember to return an UNKNOWN state if bogus or
      invalid command line arguments are supplied or if you are unable
      to check the service.</para>
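
      <para>A minimal sketch of mapping a check result onto the status
      codes from the Plugin Return Codes table. The STATE_* names and
      check_value are illustrative assumptions; a real plugin should
      use the definitions supplied with the plugin distribution. Note
      that a process exit status is truncated to 8 bits, so a plugin
      that calls exit(-1) is seen by its parent as exit status
      255.</para>

      <programlisting><![CDATA[
```c
/* Status codes as listed in the Plugin Return Codes table.
 * The names are illustrative, not taken from the distribution. */
enum plugin_state {
    STATE_OK = 0,
    STATE_WARNING = 1,
    STATE_CRITICAL = 2,
    STATE_UNKNOWN = -1
};

/* Hypothetical check: compare a measured value with its thresholds.
 * A negative value stands in for "unable to check the service". */
enum plugin_state check_value(double value, double warn, double crit)
{
    if (value < 0 || warn > crit)
        return STATE_UNKNOWN;   /* no measurement, or bogus arguments */
    if (value > crit)
        return STATE_CRITICAL;
    if (value > warn)
        return STATE_WARNING;
    return STATE_OK;
}
```
]]></programlisting>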

    </simplesect>

    <simplesect>

      <title>Don't execute system commands without specifying their
      full path</title>

      <para>Don't use exec(), popen(), etc. to execute external
      commands without explicitly using the full path of the external
      program.</para>

      <para>Doing otherwise makes the plugin vulnerable to hijacking
      by a trojan horse placed earlier in the search path. See the
      main plugin distribution for examples of how this is done.</para>
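
      <para>A minimal sketch of running an external program by absolute
      path, assuming a POSIX system. run_absolute is a hypothetical
      helper. The point it illustrates is that execv() uses the path
      exactly as given, whereas execvp() may search PATH, and popen()
      hands the whole command line to /bin/sh, which searches PATH for
      any command that is not given with a full path.</para>

      <programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a program by absolute path, never via PATH.
 * Returns the child's exit status (127 if execv() itself failed in the
 * child), or -1 if the path is not absolute or the child could not be
 * started or waited for. */
int run_absolute(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    if (path == NULL || path[0] != '/')
        return -1;              /* refuse relative names outright */

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);      /* unlike execvp(), no PATH search */
        _exit(127);             /* only reached if execv() failed */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```
]]></programlisting>

      <para>A relative name such as "df" is rejected rather than looked
      up, so a trojan earlier in the search path is never
      reached.</para>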

    </simplesect>

    <simplesect>

      <title>Use spopen() if external commands must be executed</title>

      <para>If you have to execute external commands from within your
      plugin and you're writing it in C, use the spopen() function
      that Karl DeBisschop has written.</para>

      <para>The code for spopen() and spclose() is included with the
      core plugin distribution.</para>
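
      <para>In a C plugin, use spopen() and spclose() from the core
      distribution; their exact interface is not reproduced here. To
      illustrate the general idea of a popen() replacement that does
      not pass the command line to /bin/sh (our assumption about the
      motivation), here is a simplified sketch built on pipe(), fork()
      and execv(). open_command and close_command are hypothetical
      names, not the distribution's API.</para>

      <programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical, simplified shell-free popen(): run the program at an
 * absolute path and return a stream connected to its stdout.
 * *child receives the pid so that close_command() can reap it. */
FILE *open_command(const char *path, char *const argv[], pid_t *child)
{
    int fd[2];
    pid_t pid;
    FILE *fp;

    if (path == NULL || path[0] != '/' || pipe(fd) < 0)
        return NULL;

    pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return NULL;
    }
    if (pid == 0) {                     /* child: stdout goes to the pipe */
        close(fd[0]);
        if (dup2(fd[1], STDOUT_FILENO) < 0)
            _exit(127);
        close(fd[1]);
        execv(path, argv);              /* no /bin/sh, no PATH search */
        _exit(127);
    }

    close(fd[1]);                       /* parent keeps only the read end */
    fp = fdopen(fd[0], "r");
    if (fp == NULL) {
        close(fd[0]);
        waitpid(pid, NULL, 0);
        return NULL;
    }
    *child = pid;
    return fp;
}

/* Close the stream, reap the child and return its exit status, or -1. */
int close_command(FILE *fp, pid_t child)
{
    int status;

    fclose(fp);
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```
]]></programlisting>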

    </simplesect>

    <simplesect>

      <title>Don't make temp files unless absolutely required</title>

      <para>If temp files are needed, make sure that the plugin will
      fail cleanly if the file can't be written (e.g., too few file
      handles, out of disk space, incorrect permissions, etc.) and
      delete the temp file when processing is complete.</para>
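
      <para>A minimal sketch using mkstemp(), assuming a POSIX system
      and that /tmp is an acceptable location (adjust for your site).
      mkstemp() creates a new file with a unique name and fails rather
      than reuse an existing path, which also keeps it from following a
      symlink planted at that name. use_temp_file is a hypothetical
      helper.</para>

      <programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write data to a private temp file, process it,
 * and always remove it. Returns 0 on success and -1 on any failure,
 * so that the caller can report UNKNOWN instead of crashing. */
int use_temp_file(const char *data)
{
    char path[] = "/tmp/plugin.XXXXXX";   /* directory is an assumption */
    size_t len = strlen(data);
    int fd = mkstemp(path);               /* replaces XXXXXX, creates file */
    int rc = 0;

    if (fd < 0) {
        perror("mkstemp");                /* no space, no permission, ... */
        return -1;
    }

    if (write(fd, data, len) != (ssize_t)len) {
        perror("write");                  /* short write or error */
        rc = -1;
    }
    /* ... process the file here ... */

    if (close(fd) < 0)
        rc = -1;
    unlink(path);                         /* delete it even on failure */
    return rc;
}
```
]]></programlisting>

      <para>On failure, the plugin would print a one-line diagnostic on
      STDOUT and return an UNKNOWN state.</para>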

    </simplesect>

    <simplesect>

      <title>Don't be tricked into following symlinks</title>

      <para>If your plugin opens any files, take steps to ensure that
      you are not following a symlink to another location on the
      system.</para>
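
      <para>One way to do this on systems that support O_NOFOLLOW (part
      of POSIX.1-2008, but missing on some older systems) is to open the
      file with that flag, which makes open() fail if the last component
      of the path is a symbolic link. It does not check the directories
      leading up to the file. The regular-file check via fstat() is an
      extra precaution, not something this guideline requires.
      open_no_symlink is a hypothetical helper.</para>

      <programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open path read-only, refusing to follow a
 * symlink in its final component. Returns a file descriptor or -1. */
int open_no_symlink(const char *path)
{
    struct stat st;
    int fd = open(path, O_RDONLY | O_NOFOLLOW);

    if (fd < 0)
        return -1;      /* includes the symlink case (ELOOP on most systems) */

    /* Check the descriptor we already hold, not the name, so the
     * answer cannot change between the check and the use. */
    if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode)) {
        close(fd);
        return -1;
    }
    return fd;
}
```
]]></programlisting>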

    </simplesect>

    <simplesect>

      <title>Validate all input</title>

      <para>Use the routines in utils.c to validate input, and write
      more as needed.</para>
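
      <para>The validation routines in utils.c are not listed here. As
      an example of the kind of check meant, here is a minimal sketch
      of a strict integer validator built on strtol();
      parse_long_in_range is a hypothetical name, not a utils.c
      function. It rejects empty strings, trailing garbage, overflow
      and out-of-range values.</para>

      <programlisting><![CDATA[
```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical validator: accept only a complete decimal integer in
 * [min, max]. On success store it in *out and return 1, else return 0. */
int parse_long_in_range(const char *s, long min, long max, long *out)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return 0;
    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE)
        return 0;               /* no digits, trailing junk, or overflow */
    if (v < min || v > max)
        return 0;
    *out = v;
    return 1;
}
```
]]></programlisting>

      <para>For example, a port argument could be checked with
      parse_long_in_range(optarg, 1, 65535, &amp;port), and the plugin
      would return UNKNOWN if the check fails.</para>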

    </simplesect>

    <simplesect>

      <title>Write changes to configure.in</title>

      <para>Add the plugin to the EXTRAS list in configure.in unless
      you are fairly sure that it will work on all platforms without
      any non-standard software added.</para>

    </simplesect>


    <simplesect>

      <title>Screen Output</title>

      <para>The plugin should print the diagnostic message and only the
      synopsis part of the help message. A well-written plugin then
      offers --help as a way to get the verbose help.</para>

      <para>Code and output should try to respect the 80x25 size of a
      CRT (remember when fixing stuff in the server room!).</para>

    </simplesect>

    <simplesect>
      <title>Perl Plugins</title>
      <para>Perl scripts should be called with "-wT", which enables
      warnings and taint checking.</para>
    </simplesect>

    <simplesect>
      <title>Timeouts</title>

      <para><emphasis>Use DEFAULT_SOCKET_TIMEOUT.</emphasis> Almost
      all plugins should use DEFAULT_SOCKET_TIMEOUT as their default
      timeout.</para>

      <para><emphasis>Add alarms to network plugins.</emphasis> If you
      write a plugin which communicates with another networked host,
      you should make sure to set an alarm() in your code that
      prevents the plugin from hanging due to abnormal socket
      closures, etc. NetSaint takes steps to protect itself against
      unruly plugins that time out, but any plugins you create should
      be well behaved on their own.</para>

      <para>All plugins should time out gracefully, not just networking
      plugins. For instance, df may lock up if you have automounted
      drives and your network fails, yet at first glance who would
      think df could hang like that? Besides, a plugin that can time
      out is more robust than one that hangs and consumes
      resources.</para>
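
      <para>A minimal sketch of a plugin-wide timeout using sigaction()
      and alarm(), assuming a POSIX system. The fallback value of 10
      seconds for DEFAULT_SOCKET_TIMEOUT and the choice to exit with
      UNKNOWN on timeout are assumptions for illustration, and
      install_timeout is a hypothetical name. The handler calls only
      async-signal-safe functions (write() and _exit()).</para>

      <programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Assumed fallback; the real value comes from the plugin distribution. */
#ifndef DEFAULT_SOCKET_TIMEOUT
#define DEFAULT_SOCKET_TIMEOUT 10
#endif

#define STATE_UNKNOWN (-1)   /* from the Plugin Return Codes table */

/* SIGALRM handler: print one line and exit immediately. */
static void timeout_handler(int sig)
{
    static const char msg[] = "Plugin timed out\n";
    ssize_t n;

    (void)sig;
    n = write(STDOUT_FILENO, msg, sizeof msg - 1);
    (void)n;                 /* nothing useful to do if this fails */
    _exit(STATE_UNKNOWN);
}

/* Arm the timeout. A value of 0 selects DEFAULT_SOCKET_TIMEOUT.
 * Returns 0 on success, -1 if the handler could not be installed. */
int install_timeout(unsigned int seconds)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = timeout_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;
    alarm(seconds == 0 ? DEFAULT_SOCKET_TIMEOUT : seconds);
    return 0;
}
```
]]></programlisting>

      <para>A plugin would call install_timeout() once, right after
      processing its options, so that the timeout covers every later
      step, including commands such as df.</para>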

    </simplesect>

    <simplesect>
      <title>Option Processing</title>

      <para>For plugins written in C, we recommend the POSIX getopt()
      function for short options. If using getopt_long(), check that
      HAVE_GETOPT_H is defined (configure checks for this and sets the
      #define in common/config.h).</para>

      <para>There are a few reserved options that should not be used
      for other purposes:</para>

      <literallayout>
          -V version (--version)
          -h help (--help)
          -t timeout (--timeout)
          -w warning threshold (--warning)
          -c critical threshold (--critical)
      </literallayout>

      <para>Look at check_pgsql and check_procs to see how I currently
      think this can work.  Standard options are:</para>

      <literallayout>
          -C SNMP community (--community)
          -a authentication password (--authentication)
          -l login name (--logname)
          -p port or password (--port or --passwd/--password)
          -u url or username (--url or --username)
      </literallayout>

      <para>The option -V or --version should be present in all
      plugins and should result in a call to print_revision, a
      function in utils.c which takes two string arguments: the
      command name and the plugin revision.</para>

      <para>The -? option, or any other unparsable set of options,
      should print out a short usage statement. Lines should be no
      more than 80 characters wide, and no more than 23 lines should
      be printed (it should display cleanly on a dumb terminal in a
      server room).</para>

      <para>The option -h or --help should be present in all plugins
      and should result in a call to print_help (or equivalent). The
      function print_help should call print_revision, then
      print_usage, and should then provide detailed help. Help text
      should fit on an 80-character wide display, but may run as many
      lines as needed.</para>
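
      <para>A minimal sketch of processing the reserved short options
      with POSIX getopt(). struct plugin_opts, process_arguments and the
      default timeout are hypothetical; a real plugin would follow
      check_pgsql or check_procs and call print_usage, print_help and
      print_revision itself. Note that -? needs no special case:
      getopt() returns '?' for it, as for any unrecognized
      option.</para>

      <programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

#define STATE_OK 0
#define STATE_UNKNOWN (-1)
#define ASSUMED_DEFAULT_TIMEOUT 10  /* stand-in for DEFAULT_SOCKET_TIMEOUT */

/* Hypothetical container for the reserved options. */
struct plugin_opts {
    long timeout;       /* -t, seconds */
    long warning;       /* -w, -1 if not given */
    long critical;      /* -c, -1 if not given */
    int want_help;      /* -h: caller calls print_help() and exits */
    int want_version;   /* -V: caller calls print_revision() and exits */
};

/* Accept only a complete, non-negative decimal number. */
static int to_long(const char *s, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE || v < 0)
        return -1;
    *out = v;
    return 0;
}

/* Returns STATE_OK, or STATE_UNKNOWN for -?, unknown options, missing
 * arguments or bad numbers; the caller then prints the short usage. */
int process_arguments(int argc, char **argv, struct plugin_opts *o)
{
    int c;

    o->timeout = ASSUMED_DEFAULT_TIMEOUT;
    o->warning = o->critical = -1;
    o->want_help = o->want_version = 0;
    optind = 1;                 /* allow repeated calls, e.g. in tests */

    while ((c = getopt(argc, argv, "Vht:w:c:")) != -1) {
        switch (c) {
        case 'V':
            o->want_version = 1;
            break;
        case 'h':
            o->want_help = 1;
            break;
        case 't':
            if (to_long(optarg, &o->timeout) != 0 || o->timeout == 0)
                return STATE_UNKNOWN;
            break;
        case 'w':
            if (to_long(optarg, &o->warning) != 0)
                return STATE_UNKNOWN;
            break;
        case 'c':
            if (to_long(optarg, &o->critical) != 0)
                return STATE_UNKNOWN;
            break;
        default:                /* '?' from getopt() */
            return STATE_UNKNOWN;
        }
    }
    return STATE_OK;
}
```
]]></programlisting>

      <para>main() would print the short usage statement and exit with
      UNKNOWN when process_arguments() returns STATE_UNKNOWN, and call
      print_help or print_revision when want_help or want_version is
      set.</para>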

    </simplesect>

    <simplesect>
      <title>Plugins with more than one type of threshold, or with
      threshold ranges</title>

      <para>The old style was to do things like -ct for critical time
      and -cv for critical value. That goes out the window with POSIX
      getopt. The allowable alternatives are:</para>

      <orderedlist>
	<listitem>
	  <para>long options like --critical-time (or -ct and -cv, I
	  suppose).</para>
	</listitem>

	<listitem>
	  <para>repeated options like `check_load -w 10 -w 6 -w 4 -c
	  16 -c 10 -c 10`</para>
	</listitem>

	<listitem>
	  <para>for brevity, the above can be expressed as `check_load
	  -w 10,6,4 -c 16,10,10`</para>
	</listitem>

	<listitem>
	  <para>ranges are expressed with colons as in `check_procs -C
	  httpd -w 1:20 -c 1:30`, which warns above 20 instances and
	  is critical at 0 or above 30 instances</para>
	</listitem>

	<listitem>
	  <para>lists are expressed with commas, so Jacob's check_nmap
	  uses constructs like '-p 1000,1010,1050:1060,2000'</para>
	</listitem>

	<listitem>
	  <para>If possible when writing lists, use tokens to make the
	  list easy to remember and independent of order. For example,
	  check_disk uses '-c 10000,10%' so that it is clear which is
	  the percentage and which is the KB value (note that due to
	  my own lack of foresight, that used to be '-c 10000:10%', but
	  such constructs should all be changed for consistency,
	  though providing backward compatibility is fairly
	  easy).</para>
	</listitem>

      </orderedlist>

      <para>As always, comments are welcome. Making this consistent
      without a host of long options was quite a hassle, and I suspect
      that there are flaws in this strategy. Clear long options are
      perhaps the most important of the above choices, but not all
      POSIX systems have C libraries for long options, so the short
      forms must exist as well.</para>
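
      <para>A minimal sketch of parsing the colon range and comma list
      forms described above, assuming integer thresholds. parse_range,
      outside_range and parse_list are hypothetical names. The sketch
      does not handle lists whose elements are themselves ranges, as
      in the check_nmap example, or tokens such as the % in
      check_disk.</para>

      <programlisting><![CDATA[
```c
#include <errno.h>
#include <stdlib.h>

/* Parse "lo:hi" (e.g. "1:20"). Returns 0 on success, -1 unless the text
 * is exactly two integers separated by one colon with lo <= hi. */
int parse_range(const char *s, long *lo, long *hi)
{
    char *end;

    errno = 0;
    *lo = strtol(s, &end, 10);
    if (end == s || *end != ':' || errno == ERANGE)
        return -1;
    s = end + 1;
    *hi = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE || *lo > *hi)
        return -1;
    return 0;
}

/* Nonzero when value falls outside the inclusive range [lo, hi],
 * i.e. when the corresponding WARNING or CRITICAL state applies. */
int outside_range(long value, long lo, long hi)
{
    return value < lo || value > hi;
}

/* Parse a comma list such as "10,6,4" into out (at most max entries).
 * Returns the number of values read, or -1 on malformed input. */
int parse_list(const char *s, long *out, int max)
{
    int n = 0;
    char *end;

    for (;;) {
        if (n == max)
            return -1;          /* more values than the caller expects */
        errno = 0;
        out[n++] = strtol(s, &end, 10);
        if (end == s || errno == ERANGE)
            return -1;
        if (*end == '\0')
            return n;
        if (*end != ',')
            return -1;
        s = end + 1;
    }
}
```
]]></programlisting>

      <para>With -w 1:20 -c 1:30, a count of 25 is outside the warning
      range but inside the critical range, so the plugin would return
      WARNING; a count of 0 is outside both, so it would return
      CRITICAL.</para>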
    </simplesect>

    <simplesect>
      <title>Plugin Return Codes</title>
      <table>
        <title>Plugin Return Codes</title>
        <tgroup cols="3">
	  <thead>
	    <row>
	      <entry><para>Numeric Value</para></entry>
	      <entry><para>Service Status</para></entry>
	      <entry><para>Status Description</para></entry>
	    </row>
	  </thead>
	  <tbody>
	    <row>
	      <entry align="center"><para>0</para></entry>
	      <entry valign="middle"><para>OK</para></entry>
	      <entry><para>The plugin was able to check the service and it appeared to be functioning properly</para></entry>
	    </row>
	    <row>
	      <entry align="center"><para>1</para></entry>
	      <entry valign="middle"><para>Warning</para></entry>
	      <entry><para>The plugin was able to check the service, but it appeared to be above some "warning" threshold or did not appear to be working properly</para></entry>
	    </row>
	    <row>
	      <entry align="center"><para>2</para></entry>
	      <entry valign="middle"><para>Critical</para></entry>
	      <entry><para>The plugin detected that either the service was not running or it was above some "critical" threshold</para></entry>
	    </row>
	    <row>
	      <entry align="center"><para>-1</para></entry>
	      <entry valign="middle"><para>Unknown</para></entry>
	      <entry><para>Invalid command line arguments were supplied to the plugin or the plugin was unable to check the status of the given host or service</para></entry>
	    </row>
	  </tbody>
        </tgroup>
      </table>
    </simplesect>
  </article>
</book>