Wednesday, March 25, 2015

Reading the FreeBSD Periodic Reports

FreeBSD is a free, open source operating system descended from BSD UNIX, a lineage that predates Linux. It is used globally to this day, mostly on servers and in a few appliances. The FreeBSD project is very active and focuses on the stability and reliability of the system. You can learn more about FreeBSD from the FreeBSD Foundation website.

One of the many features that FreeBSD includes in the default installation is a set of maintenance and monitoring scripts that run periodically. Unfortunately, many administrators and hobbyists don't fully understand these reports, and some aren't even aware of their existence. The reports are mailed to the local root user of the system and, if the e-mail subsystem on the server hasn't been configured, they sit in the local mailbox.

The reports include:

  • A daily report showing general system health.
  • A security report that runs daily and highlights potential security concerns.
  • A weekly report of system health activities.
  • A monthly report of system login accounting.

These reports offer the system administrator an easy way to monitor their systems with a quick daily glance at a few e-mails (per system). Obviously, if you manage hundreds of servers you will want a more robust solution; I will not cover such options in this post.
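
If the reports are piling up unread in root's local mailbox, a short script can at least show what is in there. The following is a minimal sketch, assuming the mailbox is in mbox format; the function name periodic_subjects is made up for this example, and the subject test is based only on the "Hostname daily run output" and "Hostname security run output" subjects described in this post.

```python
import mailbox


def periodic_subjects(mbox_path):
    """Return the subjects of messages that look like periodic reports."""
    subjects = []
    # create=False: fail loudly instead of creating an empty mailbox file
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            subject = str(message.get("Subject", ""))
            # Assumption: periodic report subjects end in "run output"
            if subject.endswith("run output"):
                subjects.append(subject)
    finally:
        box.close()
    return subjects
```

Calling periodic_subjects("/var/mail/root") as root would list the report subjects, assuming /var/mail/root is where the local mailbox lives on your system.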

The Daily Report


This is an e-mail with the subject line "Hostname daily run output" (where Hostname is the actual short name of the server). Here is an example:
Removing stale files from /var/preserve:
Cleaning out old system announcements: 
Removing stale files from /var/rwho: 
Backup passwd and group files: 
Verifying group file syntax: 
/etc/group is fine
Backing up mail aliases:
Backing up package db directory: 
Disk status:
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/ada0p2     15G    3.3G     11G    23%    /
devfs          1.0k    1.0k      0B   100%    /dev
/dev/ada0p5    254G     79M    233G     0%    /data
fdescfs        1.0k    1.0k      0B   100%    /dev/fd
 
Network interface status:
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll Drop
re0    1500 <Link#1>      00:01:2e:bc:bc:6e   239879     0     0    25803     0     0    0
re0    1500 192.168.89.0  caliban             190041     -     -    19392     -     -    -
re0    1500 fe80::201:2ef fe80::201:2eff:fe        0     -     -        2     -     -    -
ath0*  2290 <Link#2>      e0:b9:a5:66:0c:80        0     0     0        0     0     0    0
usbus     0 <Link#3>                               0     0     0        0     0     0    0
usbus     0 <Link#4>                               0     0     0        0     0     0    0
usbus     0 <Link#5>                               0     0     0        0     0     0    0
usbus     0 <Link#6>                               0     0     0        0     0     0    0
plip0  1500 <Link#7>                               0     0     0        0     0     0    0
lo0   16384 <Link#8>                           75644     0     0    75644     0     0    0
lo0   16384 localhost     ::1                  75580     -     -    75580     -     -    -
lo0   16384 fe80::1%lo0   fe80::1                  0     -     -        0     -     -    -
lo0   16384 your-net      localhost               64     -     -       64     -     -    -  
Local system status:
3:01AM  up 4 days, 16:58, 0 users, load averages: 0.34, 0.08, 0.03 
Mail in local queue:
mailq: Mail queue is empty 
Mail in submit queue:
mailq: Mail queue is empty 
Security check:
   (output mailed separately)
Checking for rejected mail hosts: 
Checking for denied zone transfers (AXFR and IXFR): 
Backing up pkgng database: 
-- End of daily output --
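
Since many of these sections are expected to be empty, the report lends itself to a quick automated check. Here is a rough sketch that splits a report like the one above into sections and returns any "should be blank" section that has output. The heading detection (an unindented line ending in a colon) is a heuristic of mine, not something the periodic scripts guarantee, and the names EXPECT_EMPTY, split_sections, and unexpected_output are invented for this example.

```python
# Headings of sections that this post says should normally be empty.
EXPECT_EMPTY = {
    "Removing stale files from /var/preserve",
    "Cleaning out old system announcements",
    "Removing stale files from /var/rwho",
    "Backup passwd and group files",
    "Backing up mail aliases",
    "Backing up package db directory",
    "Checking for denied zone transfers (AXFR and IXFR)",
    "Backing up pkgng database",
}


def split_sections(report_text):
    """Map each 'Heading:' line to the list of non-blank lines under it."""
    sections = {}
    current = None
    for raw in report_text.splitlines():
        line = raw.rstrip()
        if line.startswith("-- End of"):
            break
        # Heuristic: a heading is an unindented line ending in a colon
        if line.endswith(":") and not line.startswith((" ", "\t")):
            current = line[:-1]
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line)
    return sections


def unexpected_output(report_text):
    """Return the expected-empty sections that contain any output."""
    sections = split_sections(report_text)
    return {
        name: body
        for name, body in sections.items()
        if name in EXPECT_EMPTY and body
    }
```

For the sample report above, unexpected_output would return an empty dictionary. A body line that itself ends in a colon would be mistaken for a heading, so treat this as a starting point only.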


Removing stale files from /var/preserve

The first section is normally blank. If anything is listed here, the most likely cause is an editor session that died unexpectedly, for example because the system rebooted or a connection dropped while someone was editing a file.

It is the output of files that were found in /var/preserve and deleted as part of the job because they had gone stale. According to hier(7), /var/preserve is the temporary home of files preserved after the accidental death of an editor (see ex(1)). The preserved copy is there so that the user can recover their work; if nobody claims it, this job eventually removes it.


Cleaning out old system announcements

This section should also always be blank. If you aren't sure what system announcements are, odds are pretty good you don't use them and there wouldn't be any issues. If you do see something here and you aren't expecting it, investigation will be warranted.

System announcements are an antiquated method of sending a global message to end-users so that they will see it the next time they log in to the system. It works through the mail subsystem. Each end-user sees each announcement only once. Since we rarely have end-users on UNIX systems who log in to an interactive text shell, this system is rarely used nowadays.


Removing stale files from /var/rwho

As with the previous entries, this should be empty. If there are entries in this section, it indicates something has gone wrong with the rwho subsystem. Generally speaking nobody should be running the rwho subsystem anymore (there may be exceptions).

The rwho subsystem was used back in the days of interactive shells to query information about other users on other computers. You might think of it a little like Facebook from the very early days of the Internet.


Backup passwd and group files

This should be empty, too. If it isn't empty, it is worth investigating what happened.

The passwd and group files store the local user database and the local group (roles) database. If either one gets corrupted or changed unexpectedly, you can use the backup to restore the previous day's version of the files. So, if something goes wrong with the backup it is important to figure out what and fix it.


Verifying group file syntax

If this says anything other than "/etc/group is fine" it means that the group database is corrupt and needs fixing. The easiest fix is probably to restore the previous day's group file but odds are that a system administrator made changes to the file and messed up the syntax somehow. So, it may be worthwhile looking at the difference between the two and making appropriate corrections.
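
To give a feel for the kind of mistake that trips this check, here is a rough sketch of a syntax check for group-file lines. It is not the checker FreeBSD itself runs (the system ships chkgrp(8) for that); it only assumes the four colon-separated fields described in group(5) (name, password, numeric gid, member list), and the function name check_group_lines is made up for this example.

```python
def check_group_lines(text):
    """Return (line_number, message) pairs for lines that look malformed."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines and comment lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 4:
            problems.append((number, f"expected 4 fields, found {len(fields)}"))
            continue
        name, _password, gid, _members = fields
        if not name:
            problems.append((number, "empty group name"))
        if not gid.isdigit():
            problems.append((number, f"non-numeric gid {gid!r}"))
    return problems
```

Running it over the text of the edited file and of the previous day's backup shows quickly whether the edit introduced the problem.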


Backing up mail aliases

Similar to the passwd and group backup, this section should be blank. If it is not, the e-mail subsystem will probably not be functioning correctly and e-mail may get lost or misdirected.

The mail aliases file (/etc/aliases) is used to alter where mail for local users is sent. If something goes wrong, the easiest fix is to copy the previous day's aliases file back into /etc.


Backing up package db directory

As with passwd, group, and aliases this should be blank. If it is not, something has gone wrong with the backup of the package subsystem.

The FreeBSD package subsystem is used for installing third-party software from binary distributions. The distribution repository is maintained by the FreeBSD Project and updated regularly. The package system (man pkg) provides functionality for installation, upgrades, and removal of third-party software.


Disk status

This provides a summary of the mounted filesystems (the output of the df command; the human-readable sizes in the example are what df -h produces). The administrator is meant to glance at this to watch for unexpected changes such as missing filesystems or full filesystems.
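
The "glance" can also be scripted. Below is a minimal sketch that parses the Disk status table from the sample report and looks for nearly full or missing filesystems. The 90% threshold, the list of pseudo-filesystems to ignore, and the function names are my own choices for illustration, and mount points containing spaces are not handled.

```python
DF_SAMPLE = """\
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/ada0p2     15G    3.3G     11G    23%    /
devfs          1.0k    1.0k      0B   100%    /dev
/dev/ada0p5    254G     79M    233G     0%    /data
fdescfs        1.0k    1.0k      0B   100%    /dev/fd
"""

# Pseudo-filesystems that always report 100% and can be ignored
PSEUDO = {"devfs", "fdescfs"}


def parse_df(df_text):
    """Parse df output into a list of dicts, one per filesystem."""
    rows = []
    for line in df_text.splitlines():
        parts = line.split()
        # Skip the header and anything that is not a data row
        if len(parts) < 6 or not parts[4].endswith("%"):
            continue
        rows.append({
            "filesystem": parts[0],
            "capacity": int(parts[4].rstrip("%")),
            "mount": parts[5],
        })
    return rows


def nearly_full(df_text, threshold=90):
    """Return mount points at or above the capacity threshold (percent)."""
    return [
        row["mount"]
        for row in parse_df(df_text)
        if row["filesystem"] not in PSEUDO and row["capacity"] >= threshold
    ]


def missing_mounts(df_text, expected):
    """Return expected mount points that are absent from the df output."""
    mounted = {row["mount"] for row in parse_df(df_text)}
    return sorted(set(expected) - mounted)
```

With the sample data, nearly_full(DF_SAMPLE) finds nothing, while missing_mounts(DF_SAMPLE, ["/", "/data", "/backup"]) reports the hypothetical /backup as missing.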


Network interface status

This is the output of the netstat command with the -i (interface) option. The administrator is meant to glance at this to watch for unexpected changes in network status such as missing network interfaces or unexpected networks appearing.

Excessive (or in some cases any) changes to the numbers in Ierrs, Idrop, Oerrs, Coll, and Drop indicate network issues that may need to be addressed.
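
Spotting a change in these counters means comparing today's report with yesterday's. The following sketch does that for the <Link#N> rows only, assuming the column layout shown in the sample report; the function names are invented for this example. Interfaces whose names are truncated to the same string (the usbus rows above) collapse into one entry, and counters that went down (for example after a reboot) are ignored.

```python
COUNTER_COLUMNS = ("Ipkts", "Ierrs", "Idrop", "Opkts", "Oerrs", "Coll", "Drop")
ERROR_COLUMNS = ("Ierrs", "Idrop", "Oerrs", "Coll", "Drop")


def link_counters(netstat_text):
    """Return {interface: {column: count}} for the <Link#N> rows only."""
    result = {}
    for line in netstat_text.splitlines():
        parts = line.split()
        if len(parts) < 10 or not parts[2].startswith("<Link#"):
            continue
        # The Address column may be blank, so take counters from the right
        values = [int(value) for value in parts[-7:]]
        name = parts[0].rstrip("*")
        result[name] = dict(zip(COUNTER_COLUMNS, values))
    return result


def error_increases(yesterday_text, today_text):
    """Return {(interface, column): increase} for error counters that grew."""
    before = link_counters(yesterday_text)
    after = link_counters(today_text)
    changes = {}
    for name, counters in after.items():
        old = before.get(name)
        if old is None:
            continue  # interface was not present yesterday
        for column in ERROR_COLUMNS:
            delta = counters[column] - old[column]
            if delta > 0:
                changes[(name, column)] = delta
    return changes
```

An empty result means none of the error counters grew between the two reports.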


Local system status

This is the output of the uptime command. It shows you how long the system has been running since the last reboot.

Obviously, if the system rebooted since the last report and the administrator wasn't expecting it, there may be a problem. The last three numbers are the system load averaged over the last 1, 5, and 15 minutes. Since the periodic script runs in the middle of the night, one would expect these to be pretty low. As a rule of thumb, a value of 1 means that one CPU core's worth of work was runnable the whole time, so a sustained value at or above the number of cores indicates the system is fully loaded.
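
Both checks (an unexpected reboot and an unusually high load) can be pulled out of that one line. This is a sketch under the assumption that the line looks like the one in the sample report; the function names and the choice of the 15-minute figure are mine. The reboot test is coarse: it only compares the whole "days" figure, so a reboot within the first day of uptime goes unnoticed.

```python
import re

LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")
DAYS_RE = re.compile(r"up\s+(\d+)\s+days?")


def parse_uptime(line):
    """Extract whole days of uptime and the three load averages."""
    load = LOAD_RE.search(line)
    if load is None:
        raise ValueError("no load averages found")
    days = DAYS_RE.search(line)
    return {
        # Uptimes under a day have no "N days" part; treat them as 0
        "days_up": int(days.group(1)) if days else 0,
        "load": tuple(float(value) for value in load.groups()),
    }


def rebooted_since(previous_line, current_line):
    """True if uptime in days went down between two daily reports."""
    previous = parse_uptime(previous_line)["days_up"]
    current = parse_uptime(current_line)["days_up"]
    return current < previous


def busy(line, cpus, factor=1.0):
    """True if the 15-minute load average exceeds factor * cpus."""
    return parse_uptime(line)["load"][2] > factor * cpus
```

For the sample line, parse_uptime gives 4 days of uptime and load averages of 0.34, 0.08, and 0.03, which is nowhere near busy even on a single-core machine.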


Mail in local queue

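On a system that is not a mail server, this should generally say "mailq: Mail queue is empty". If it does not, something is preventing the mail daemon from delivering mail it has accepted, and this should be examined.

It is the output of the mailq command for sendmail's main queue: mail that the daemon on this system has accepted but not yet delivered. A message that cannot be delivered right away (for example, because the destination is unreachable) waits here until it is retried. On a busy mail server a few queued messages are normal; a large number would indicate a problem, but that may or may not be a problem local to this system.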


Mail in submit queue

This is the output of mailq -Ac, which shows sendmail's client submission queue. It holds messages generated on this system (by cron jobs, by these periodic scripts, by users running mail) that have not yet been handed over to the mail daemon.

Whether or not you are running a mail server, odds are that anything other than "mailq: Mail queue is empty" indicates a problem, most often that the mail daemon is not running or is not accepting submissions from the local system.

Note that the two preceding sections (Mail in local queue and Mail in submit queue) are written assuming that the sendmail daemon is used for the server's e-mail subsystem. Many alternatives provide workalike functionality and will result in this report being accurate (both Exim and Postfix do this). Some lightweight alternatives do not provide this functionality, and it will be the administrator's responsibility to provide an alternate monitoring solution.


Security check

This only ever says "(output mailed separately)". The output will be found in the mail message with subject line "Hostname security run output" where "Hostname" is the name of the server that ran the script. That report will be covered in a later journal entry.


Checking for rejected mail hosts

If this is a mail server, there may be entries in this section. Excessive entries usually mean that someone is trying to push unwanted mail through your server; they can also indicate a problem with your setup that is causing legitimate mail to be rejected.

This is a summary, built from the mail log, of the hosts whose mail your server rejected during the previous day. On mail servers, it is not unusual for there to be some entries here.


Checking for denied zone transfers

If this section is not empty, there is a problem with your network's DNS setup and investigation is warranted. Or, someone is trying to take a copy of your DNS Server's entire database without your authorization.

This is an extract of log messages from the DNS subsystem (usually BIND) indicating that attempted zone transfers were denied. There are several reasons why this could happen.

A zone transfer request from an unauthorized (and unexpected) source may be an early indication of a focused attempt to crack your network but it can also be simple casual curiosity. While it is important to not over-react it would be a good idea to pay attention to other sources of information about unexpected network activity.
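
When there are more than a handful of entries, it helps to count them per source before deciding how worried to be. The sketch below tallies denied transfers by client address and zone. The log line format it expects (client ADDRESS#PORT ... zone transfer 'ZONE/AXFR/IN' denied) is my assumption about what BIND writes, so check it against your own logs; the function name is likewise invented.

```python
import re
from collections import Counter

# Assumed shape of a denied-transfer log line; verify against real logs
DENIED_RE = re.compile(
    r"client (?P<client>\S+?)#\d+.*?"
    r"zone transfer '(?P<zone>[^/']+)/(?P<kind>AXFR|IXFR)/\w+' denied"
)


def denied_transfers(log_lines):
    """Count denied zone transfers per (client address, zone) pair."""
    counts = Counter()
    for line in log_lines:
        match = DENIED_RE.search(line)
        if match:
            counts[(match["client"], match["zone"])] += 1
    return counts
```

One source asking once is probably curiosity; the same source working through every zone you host deserves a closer look.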


Backing up pkgng database

The pkgng subsystem is a replacement for the package subsystem (discussed above). All the same comments from the earlier section apply here as well. As of version 9.3 of FreeBSD (possibly earlier and later as well) either may exist on the server but both should not be used at the same time.

Conclusion

That covers the content of the daily report. Regularly viewing and understanding this report can help an administrator catch problems before they are noticed by end-users.

I have seen people in corporate environments use the output of these reports to satisfy IT audit requirements, as evidence to support the need for more staff, or to justify a better annual review with the boss. They provide not only the information needed to be a proactive system administrator but also the evidence that you are being proactive and stopping problems before people notice.
