Tuesday, March 5, 2013

/var File System is Full

Do the following:

1) Check for large files (over 5 MB) in the affected file system, in our case /var:
find /var -type f -xdev -size +5000000c -exec ls -ld {} \;|sort -k 5n

The -xdev option tells find not to cross mount points. Since /var is the affected file system, large files on a separate file system mounted beneath it (for example /var/spool/sw, if you had created a /var/spool/sw file system) are not relevant, because they do not consume space in /var. The same applies when examining any other full file system: if the root file system is full, a large file in /var is not relevant when /var is a separate file system.

The -exec option instructs find to run ls -ld against every file that meets our criteria. Each matching path is substituted for the {}, and the command is then executed.

The output of ls -ld is then piped to sort and sorted numerically on the 5th column, which is the size column. As a result, the largest files are listed at the end.
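The pipeline above can be wrapped in a small function so the file system and size threshold are easy to change. This is only a sketch: the name big_files and its two arguments are my own invention for illustration, and it assumes a find that accepts the c (bytes) suffix on -size.

```shell
# Hypothetical helper (not a standard command): list regular files larger
# than a byte threshold on one file system, largest last.
#   $1 = file system to search (defaults to /var)
#   $2 = minimum size in bytes (defaults to 5000000)
big_files() {
    fs=${1:-/var}
    min_bytes=${2:-5000000}
    # -xdev keeps find on this file system; sort orders by the size column.
    find "$fs" -xdev -type f -size +"${min_bytes}c" -exec ls -ld {} \; | sort -k 5n
}
```

For example, big_files /var 10000000 would list only files over roughly 10 MB.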

2) Look at the output and decide if any of the resulting files can be safely removed.

Core Files: Core files result when applications terminate unexpectedly. The system writes a core file so that a programmer can analyze it to determine the cause of the abrupt or unexpected termination. These files can be quite large, and they can be safely removed once you have verified that they are indeed core files of this nature. Use the Unix "file" command to determine whether the file is a "valid" core. Example:

file /var/core
/var/core: core file from 'swinstall' - received SIGSEGV

Note: Keep in mind that not all valid core files will generate the above output, but if you see this type of output you can be sure it is a core file that can safely be removed. Users and programmers should not create files or directories named core, but both sometimes do just this. I mention this so you do not accidentally remove a valid user or program file. Core files can be controlled with the ulimit command. Do a man on ksh and search with /ulimit to jump to the ulimit section.

In the file output shown earlier, /var/core is a valid core file that can be removed: it was produced when the swinstall command received a SIGSEGV (segmentation violation) signal. Additional information can be obtained by running the what command against the core file:

what /var/core

I will not go into the analysis of the output here because it is a separate topic, but I mention it in case you were not aware of the command.
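As mentioned in the note on core files, ulimit can limit their size. A minimal sketch, assuming your shell's ulimit supports the -c option (ksh and the POSIX-style shells I am aware of do; the units of the reported value depend on the shell):

```shell
# Show the current limit on core file size for this shell.
ulimit -c

# Set the limit to 0 so this shell, and the processes it starts,
# do not write core files. This does not affect processes already running.
ulimit -c 0
```

The setting only lasts for the current shell session, so it would need to go into a profile or startup script to persist. Whether you want that is a judgment call, since it also removes the evidence a programmer would need.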

Logfiles... see item 3

Patches
If patches seem to be consuming a large amount of space in /var, then consider running the cleanup command. I would first install any SD-UX (Software Distributor for HP-UX) cumulative patches or SD tools patches.
cleanup -c 2

Note: Keep in mind that once the cleanup command has run and removed the patch uninstall information, you can no longer remove the patch. However, it is typically safe to commit a patch that has been superseded twice.

3) Next check for a large number of smaller files that may have passed under the radar of our find command.
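One way to spot directories that hold many small files is to total the space used per directory instead of per file. The following is a sketch only: dir_usage is a made-up helper name, and it assumes your du supports -k (sizes in KB) and -x (stay on one file system).

```shell
# Hypothetical helper: show the directories using the most space on one
# file system, largest last.
#   $1 = file system to examine (defaults to /var)
#   $2 = number of directories to show (defaults to 20)
dir_usage() {
    fs=${1:-/var}
    top=${2:-20}
    # du prints one cumulative total per directory; sort -n orders by size.
    du -kx "$fs" 2>/dev/null | sort -n | tail -n "$top"
}
```

The last line is always the file system's top directory itself; the lines just above it show where the space is going.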

Typical reasons for /var to fill up would be:
Print jobs:
lpstat -t
Solution: use the cancel command to remove print requests that are no longer needed.

syslog.log large (please read the note below before running):
cd /var/adm/syslog ; ls -ld *
> syslog.log

Note: This will remove syslog and mail log info. It may be advisable to copy the file elsewhere first in case you are having another problem. For example, the syslog filling up may be due to a hardware problem or a correctable software configuration issue. Use cat to read the file rather than vi, as vi will attempt to open it in /var (which in our case is already full).

mail.log large (please read the note below before running):
cd /var/adm/syslog; ls -ld *
> mail.log

If mailq reports numerous queued mail files, then check the mail log with the mtail command. The mail queue is located in /var/spool/mqueue. Correct any issue with mail delivery.

Note: Please read the mail.log prior to emptying it. It may indicate a problem that needs to be corrected first. You can use the mtail command to view it, and mailq to see what is queued. Use cat to read the file rather than vi, as vi will attempt to open it in /var (which in our case is already full).
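The advice in the notes above (keep a copy somewhere else, then empty the log in place) can be sketched as a small function. The name save_and_truncate and the destination directory are hypothetical; the destination must be on a file system other than /var that has free space.

```shell
# Hypothetical helper: copy a log to another directory with a date suffix,
# then empty the original in place.
#   $1 = log file to empty
#   $2 = destination directory on a different file system
save_and_truncate() {
    log=$1
    dest_dir=$2
    # Only truncate if the copy succeeded, so nothing is lost on failure.
    cp -p "$log" "$dest_dir/$(basename "$log").$(date +%Y%m%d)" && : > "$log"
}
```

For example, save_and_truncate /var/adm/syslog/syslog.log /tmp/logsave, assuming /tmp/logsave exists and /tmp is a separate file system on your machine. Because the file is emptied rather than removed, a daemon that has it open keeps writing to the same file.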

4) The file may have been removed while a process had the file open. This removes the directory entry for the file, but does not release the space. In this case I would recommend downloading the lsof (LiSt of Open Files) command. It is not installed by default, but can be obtained from the Purdue site.

ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/binaries/hpux

I recommend placing this in the /usr/contrib/bin/ directory. Be sure to select the binary for your OS version and kernel width. getconf KERNEL_BITS will tell you your kernel width (32 or 64).

The man page will be lsof.8. Place this file in /usr/contrib/man/man8.
The man8 directory will need to be created.

lsof /var |sort -k 7n

The second column gives the PID. For example, if you see /dev/vg00/lvol8 listed in the last column and the PID shows that the process is syslogd, then someone ran rm against the syslog.log file while the syslogd daemon still had the file open. To release the space they should have run:

cat /dev/null > /var/adm/syslog/syslog.log
Or
> /var/adm/syslog/syslog.log

You could use rm, but only if you stopped the syslogd daemon first.

To recover the space, run /sbin/init.d/syslogd stop, then run /sbin/init.d/syslogd start to resume logging.

If the file is an Oracle log file and someone removed it while it was open, then you would have to stop and start the Oracle process to release the space. If the process is essential, you may not want to restart it right away. In this case focus on other means, such as emptying a large syslog.log or mail.log or running cleanup, until such time as the critical process can be restarted.
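If you want to see for yourself why rm does not free the space while a process has the file open, the behavior can be reproduced safely with a scratch file. This is a small demonstration only; the scratch file name is arbitrary and the shell itself plays the part of the process holding the file open.

```shell
# Scratch file outside /var (the name is arbitrary, for illustration only).
demo=${TMPDIR:-/tmp}/open_file_demo.$$
echo "still here" > "$demo"

exec 3< "$demo"     # hold the file open on file descriptor 3
rm "$demo"          # the directory entry is removed...
held=$(cat <&3)     # ...but the data can still be read through the open descriptor
exec 3<&-           # closing the descriptor is what lets the space be freed

echo "$held"        # prints: still here
```

The same thing happens with syslogd or a database process: until the process closes the file (usually by being stopped), the blocks stay allocated even though ls no longer shows the file.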