Troubleshooting High I/O Wait in Linux

Note: This article is a republish of a 2012 blog post; Medium doesn’t support backdating. Forgive the dated material.

Linux has many tools available for troubleshooting; some are easy to use, some are more advanced. I/O Wait is an issue that requires advanced Linux tools and advanced usage of some of its most basic tools.

I/O Wait is difficult to troubleshoot because, by default, plenty of tools will tell you that your system is I/O bound, but few can narrow the problem down to a specific process or set of processes.

Answering whether or not I/O is causing system slowness

We can use several commands to identify whether I/O is causing system slowness, but the easiest is the Unix command top.
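As a minimal sketch, top can also be run in batch mode so the CPU(s) line can be captured without the interactive screen. This assumes the procps version of top; cpu_summary_line and iowait_from_cpu_line are hypothetical helper names, not part of top. On some versions the first batch iteration may reflect averages since boot, so the sketch takes the second one.

```shell
# Run two batch-mode iterations of top, one second apart, and keep the
# last CPU summary line. Assumes procps top (-b batch, -n iterations, -d delay).
cpu_summary_line() {
    LC_ALL=C top -b -n 2 -d 1 | grep 'Cpu(s)' | tail -n 1
}

# Hypothetical helper: pull the number in front of "wa" out of that line.
# Handles both the older "N%wa" and the newer "N wa" layouts.
iowait_from_cpu_line() {
    printf '%s\n' "$1" | sed -n 's/.*[^0-9.]\([0-9.][0-9.]*\)[% ]*wa.*/\1/p'
}

# Usage: iowait_from_cpu_line "$(cpu_summary_line)"
```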

From the CPU(s) line, we can see the current CPU usage and what that CPU time is being spent on.

In the example above, we can see our CPU is 96% waiting for I/O access. We can see this via the wa CPU stat, which the top man page defines as the “Amount of time the CPU has been waiting for I/O to complete.”

Finding which disk is being written to

In the example above, the top command shows I/O Wait for the system as a whole, but it does not tell us which disk is affected. For that, we will use the iostat command.
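A minimal sketch of the iostat invocation, assuming the sysstat package is installed. The busy_devices function is a hypothetical helper, not part of iostat: it drops the first report and assumes %util is the last column of the extended output, which may not hold for every sysstat version.

```shell
# Extended report (-x) every 2 seconds, 5 reports in total.
extended_report() {
    iostat -x "${1:-2}" "${2:-5}"
}

# Hypothetical helper: read `iostat -x` output on stdin, skip the first
# report, and print devices whose %util exceeds the given threshold.
busy_devices() {
    awk -v limit="${1:-90}" '
        /^Device/ { report++; in_table = 1; next }   # header row starts a device table
        NF == 0   { in_table = 0; next }             # blank line ends the table
        in_table && report > 1 && $NF + 0 > limit { print $1, $NF }
    '
}

# Usage: extended_report 2 5 | busy_devices 90
```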

The iostat command in the example above will print a report every 2 seconds for 5 intervals; the -x flag tells iostat to print out an extended report.

The 1st report from iostat prints statistics accumulated since the system was last booted; for this reason, in most circumstances, the first report from iostat should be ignored.

Every subsequent report is based on the time since the previous report.

For example, in our command, we will print a report 5 times; the 2nd report is disk statistics gathered since the 1st run of the report, the 3rd is based on the 2nd and so on.

In the above example, the percent utilized (%util) on disk sda is 111.41%, which is a good indicator that our problem lies with processes writing to sda.

While the test system in my example only has 1 disk, this type of information is beneficial when the server has multiple disks as this can narrow down the search for which process is utilizing I/O.

Aside from percent utilized, there is a wealth of information in the output of iostat: items such as read and write requests merged per second (rrqm/s & wrqm/s), reads and writes per second (r/s & w/s), and plenty more.

In our example, our program seems to be both read and write heavy; this information will be helpful when trying to identify the offending process.

Finding the processes that are causing high I/O

A great tool for identifying processes accessing disk is iotop.
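As a rough sketch, iotop can be run non-interactively so that its output can be saved or compared over time. The flags below (-o, -b, -n) are standard iotop options; have_tool and iotop_snapshot are hypothetical helper names, and iotop usually needs root privileges.

```shell
# Return success if the named command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Batch-mode iotop snapshot: -o shows only processes actually doing I/O,
# -b is non-interactive batch mode, -n sets the number of iterations.
iotop_snapshot() {
    if have_tool iotop; then
        iotop -o -b -n "${1:-3}"
    else
        echo "iotop is not installed" >&2
        return 1
    fi
}

# Usage: iotop_snapshot 3
```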

After looking at the output of iotop it is easy to see that the program bonnie++ is causing the most I/O utilization on this machine.

While iotop is a great command and easy to use, most major Linux distributions do not install it by default, and I personally prefer not to rely on commands that are not installed by default.

A systems administrator may find themselves on a system where they cannot install non-default packages until a scheduled time or after a significant process, which may be far too late depending on the issue.

If iotop is not available, the below steps will also allow you to narrow down the offending process or multiple processes.

Process list “state”

The ps command is one many will be familiar with, but few realize that it reports statistics for memory and CPU usage.

Unfortunately, it does not have a statistic for disk I/O, but it does show the process’s state, which we can use to indicate whether or not a process is waiting for I/O.

The ps command state field provides a handy view into what that process is currently doing. To explain this better, let’s break down the possible states according to the ps command’s man page.

  • D: uninterruptible sleep (usually IO)
  • R: running or runnable (on run queue)
  • S: interruptible sleep (waiting for an event to complete)
  • T: stopped, either by a job control signal or because it is being traced
  • W: paging (not valid since the 2.6.xx kernel)
  • X: dead (should never be seen)
  • Z: defunct (“zombie”) process, terminated but not reaped by its parent
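To get a feel for how these states are distributed on a live system, the state column can be tallied. This is a minimal sketch assuming procps ps; count_states is a hypothetical helper name.

```shell
# Tally one-letter process states read from stdin, one state per line.
# Prints "<state> <count>" rows sorted by state letter.
count_states() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (the trailing "=" suppresses the ps header line):
#   ps -eo state= | count_states
```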

Processes that are waiting for I/O are commonly in an “uninterruptible sleep” state or D; given this information, we can easily find the processes that are constantly in a wait state using some command-line Bash scripting.
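A minimal sketch of such a loop, not necessarily the exact command from the original post. It assumes procps ps with the state, pid, and cmd output fields; d_state_only and watch_d_state are hypothetical helper names.

```shell
# Keep only ps rows whose first column (the state) starts with D.
d_state_only() {
    awk '$1 ~ /^D/'
}

# Print D-state processes `count` times, `delay` seconds apart,
# with a separator line between samples.
watch_d_state() {
    count="${1:-10}"
    delay="${2:-5}"
    for i in $(seq 1 "$count"); do
        ps -eo state,pid,cmd | d_state_only
        echo "----"
        sleep "$delay"
    done
}

# Usage: watch_d_state 10 5
```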

The above for loop will print the processes in a D state every 5 seconds for 10 intervals.

The output above shows that the bonnie++ process with a PID 16528 is waiting for I/O more often than any other process.

At this point, this process seems likely to be causing the I/O Wait, but just because the process is in an uninterruptible sleep state does not necessarily prove that it is the cause of I/O wait.

To help confirm our suspicions, we can use the /proc file system. Within each process’s directory, there is a file called io, which holds the same I/O statistics that iotop uses.
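As a sketch, the file can simply be printed with cat, or filtered down to the two storage-layer counters. The proc_io_bytes function is a hypothetical helper; reading another user's /proc/<pid>/io typically requires root, and the file only exists when the kernel has I/O accounting enabled.

```shell
# Hypothetical helper: print the read_bytes and write_bytes counters
# from a file in /proc/<pid>/io format.
proc_io_bytes() {
    awk -F': ' '$1 == "read_bytes" || $1 == "write_bytes" { print $1, $2 }' "$1"
}

# Usage, with the PID from the ps output:
#   cat /proc/16528/io
#   proc_io_bytes /proc/16528/io
```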

The read_bytes and write_bytes values are the number of bytes that this specific process has read from and written to the storage layer.

In this case, the bonnie++ process has read 46 MB and written 524 MB to disk. While this may not be a lot for some processes, in our example, this is enough writing and reading to cause the high I/O wait that this system is seeing.

Finding what files are being written to heavily

Once the process is identified, or a set of processes is suspected, we can use the lsof command to see what files the specific processes have open. This list of open files will allow us to make an educated guess as to what files are being written to most often.

The below example narrows the lsof output down using the -p <pid> flag to print only files open by the specified process id.
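A minimal sketch of that invocation, plus a hypothetical helper (largest_open_files) that keeps only regular files and sorts them by size. It assumes the default lsof column layout, where TYPE is the 5th column, SIZE/OFF the 7th, and NAME the 9th, and it does not handle file names containing spaces.

```shell
# Hypothetical helper: from lsof output on stdin, list the largest
# regular files as "<size> <name>", biggest first.
largest_open_files() {
    awk '$5 == "REG" { print $7, $9 }' | sort -rn | head -n "${1:-5}"
}

# Usage, with the PID from the ps output:
#   lsof -p 16528
#   lsof -p 16528 | largest_open_files 5
```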

To even further confirm that these files are the ones being written to, we can look at what disk the /tmp filesystem is on.
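As a sketch, df can be pointed at the directory itself to show which device backs it. The -P flag requests the POSIX output format, which keeps long device names on a single line; df_source is a hypothetical helper name.

```shell
# Hypothetical helper: print the Filesystem column from the data row
# of `df -P` output read on stdin.
df_source() {
    awk 'NR == 2 { print $1 }'
}

# Usage:
#   df -P /tmp
#   df -P /tmp | df_source
```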

From the output of the df command above, we can determine that the /tmp filesystem is part of the root logical volume in the workstation volume group.

Using the pvdisplay command, we can see that /dev/sda5, a partition on the sda disk, is the partition that the workstation volume group is using and, in turn, is where the /tmp filesystem lives.
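For scripting, the same mapping can be sketched with pvs, the columnar sibling of pvdisplay, which is swapped in here because its output is easier to filter. The pvs_in_vg function is a hypothetical helper; both commands need root and the LVM tools installed.

```shell
# Hypothetical helper: from "pv_name vg_name" rows on stdin, print the
# physical volumes that belong to the named volume group.
pvs_in_vg() {
    awk -v vg="$1" '$2 == vg { print $1 }'
}

# Usage:
#   pvs --noheadings -o pv_name,vg_name | pvs_in_vg workstation
```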

Given this information, it is safe to say that the large files listed in the lsof output above are likely the files being read from and written to frequently.

Originally published at https://bencane.com.

Distinguished Engineer @AmericanExpress building payments systems. Author: https://amzn.to/3kuCFpz, Thoughts, & Opinions are my own.