The JVM offers OS-independent I/O interfaces, but I/O activity is closely tied to the operating system. In fact, “one of the most important functions performed by operating system is to handle I/O requests and notify processes when their data is ready” (Java NIO). It makes sense to first dive into the underlying basics for a better understanding of the Java implementation.

Concepts of Non-blocking and Asynchronous I/O

The terms blocking/non-blocking and synchronous/asynchronous can be interpreted differently depending on context. I find that the book Unix Network Programming, Volume 1: The Sockets Networking API (3rd Edition) provides fairly concise and widely used descriptions.

There are five I/O models under Unix:

  • Blocking I/O
  • Nonblocking I/O
  • Asynchronous I/O (the POSIX aio_ functions)
  • I/O multiplexing (select and poll)
  • Signal-driven I/O (SIGIO)

I mainly borrow the first three models from the book to introduce the concepts that appear in the Java packages; the other two are covered briefly afterwards.

An input operation normally has two distinct phases, and the five models differ in how they behave during each phase:

  1. Waiting for the data to be ready. This involves waiting for data to arrive on the network. When the packet arrives, it is copied into a buffer within the kernel.
  2. Copying the data from the kernel to the process. This means copying the (ready) data from the kernel’s buffer into our application buffer.

Blocking I/O Model

The most prevalent model for I/O is the blocking I/O model, on which the java.io package is based.

Blocking I/O model for UDP.
Source: Unix Network Programming, Volume 1: The Sockets Networking API (3rd Edition)

Here, the book uses UDP instead of TCP for this example because, with UDP, the concept of data being “ready” to read is simpler than with TCP. We can safely use it to understand the concept.

In the figure, recvfrom (Linux man page) is referred to as a system call to differentiate between the application and the kernel. A system call normally involves a switch from running in the application to running in the kernel (a switch from user mode to kernel mode). The process calls recvfrom, and the system call does not return until the datagram arrives AND is copied into the application buffer, or an error occurs. Therefore, we say that the process is “blocked” the entire time from when it calls recvfrom until it returns. When recvfrom returns successfully, the application processes the datagram. Nothing runs in parallel.
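The blocking behaviour described above can be sketched in Java with java.net.DatagramSocket, whose receive() method blocks until a datagram arrives. This is only an illustration under assumptions: the class name BlockingReceiveDemo, the helper receiveOnce, the loopback setup and the 5-second safety timeout are all made up for the example and are not taken from the book.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one thread sends a datagram to ourselves after a
// delay, while the calling thread sits blocked inside receive().
class BlockingReceiveDemo {

    static String receiveOnce(String message, long delayMillis) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(5000); // safety net so the demo cannot hang forever
            int port = receiver.getLocalPort();

            Thread senderThread = new Thread(() -> {
                try {
                    Thread.sleep(delayMillis); // phase 1 lasts at least this long
                    byte[] payload = message.getBytes(StandardCharsets.UTF_8);
                    sender.send(new DatagramPacket(payload, payload.length, loopback, port));
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            senderThread.start();

            byte[] buffer = new byte[1024];
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            // The calling thread is blocked here through both phases:
            // waiting for the datagram and copying it into our buffer.
            receiver.receive(packet);
            senderThread.join();

            return new String(packet.getData(), packet.getOffset(),
                    packet.getLength(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(receiveOnce("hello", 200));
    }
}
```

While receive() is waiting, the calling thread does nothing else, which is the point of the figure.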

Non-blocking I/O Model

When a socket is set to be non-blocking, we are telling the kernel “when an I/O operation that I request cannot be completed without putting the process to sleep, do not put the process to sleep, but return an error instead”.

Non-blocking I/O model.
Source: Unix Network Programming, Volume 1: The Sockets Networking API (3rd Edition)
  • For the first three recvfrom calls, data is not ready so there is no data to return. The kernel IMMEDIATELY returns an error EWOULDBLOCK.
  • For the fourth time recvfrom is called, a datagram is ready. It is copied into the application buffer and recvfrom returns successfully. The application then processes the data.

When an application sits in a loop calling recvfrom on a non-blocking descriptor like this, it is called polling. The application continually polls the kernel to check whether some operation is ready. This is often a waste of CPU time, but it is easy to implement and is normally seen on systems dedicated to a single function.
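A rough Java counterpart of this polling loop uses java.nio.channels.DatagramChannel in non-blocking mode, where receive() returns null immediately when no datagram is available instead of reporting EWOULDBLOCK. The sketch below is an illustration only: NonBlockingPollDemo, Result, pollUntilReceived, the 1 ms pause between polls and the 5-second deadline are assumptions chosen for the example.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: polls a non-blocking channel until a datagram shows up.
class NonBlockingPollDemo {

    static final class Result {
        final String text;     // payload that was eventually received
        final int emptyPolls;  // how many receive() calls found nothing
        Result(String text, int emptyPolls) {
            this.text = text;
            this.emptyPolls = emptyPolls;
        }
    }

    static Result pollUntilReceived(String message, long delayMillis) throws Exception {
        try (DatagramChannel receiver = DatagramChannel.open();
             DatagramChannel sender = DatagramChannel.open()) {
            receiver.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            receiver.configureBlocking(false); // receive() now returns immediately
            SocketAddress target = receiver.getLocalAddress();

            Thread senderThread = new Thread(() -> {
                try {
                    Thread.sleep(delayMillis);
                    sender.send(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)), target);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            senderThread.start();

            ByteBuffer buffer = ByteBuffer.allocate(1024);
            int emptyPolls = 0;
            long deadline = System.nanoTime() + 5_000_000_000L;
            // Polling: each call comes straight back, with null while no data is ready.
            while (receiver.receive(buffer) == null) {
                emptyPolls++;
                if (System.nanoTime() > deadline) {
                    throw new IllegalStateException("no datagram within 5 s");
                }
                Thread.sleep(1); // the thread is free to do other work between polls
            }
            senderThread.join();

            buffer.flip();
            return new Result(StandardCharsets.UTF_8.decode(buffer).toString(), emptyPolls);
        }
    }

    public static void main(String[] args) throws Exception {
        Result r = pollUntilReceived("ping", 100);
        System.out.println(r.text + " after " + r.emptyPolls + " empty polls");
    }
}
```

The short sleep stands in for useful work between polls; without it the loop would spin and burn CPU, which is the waste mentioned above.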

Asynchronous I/O Model

Asynchronous I/O is defined by the POSIX specification, which reconciles the various differences among the real-time functions that appeared in the earlier standards that came together to form the current specification.

POSIX’s definition of synchronous and asynchronous I/O operations:

  • A synchronous I/O operation causes the requesting process to be blocked until that I/O operation completes
  • An asynchronous I/O operation does not cause the requesting process to be blocked

Asynchronous I/O model.
Source: Unix Network Programming, Volume 1: The Sockets Networking API (3rd Edition)

In short, the asynchronous I/O model lets us tell the KERNEL to start the operation and to notify us when the ENTIRE operation is complete, including the copy of the data from the kernel to our buffer. The kernel acts like an agent of the user process.
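The "start the operation, get notified when it is entirely done" idea can be sketched with the JDK's java.nio.channels.AsynchronousFileChannel and a CompletionHandler. This is an illustration under assumptions: AsyncReadDemo and readAsync are hypothetical names, a temporary file stands in for the network, and how the JDK implements the channel internally is platform-dependent, so nothing here claims that the POSIX aio_ functions are used underneath.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: start a read, then get notified once the data
// is already sitting in our buffer.
class AsyncReadDemo {

    static String readAsync(Path file) throws Exception {
        try (AsynchronousFileChannel channel =
                     AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            CompletableFuture<Integer> done = new CompletableFuture<>();

            // read() returns right away; the handler runs only after the data
            // has been copied into 'buffer' (or the operation has failed).
            channel.read(buffer, 0, null, new CompletionHandler<Integer, Void>() {
                @Override
                public void completed(Integer bytesRead, Void attachment) {
                    done.complete(bytesRead);
                }

                @Override
                public void failed(Throwable error, Void attachment) {
                    done.completeExceptionally(error);
                }
            });

            // The caller could do unrelated work here. This demo simply waits
            // for the completion notification.
            done.get(5, TimeUnit.SECONDS);

            buffer.flip();
            return StandardCharsets.UTF_8.decode(buffer).toString();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("async-demo", ".txt");
        try {
            Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));
            System.out.println(readAsync(file));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Unlike the polling sketch, the application never calls a second "give me the data" method; by the time the handler fires, the buffer is already filled.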

Other Models

The I/O multiplexing model is basically select/poll (or epoll on Linux). With this model, a single process or thread can handle I/O on many connections at the same time. The process blocks in the select/poll call rather than in the actual I/O system call. The kernel watches all registered sockets and, when the data on any of them is ready, the call returns and the process then calls recvfrom to copy the data.

I/O multiplexing model.
Source: Unix Network Programming, Volume 1: The Sockets Networking API (3rd Edition)

Java adopts this model in Selector, which lets one thread (or a few threads) manage multiple channels.
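A minimal sketch of one thread watching several channels through a java.nio.channels.Selector might look like the following. The names SelectorDemo and receiveFromAll, the use of loopback UDP channels and the 5-second deadline are assumptions made for the illustration.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class: a single thread serves several channels via one Selector.
class SelectorDemo {

    /** Opens one channel per message, sends each message to its channel, returns what arrived (sorted). */
    static List<String> receiveFromAll(List<String> messages) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        List<DatagramChannel> receivers = new ArrayList<>();
        List<String> received = new ArrayList<>();
        try (Selector selector = Selector.open();
             DatagramChannel sender = DatagramChannel.open()) {
            try {
                for (String message : messages) {
                    DatagramChannel channel = DatagramChannel.open();
                    receivers.add(channel);
                    channel.bind(new InetSocketAddress(loopback, 0));
                    channel.configureBlocking(false); // required before register()
                    channel.register(selector, SelectionKey.OP_READ);
                    sender.send(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)),
                            channel.getLocalAddress());
                }

                ByteBuffer buffer = ByteBuffer.allocate(1024);
                long deadline = System.nanoTime() + 5_000_000_000L;
                while (received.size() < messages.size()) {
                    if (System.nanoTime() > deadline) {
                        throw new IllegalStateException("timed out waiting for datagrams");
                    }
                    selector.select(1000); // block here, not in receive()
                    Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                    while (keys.hasNext()) {
                        SelectionKey key = keys.next();
                        keys.remove();
                        if (key.isReadable()) {
                            buffer.clear();
                            DatagramChannel ready = (DatagramChannel) key.channel();
                            if (ready.receive(buffer) != null) { // copy phase
                                buffer.flip();
                                received.add(StandardCharsets.UTF_8.decode(buffer).toString());
                            }
                        }
                    }
                }
            } finally {
                for (DatagramChannel channel : receivers) {
                    channel.close();
                }
            }
        }
        Collections.sort(received);
        return received;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(receiveFromAll(Arrays.asList("alpha", "beta", "gamma")));
    }
}
```

The thread blocks only in select(); the receive() call that follows finds the data already waiting and just copies it.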

The signal-driven I/O model tells the kernel to send us a SIGIO signal when the data is ready.

Signal driven I/O model.
Source: Unix Network Programming, Volume 1: The Sockets Networking API (3rd Edition)

We need to install a signal handler using the sigaction system call. When the data is ready, the kernel generates a SIGIO signal and sends it to the user process. The signal handler catches the SIGIO, and the user process calls recvfrom to retrieve the data from the kernel.

Model Comparison

The differences among the five Unix I/O models become obvious if we consider how they wait for a datagram to be ready and what they do when copying data from the kernel to the application buffer.

Comparison of the five I/O models.
Source: Unix Network Programming, Volume 1: The Sockets Networking API (3rd Edition)

The non-blocking I/O model ensures that the user process is not blocked while waiting and is not tied up by a single task, but the process still needs to check the task status and call recvfrom to copy the data into its application buffer. The asynchronous I/O model finds an “agent”, the kernel, to finish the task for the user process. The kernel notifies the user process only after the I/O operation is complete.

  • Synchronous I/O: blocking I/O, non-blocking I/O, I/O multiplexing (selector) and signal-driven I/O. In all four, the process is blocked while recvfrom copies the data from the kernel.
  • Asynchronous I/O: only the asynchronous I/O model, which is quite different from non-blocking I/O.

Remember that the fundamental difference between synchronous and asynchronous I/O models is whether the requesting process is blocked during the actual I/O operation!

References

  1. 淺談I/O Model (A Brief Discussion of I/O Models)
  2. Chapter 6. I/O Multiplexing: The select and poll Functions, Unix Network Programming, Volume 1: The Sockets Networking API (3rd Edition)
  3. Java 非阻塞 IO 和异步 IO (Java Non-blocking I/O and Asynchronous I/O)
  4. Java NIO (1st Edition)