Goodness Duru

IPC - Unix Signals

Oct 05, 2023

In the previous article, we covered Unix sockets and how they can be used for Inter-Process Communication. This article discusses a different and more limited form of IPC.

In the IPC mechanisms we’ve looked at, and in most others, when one process sends a message to another, the receiving process takes an action that depends on the message. The message is most likely a byte or a group of bytes, which must be parsed and checked to determine the appropriate action. That action might be calling a function or executing an expression. Sometimes no action is needed because the message wasn’t intended for the receiving process, yet it still has to be parsed just to reach that conclusion. How about letting the OS do the parsing and decide whether our process should ignore the message? Let’s go further: what if we want the OS to execute a function in the application process?

Unix signals offer this and more. But before we discuss them further, we need to understand more about Unix processes.

Processes and Process Group

Open your terminal and type this:

ps

The above command prints your processes that have a controlling terminal. Here’s mine:

PID   TTY        TIME    CMD
10891 ttys044    0:00.74 /bin/zsh -il
15468 ttys056    0:00.33 /bin/zsh -il
16837 ttys056    0:00.00 sleep 2000

The above output lists each process’s pid, its controlling terminal (TTY), its CPU time, and its command with arguments. The pid is what we are mainly concerned with. Whenever we start a process, the OS gives it a unique pid. Most times, a new pid is greater than the previously assigned one during an OS uptime[1]. In database terminology, the pid is akin to the PRIMARY KEY for processes.

Now that we know about pids, let’s talk about process groups. According to Wikipedia, a process group is a collection of one or more processes. Every process belongs to a process group, and every process group has an id called the pgid. By default, the ps command does not display the pgid of a process, but we can include it in the output by typing ps -o "pid,tty,time,command,pgid". Here’s my terminal output:

PID   TTY        TIME    CMD          PGID
10891 ttys044    0:00.87 /bin/zsh -il 10891
15468 ttys056    0:00.41 /bin/zsh -il 15468
19198 ttys057    0:00.24 /bin/zsh -il 19198

The eagle-eyed reader will notice that each process’s pgid is the same as its pid; that’s by design. In most cases, when a new process is started, it belongs to its own process group of one member (itself). This process group is assigned an id, conveniently the new process’s pid. If every new process gets its own group of one, how do we start multiple processes as a group? One answer is a common tool for automating system administration tasks: a shell script. Here’s a very simple one:

    sleep 1000 &
    sleep 500

It’s pretty straightforward: two sleep processes are started concurrently. One sleeps for a thousand seconds, and the other sleeps for five hundred seconds. Here’s the ps output from my terminal while the script (run.sh) is running:

PID   TTY        TIME    CMD          PGID
---some processes---
20648 ttys002    0:00.30 /bin/zsh -il 20648
20815 ttys002    0:00.01 sh run.sh    20815
20816 ttys002    0:00.00 sleep 1000   20815
20817 ttys002    0:00.00 sleep 500    20815
---some processes---

You can see that three of the four processes in the output share the same process group: the shell process running the script and the two sleep processes. This happened because a child process inherits its parent’s process group unless something moves it to a new one, and a shell running a script leaves its children in its own group. I think, by now, you have a pretty good idea of what a process group is and how it’s created.
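The same inheritance can be observed from Python. This is a minimal sketch, not part of the original demo: the helper name child_pgid is made up for illustration, and it assumes a Unix system. It starts a child interpreter and asks it to report its own pgid.

```python
import os
import subprocess
import sys


def child_pgid() -> int:
    """Start a child Python process and return the pgid it reports."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getpgrp())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    print("parent pid :", os.getpid())
    print("parent pgid:", os.getpgrp())
    # The child was not moved to a new group, so it reports the parent's pgid.
    print("child pgid :", child_pgid())
```

The parent and child lines should show the same pgid, just like the sh and sleep rows in the ps output.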

Let’s move on by looking at a fifty-year-old command/system call.

kill

The basic form of the kill command simply does what it means: killing a process. We can kill a process using its pid. Here’s a ps output in my terminal:

PID   TTY        TIME    CMD
20608 ttys000    0:00.19 -zsh
20648 ttys002    0:00.31 /bin/zsh -il
21195 ttys002    0:00.00 sleep 700
20820 ttys004    0:00.22 /bin/zsh -il

Here’s what it outputs after I kill the sleep process by typing kill 21195:

PID   TTY        TIME    CMD
20608 ttys000    0:00.19 -zsh
20648 ttys002    0:00.32 /bin/zsh -il
20820 ttys004    0:00.23 /bin/zsh -il

As you can see, it’s no more; it has been ... killed. The kill command can also kill multiple processes together; all you have to do is list the pids of the processes you want to kill. When you type kill 12983 17838 19983 in your terminal, it kills all the processes whose pids were listed.

In addition to killing the processes whose pids are listed, it’s possible to kill all the processes in a process group. Setting the pid argument to 0 targets every process in the caller’s own process group.
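To try group-wide delivery without taking down your own shell, here is a hedged sketch that creates a throwaway process group and signals it. The function name kill_throwaway_group is an assumption for illustration. It uses os.killpg, which targets a group by its pgid, rather than pid 0, so the calling process is not affected.

```python
import os
import signal
import subprocess


def kill_throwaway_group() -> int:
    """Start a shell in its own process group, signal the whole group,
    and return the shell's return code as reported by subprocess."""
    # start_new_session=True makes the child the leader of a brand-new
    # session and process group, so its pgid equals its pid.
    shell = subprocess.Popen(
        ["sh", "-c", "sleep 100 & sleep 100"],
        start_new_session=True,
    )
    # Deliver SIGTERM to every member of that group: the shell and any
    # sleep processes it has started so far.
    os.killpg(shell.pid, signal.SIGTERM)
    return shell.wait()


if __name__ == "__main__":
    print("return code:", kill_throwaway_group())
```

subprocess reports a child that was terminated by signal N with a return code of -N, so the value here is the negated SIGTERM number.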

The kill command also accepts a number or name prefixed with a hyphen. For now, just think of it as a reason for being killed. Let’s look at some examples containing this prefixed number or name and what some of their effects are. I will start several sleep programs one after another in a shell script and kill each of them slightly differently from a second terminal, starting with the shortest one. Here’s the shell script:

```sh
    sleep 100
    sleep 200
    sleep 300
    sleep 400
    sleep 500
    sleep 600
```

Here’s the output in the original terminal after I type kill -1 21862:

run.sh: line 1: 21862 Hangup: 1               sleep 100

It’s dead, and the next one runs with pid 21880. I’m going to kill it using kill -4 21880. Here’s the output:

run.sh: line 2: 21880 Illegal instruction: 4  sleep 200

Dead, on to the next one, running with pid 22063. I’m going to kill it using kill -5 22063, and the output is:

run.sh: line 3: 22063 Trace/BPT trap: 5       sleep 300

The next one has pid 22099. I’m killing it with kill -6 22099. The output:

run.sh: line 4: 22099 Abort trap: 6           sleep 400

The next one has pid 22131. I’m killing it with kill -8 22131, and the output is:

run.sh: line 5: 22131 Floating point exception: 8   sleep 500

The last one has pid 22163. I’ll simply run kill 22163. Here’s the output:

run.sh: line 6: 22163 Terminated: 15          sleep 600

You can see that for every one of the sleep processes, its reason for being killed is different. Each one of the reasons has two parts: a string and a number. The string output is for human consumption, but the number is more important. If you look carefully at the numbers, you’ll see that they correspond with the prefixed number in our kill command except for the last one. It turns out that when you run kill pid, you’re really running kill -15 pid.
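The number in each message is something a parent process can read as well. The sketch below is an illustration under assumptions: the helper name reason_for_death is made up, and a sleep binary is assumed to be on the PATH. It repeats the experiment from Python and recovers the number from the child's return code.

```python
import signal
import subprocess


def reason_for_death(sig: int) -> int:
    """Start `sleep 100`, send it `sig`, and return the signal number
    that is reported as the cause of termination."""
    proc = subprocess.Popen(["sleep", "100"])
    proc.send_signal(sig)
    # subprocess reports death-by-signal N as a return code of -N.
    return -proc.wait()


if __name__ == "__main__":
    # The same numbers used with the kill command above.
    for sig in (1, 4, 5, 6, 8, 15):
        print(sig, "->", reason_for_death(sig))
```

Each line should print the same number on both sides, matching the number the shell appended to its message.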

One more thing. You can replace those prefixed numbers with prefixed strings. These strings are unique and are understood by the kill command. Each string is mapped to a number, so running kill -special_string pid is the same as running kill -special_number pid. For example, kill -fpe pid is the equivalent of kill -8 pid. Both are reported as a floating point exception and terminate the process, even though no arithmetic error actually occurred.
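Python exposes the same name-to-number mapping through its signal module. This short sketch, with an illustrative helper called describe, prints the name and the human-readable description for the numbers used above. It assumes Python 3.8 or newer, where signal.strsignal is available, and the description text varies between operating systems.

```python
import signal


def describe(signum: int) -> str:
    """Return 'number  NAME  description' for a signal number."""
    sig = signal.Signals(signum)            # number -> enum member
    text = signal.strsignal(signum)         # OS-provided description
    return f"{signum:>2}  {sig.name:<8} {text}"


if __name__ == "__main__":
    for number in (1, 4, 5, 6, 8, 15):
        print(describe(number))
```

Going the other way, signal.SIGFPE.value gives the number for a name.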

By now, I bet you’re curious about what those prefixed numbers or strings represent. Don’t worry, I’ve got you :-). They are called signals. Let’s dive into them.

Signals

Signals are standardized messages sent to a process by the Operating System. The list of these messages is very limited and is defined on every modern POSIX-compliant system. Some OSes have more and some fewer, but a core set is available on all UNIX-based OSes. Here’s a list of them and their meanings on Wikipedia.

These messages have a high priority, so the process is interrupted from its normal flow to handle them. The main reason they have high priority is that many process errors are delivered to a process as signals. For example, you can see from the kill output above that some reasons look like error messages, even though the process didn’t have an error.

Now, I know two ways a signal can be sent to a process: the raise function and the kill command/syscall. There might be other ways, but I’m ignorant of them. The raise function is just a process sending a signal to itself. We’ve looked at the kill command, and I’ll talk about its equivalent syscall soon. The OS kernel may also send a signal by manipulating the process struct directly.

Every signal has a handler in a process, and it is executed whenever the process receives the signal. The handler can be defined in kernel or user-level code. When the OS starts a new application process, it assigns a default handler to each signal. Some default handlers terminate the process. Others do nothing, i.e. the signal is ignored. A signal’s default handler can be referenced with the constant SIG_DFL. You can see a list of the signals’ default actions here.

A signal’s default handler can be changed to a different one. The new handler can be a function we define or SIG_IGN, which tells the process to ignore the signal. We set a signal handler by using either the signal() or sigaction() function. We might want to handle a signal once and restore the default handler immediately after. We can do this by setting the signal’s handler to SIG_DFL inside our own handler.

Enough talk; let’s demonstrate a simple example. Here’s a simple Python script that executes an infinite loop:

    import signal

    def fpe_bulletproof(signum, frame):
        print("You can't kill me, I'm bulletproof")


    def run():
        signal.signal(signal.SIGFPE, fpe_bulletproof)
        while True: pass

    run()

Run this script in a terminal. Open a second terminal and look up the pid of the script process using ps. Run kill -8 script_pid or kill -fpe script_pid in the second terminal. Now, go to the first terminal running the script process; you should see “You can’t kill me, I’m bulletproof” printed in the console. What happened is that we’ve replaced the default handler with the fpe_bulletproof function. Every time you run the kill -8 ... command, the handler function is executed, and the statement is printed to your terminal console. Now try running kill -1 script_pid; you will see that the script has been terminated. This is because we only set a handler for SIGFPE and not for the other signals.

Note that the handler function must have two parameters: the signum and a frame.
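The handle-once idea mentioned earlier (resetting to SIG_DFL from inside the handler) can be sketched with the same two-parameter shape. The names once and run_once are illustrative, and signal.raise_signal (Python 3.8+) is used so the process can signal itself.

```python
import signal

calls = []


def once(signum, frame):
    """Record the signal, then restore the default handler so the next
    delivery of the same signal gets the default action."""
    calls.append(signum)
    signal.signal(signum, signal.SIG_DFL)


def run_once():
    signal.signal(signal.SIGUSR1, once)
    signal.raise_signal(signal.SIGUSR1)      # the process signals itself
    # Report what was handled and which handler is installed now.
    return calls, signal.getsignal(signal.SIGUSR1)


if __name__ == "__main__":
    handled, current = run_once()
    print("handled:", handled)
    print("handler is default again:", current == signal.SIG_DFL)
```

A second SIGUSR1 after this point would terminate the process, because that is the default action for SIGUSR1.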

The handlers for almost all signals can be changed; the exceptions are SIGKILL and SIGSTOP. These signals cannot be caught or ignored. You can try this by changing SIGFPE to SIGKILL and rerunning the script: an “Invalid argument” exception is thrown. This is why, when you want to force-kill a process from the terminal, you type kill -9 process_pid. The 9 represents the SIGKILL signal.

A note of warning: you have to be careful about what is executed inside your handler function. Functions called directly or indirectly by your handler have to be async-signal-safe. Calling unsafe functions in a handler can invoke undefined behavior. A list of async-signal-safe functions can be found here.
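A small sketch of that experiment is shown here. The helper name can_install_handler is made up for illustration. It tries to install a do-nothing handler and reports whether the OS accepted it, restoring the previous handler afterwards.

```python
import errno
import signal


def can_install_handler(sig) -> bool:
    """Return True if a Python handler can be installed for `sig`."""
    try:
        previous = signal.signal(sig, lambda signum, frame: None)
    except OSError as exc:
        if exc.errno == errno.EINVAL:        # "Invalid argument"
            return False
        raise
    if previous is not None:
        signal.signal(sig, previous)         # put the old handler back
    return True


if __name__ == "__main__":
    for sig in (signal.SIGFPE, signal.SIGKILL, signal.SIGSTOP):
        print(sig.name, can_install_handler(sig))
```

SIGFPE is accepted, while SIGKILL and SIGSTOP are rejected.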

How does this concern IPC?

I know, I know. How does this all concern IPC? Here’s how: whenever kill pid is run in a terminal, a kill process is started (or your shell’s built-in kill does the same job). This process sends a signal to the process whose pid is given as the command argument. If you squint a little, it looks like the kill process sent a message to the other process. Isn’t that Inter-Process Communication? Can we have the power that kill has?

Luckily for us, we can do what kill does. That’s because the kill process calls the kill() system call. We can send a message to another application process by executing the syscall with its pid and a signal.
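In Python, the syscall is exposed as os.kill(pid, signal). The following is a minimal sketch of one-way messaging, not the article's demo code: the receiver source, the "ready" handshake over a pipe, and the function name send_one_signal are all assumptions made so the example is self-contained.

```python
import os
import signal
import subprocess
import sys

# Source of a tiny receiver: install a handler, announce readiness,
# then wait until the handler has run.
RECEIVER = r"""
import os, signal, time
got = []
signal.signal(signal.SIGUSR1, lambda signum, frame: got.append(signum))
os.write(1, b"ready\n")
while not got:
    time.sleep(0.01)
os.write(1, b"got SIGUSR1\n")
"""


def send_one_signal() -> str:
    receiver = subprocess.Popen(
        [sys.executable, "-c", RECEIVER],
        stdout=subprocess.PIPE,
        text=True,
    )
    # Wait until the receiver's handler is installed before sending.
    if receiver.stdout.readline() != "ready\n":
        raise RuntimeError("receiver did not start")
    os.kill(receiver.pid, signal.SIGUSR1)    # the actual "message"
    reply = receiver.stdout.readline().strip()
    receiver.wait()
    return reply


if __name__ == "__main__":
    print(send_one_signal())
```

The pipe is only there to avoid signalling the receiver before its handler exists; the message itself travels purely as a signal.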

Carrying out uni-directional communication from one independent process to another is easy with kill(). All we have to do is run the recipient process, get its pid, and then run the sending process with that pid. For both processes to know each other’s pids, though, they need some other IPC mechanism to exchange them. What if we don’t want to do that? What if we want both processes to call the kill function without knowing each other’s pids?

We can answer the above questions with two words: “process groups”. We can start our processes in the same process group and have them send signals to each other using kill(0, signum). With this, there’s no need to exchange pids, and IPC can be carried out in blissful pid ignorance.

Show me the code

Here’s a demonstration of two Python processes communicating with Signals. Here’s the first:

    import os
    import signal

    i = 0
    ROUNDS = 100

    def print_pong(signum, frame):
        global i
        os.write(1, b"Client: pong\n")
        i += 1


    def run():
        signal.signal(signal.SIGUSR1, print_pong)
        while i < ROUNDS:
            os.kill(0, signal.SIGUSR2)
        os.kill(0, signal.SIGQUIT)

    run()

The above code sets a handler function for the SIGUSR1 signal. This function prints a statement to the console using os.write and increments i. We use os.write rather than print because print isn’t async-signal-safe. The script then sends the SIGUSR2 signal in a while loop. Once it has received a hundred SIGUSR1 signals, the loop ends and a SIGQUIT signal is sent.

Here’s the second process:

    import os
    import signal

    should_end = False

    def end(signum, frame):
        global should_end
        os.write(1, b"End\n")
        should_end = True

    def print_ping(signum, frame):
        os.write(1, b"Server: ping\n")
        os.kill(0, signal.SIGUSR1)


    def run():
        signal.signal(signal.SIGUSR2, handler=print_ping)
        signal.signal(signal.SIGQUIT, end)

        while not should_end:
            pass
    
    run()

This one handles two signals, SIGUSR2 and SIGQUIT, each with its own handler function. On receiving a SIGUSR2 signal, the process prints a statement and sends a SIGUSR1 signal. When it receives the SIGQUIT signal, the process sets the boolean variable should_end to True. This ends the loop and ensures our program exits.

It is possible to have one handler function for several signals. We can handle each signal differently based on the value of the signum.
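As a hedged sketch of that idea, the handler below is shared by SIGUSR1 and SIGUSR2 and branches on signum. The names make_dispatcher and demo and the log messages are illustrative, and the process signals itself with signal.raise_signal (Python 3.8+) to keep the example in one file.

```python
import signal


def make_dispatcher(log: list):
    """Build one handler that serves several signals."""
    def dispatch(signum, frame):
        if signum == signal.SIGUSR1:
            log.append("got SIGUSR1")
        elif signum == signal.SIGUSR2:
            log.append("got SIGUSR2")
    return dispatch


def demo() -> list:
    log = []
    handler = make_dispatcher(log)
    # Register the same function for both signals.
    for sig in (signal.SIGUSR1, signal.SIGUSR2):
        signal.signal(sig, handler)
    signal.raise_signal(signal.SIGUSR1)
    signal.raise_signal(signal.SIGUSR2)
    return log


if __name__ == "__main__":
    print(demo())
```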

You’ll notice that both programs set the pid parameter of os.kill to 0. This works because both processes run in the same process group. Here’s the shell script that is used to run the processes:

    trap "" USR1 USR2 QUIT
    python3 server.py & python3 client.py

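We use the trap instruction with an empty command so that the shell ignores all the signals sent by both Python processes. This is needed because kill(0, sig) sends the signal to every process in the process group, and the shell is in that group with the default action (termination) for these signals. We don’t want that. Ignored signals are also inherited by child processes, so each Python script likewise ignores the signals it doesn’t install a handler for, including the ones it sends to itself through kill(0, sig).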

Performance

Signals are plenty fast. Cloudflare benchmarked 404,844 messages per second[2]. That can suit most performance needs.

Demo Code

You can find my code that demonstrates Unix Signals on GitHub.

Conclusion

Unix signals are a straightforward but limited mechanism for IPC. They can do much more than IPC, e.g. setting alarms and handling errors. There are some issues with using them, so be cautious.

The next article will cover a mechanism I didn’t know existed until recently: Message Queues. Till then, take care of yourself and stay hydrated! ✌🏾



๐Ÿท