Understanding Exit Codes and Using them in Bash scripts
--
Lately I’ve been working on a lot of automation and monitoring projects, a big part of these projects are taking existing scripts and modifying them to be useful for automation and monitoring tools. One thing I have noticed is sometimes scripts use exit codes and sometimes they don’t. It seems like exit codes are easy for people to forget, but they are an incredibly important part of any script. Especially if that script is used for the command line.
What are exit codes?
On Unix and Linux systems, programs can pass a value to their parent process while terminating. This value is referred to as an exit code or exit status. On POSIX systems the standard convention is for the program to pass 0
for successful executions and 1
or higher for failed executions.
Why is this important? If you look at exit codes in the context of scripts written to be used for the command line the answer is very simple. Any script that is useful in some fashion will inevitably be either used in another script, or wrapped with a bash one liner. This becomes especially true if the script is used with automation tools like SaltStack or monitoring tools like Nagios, these programs will execute scripts and check the status code to determine whether that script was successful or not.
On top of those reasons, exit codes exist within your scripts even if you don’t define them. By not defining proper exit codes you could be falsely reporting successful executions which can cause issues depending on what the script does.
What happens if I don’t specify an exit code
In Linux any script run from the command line has an exit code. With Bash scripts, if the exit code is not specified in the script itself the exit code used will be the exit code of the last command run. To help explain exit codes a little better we are going to use a quick sample script.
Sample Script:
#!/bin/bash
touch /root/test
echo created file
The above sample script will execute both the touch
command and the echo
command. When we execute this script (as a non-root user) the touch command will fail, ideally since the touch command failed we would want the exit code of the script to indicate failure with an appropriate exit code. To check the exit code we can simply print the $?
special variable in bash. This variable will print the exit code of the last run command.
Execution:
$ ./tmp.sh
touch: cannot touch '/root/test': Permission denied
created file
$ echo $?
0
As you can see after running the ./tmp.sh
command the exit code was 0
which indicates success, even though the touch command failed. The sample script runs two commands touch
and echo
, since we did not specify an exit code the script exits with the exit code of the last run command. In this case, the last run command is the echo
command, which did execute successfully.
Script:
#!/bin/bash
touch /root/test
If we remove the echo
command from the script we should see the exit code of the touch
command.
Execution:
$ ./tmp.sh
touch: cannot touch '/root/test': Permission denied
$ echo $?
1
As you can see, since the last command run was touch
the exit code reflects the true status of the script; failed.
Using exit codes in your bash scripts
While removing the echo
command from our sample script worked to provide an exit code, what happens when we want to perform one action if the touch
was successful and another if it was not. Actions such as printing to stdout
on success and stderr
on failure.
Testing for exit codes
Earlier we used the $?
special variable to print the exit code of the script. We can also use this variable within our script to test if the touch
command was successful or not.
Script:
#!/bin/bash touch /root/test 2> /dev/nullif [ $? -eq 0 ]
then
echo "Successfully created file"
else
echo "Could not create file" >&2
fi
In the above revision of our sample script; if the exit code for touch
is 0
the script will echo
a successful message. If the exit code is anything other than 0
this indicates failure and the script will echo
a failure message to stderr
.
Execution:
$ ./tmp.sh
Could not create file
Providing your own exit code
While the above revision will provide an error message if the touch
command fails, it still provides a 0
exit code indicating success.
$ ./tmp.sh
Could not create file
$ echo $?
0
Since the script failed, it would not be a good idea to pass a successful exit code to any other program executing this script. To add our own exit code to this script, we can simply use the exit
command.
Script:
#!/bin/bashtouch /root/test 2> /dev/null if [ $? -eq 0 ]
then
echo "Successfully created file"
exit 0
else
echo "Could not create file" >&2
exit 1
fi
With the exit
command in this script, we will exit with a successful message and 0
exit code if the touch
command is successful. If the touch
command fails however, we will print a failure message to stderr
and exit with a 1
value which indicates failure.
Execution:
$ ./tmp.sh
Could not create file
$ echo $?
1
Using exit codes on the command line
Now that our script is able to tell both users and programs whether it finished successfully or unsuccessfully we can use this script with other administration tools or simply use it with bash one liners.
Bash One Liner:
$ ./tmp.sh && echo "bam" || (sudo ./tmp.sh && echo "bam" || echo "fail")
Could not create file
Successfully created file
bam
The above grouping of commands use what is called list constructs in bash. List constructs allow you to chain commands together with simple &&
for and and ||
for or conditions. The above command will execute the ./tmp.sh
script, and if the exit code is 0
the command echo "bam"
will be executed. If the exit code of ./tmp.sh
is 1
however, the commands within the parenthesis will be executed next. Within the parenthesis the commands are chained together using the &&
and ||
constructs again.
The list constructs use exit codes to understand whether a command has successfully executed or not. If scripts do not properly use exit codes, any user of those scripts who use more advanced commands such as list constructs will get unexpected results on failures.
More exit codes
The exit
command in bash accepts integers from 0 - 255
, in most cases 0
and 1
will suffice however there are other reserved exit codes that can be used for more specific errors. The Linux Documentation Project has a pretty good table of reserved exit codes and what they are used for.
Originally published at https://bencane.com on September 2, 2014.