UPDATE 1 [2019/05/11]: Thanks to @mirrorbox’s suggestion, I refactored the script to use service status instead of ps aux | grep, which makes the script even simpler. As a result, the syntax has changed. Since I am leaving the article itself untouched, visit either the GitHub or GitLab repository for the updated code. The new syntax is as follows:
# Syntax
$ /path/to/daemon-keeper.sh
Correct usage:
daemon-keeper.sh -d {daemon} -e {extra daemon to (re)start} [-e {another extra daemon to (re)start}] [... even more -e and extra daemons to (re)start]
# Example
$ /path/to/daemon-keeper.sh -d "clamav-clamd" -e "dovecot"
# Crontab
$ sudo -u root -g wheel crontab -l
# At every minute
* * * * * /usr/local/cron-scripts/daemon-keeper.sh -d "clamav-clamd" -e "dovecot"
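For illustration only, a status-based check along the lines of UPDATE 1 might look like the sketch below. It is not the code from the repositories. The function name keep_daemon and the overridable SERVICE variable are my own assumptions, and the sketch relies on service {daemon} status returning a non-zero exit code when the daemon is not running, which is the usual behavior of rc.d scripts but is worth verifying for each service.

```shell
#!/bin/sh
# Path to service(8); can be overridden, e.g. to try the function with a stub.
: "${SERVICE:=/usr/sbin/service}"

# keep_daemon DAEMON [EXTRA_DAEMON...]   (hypothetical helper)
# If 'service DAEMON status' fails, restart DAEMON and then every extra daemon.
keep_daemon() {
    daemon="$1"
    shift

    if ${SERVICE} "${daemon}" status >/dev/null 2>&1; then
        echo "[INFO] '${daemon}' is running!"
        return 0
    fi

    echo "[WARNING] '${daemon}' is not running!"
    for name in "${daemon}" "$@"; do
        ${SERVICE} "${name}" restart
    done
    return 0
}
```

With this in place, keep_daemon "clamav-clamd" "dovecot" would mirror the -d and -e options from the example above.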
UPDATE 2 [2019/05/11]: Another thanks to @mirrorbox for mentioning sysutils/daemontools, which seems to be a proven solution for restarting a crashing daemon. It makes this hack redundant.
Daemontools is a small set of /very/ useful utilities, from Dan
Bernstein. They are mainly used for controlling processes, and
maintaining logfiles.
WWW: http://cr.yp.to/daemontools.html
UPDATE 3 [2019/05/11]: Thanks to @dlangille for mentioning sysutils/py-supervisor, which seems to be a viable alternative to sysutils/daemontools.
Supervisor is a client/server system that allows its users
to monitor and control a number of processes on UNIX-like
operating systems.
It shares some of the same goals of programs like launchd,
daemontools, and runit. Unlike some of these programs, it is
not meant to be run as a substitute for init as "process id 1".
Instead it is meant to be used to control processes related to
a project or a customer, and is meant to start like any
other program at boot time.
WWW: http://supervisord.org/
UPDATE 4 [2019/05/13]: Thanks to @olevole for mentioning sysutils/fsc. It is minimalistic, dependency-free, and designed for FreeBSD:
The FreeBSD Services Control software provides service
monitoring, restarting, and event logging for FreeBSD
servers. The core functionality is a daemon (fscd)
which is interfaced with using fscadm. See manual pages
for more information.
UPDATE 5 [2019/05/13]: Thanks to @jcigar for bringing daemon(8) to my attention. It is available in the base system and seems perfectly capable of doing what I set out to achieve with my script, and more.
Amidst all the chaos in the current stage of my life, I don’t know exactly what got into me that I thought it was a good idea to perform a major upgrade on a production FreeBSD server from 11.2-RELENG to 12.0-RELENG, when I did not even have enough time to go through /usr/src/UPDATING thoroughly or consult the Release Notes or the Errata properly. Sure enough, I hit some esoteric changes that effectively crippled my mail server, which I only noticed when I realized that I had not received any new emails for over a week.
At first, I did not take it seriously. I just rebooted the server and prayed to the gods that it would not happen again. It was a quick fix and it seemed to work, until a few days later I noticed that it had happened again. This time I prayed to the gods even harder - both the old ones and the new ones ¯\_(ツ)_/¯ - and rebuilt every installed port all over again to make sure I had not missed anything. I went for another reboot and, oops! There it was again, laughing at me. Having lost all faith in the gods, I finally took responsibility and decided to either investigate the issue further or ask the experts on the FreeBSD forums.
After messing around with it, it turned out that the culprit was the clamav-clamd service crashing, at first without any apparent reason. I fired up htop after restarting clamav-clamd and found that even at idle it devours around 30% of the available memory. According to this Stack Exchange answer:
ClamAV holds the search strings using the classic string (Boyer Moore) and regular expression (Aho Corasick) algorithms. Being algorithms from the 1970s they are extremely memory efficient.
The problem is the huge number of virus signatures. This leads to the algorithms’ datastructures growing quite large.
You can’t send those datastructures to swap, as there are no parts of the algorithms’ datastructures accessed less often than other parts. If you do force pages of them to swap disk, then they’ll be referenced moments later and just swap straight back in. (Technically we say “the random access of the datastructure forces the entire datastructure to be in the process’s working set of memory”.)
The datastructures are needed if you are scanning from the command line or scanning from a daemon.
You can’t use just a portion of the virus signatures, as you don’t get to choose which viruses you will be sent, and thus can’t tell which signatures you will need.
I guess that, due to some arcane changes in 12.0-RELEASE, FreeBSD kills memory hogs such as the clamav-clamd daemon (don’t take my word for it; it is just a poor man’s guess). I even tried to lower the memory usage, without much success. In the end, there were not too many choices or workarounds around the corner:
A. Pray to the gods that it goes away by itself, which I deemed impractical
B. Put aside laziness, and replace security/clamsmtp with security/amavisd-new in order to be able to run ClamAV on-demand, which has its own pros and cons
C. Write a quick POSIX-shell script to scan for a running clamav-clamd process using ps aux | grep clamd, set it up as a cron job with an X-minute interval, have it start the service if it cannot be found running, and be done with it for the time being.
For the sake of slothfulness, I opted to go with option C. As a consequence, I came up with a simple generic script that is able not only to monitor and restart the clamav-clamd service but also to keep any other crashing service running on FreeBSD.
Requirements and Dependencies
Taking a look at the source code reveals what is necessary for running the script successfully:
readonly BASENAME="basename"
readonly CUT="/usr/bin/cut"
readonly ECHO="echo -e"
readonly GREP="/usr/bin/grep"
readonly LOGGER="/usr/bin/logger"
readonly PS="/bin/ps"
readonly REV="/usr/bin/rev"
readonly SERVICE="/usr/sbin/service"
readonly TR="/usr/bin/tr"
All the dependencies in this list are either internal shell commands or are already present in the FreeBSD base system. So, for running the script, nothing extra is required.
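As a hedged sketch of how such a list of absolute paths could be verified up front, the hypothetical helper below (check_dependencies is my own name, not something from the script) reports every path that is not an executable file:

```shell
#!/bin/sh
# check_dependencies PATH...   (hypothetical helper)
# Prints every path that is not an executable file to stderr and returns
# non-zero if at least one of them is missing.
check_dependencies() {
    missing=0
    for tool in "$@"; do
        if [ ! -x "${tool}" ]; then
            echo "[ERROR] Missing dependency: ${tool}" >&2
            missing=$((missing + 1))
        fi
    done
    [ "${missing}" -eq 0 ]
}
```

On a stock FreeBSD system, check_dependencies /usr/bin/cut /usr/bin/grep /usr/bin/logger /bin/ps /usr/bin/rev /usr/sbin/service /usr/bin/tr should succeed without printing anything, given that all of these are part of the base system.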
Furthermore, I did not want to rely on anything more than a standard POSIX shell for such a simple task, despite the fact that I prefer Bash over anything else for more complex tasks (OmniBackup: One Script to back them all up, available through FreeBSD Ports as sysutils/omnibackup; or, the Reddit wallpaper downloader script).
Usage Syntax
Before running the script, please note that it must have the executable permission set on it. If it does not, grant the executable permission to all users:
$ chmod a+x /path/to/daemon-keeper.sh
Or, only the current user (the user who owns the file):
$ chmod u+x /path/to/daemon-keeper.sh
Or, the users under the group who owns the file:
$ chmod g+x /path/to/daemon-keeper.sh
Getting away from the basics, one can simply run the script without any arguments, and it outputs the correct usage syntax:
$ /path/to/daemon-keeper.sh
Correct usage:
daemon-keeper.sh -e {executable full path} -s {service name to (re)start} [-s {another service name to (re)start}] [... even more -s and service names to (re)start]
Here is the detailed explanation for the available options:
-e: Expects the executable’s full path. For example, in my case the clamav-clamd service, which is located at /usr/local/etc/rc.d/clamav-clamd, has the executable path /usr/local/sbin/clamd. “How do I know the name and path of the underlying executable?”, you may ask. Well, the answer is: by taking a look at the content of /usr/local/etc/rc.d/clamav-clamd:
command=/usr/local/sbin/clamd
-s: The service name to restart in case of a possible crash. Hmm, why is passing more than one service name by repeating -s allowed? Very good question indeed. Sometimes you may be required to restart multiple services in case of a crash. In my case, I also had to restart the dovecot service in addition to the clamav-clamd service; if not, my mail server refused to receive any new emails even after starting the clamav-clamd service. The solution was to restart dovecot after starting up the crashed clamav-clamd service.
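To make the two options more concrete, here is a minimal sketch of how one -e and any number of -s options could be parsed with the POSIX getopts builtin. It is an assumption about one reasonable way to do it, not the script’s actual parsing code; parse_args, EXECUTABLE and SERVICES are names I made up for the example.

```shell
#!/bin/sh
# parse_args -e EXECUTABLE -s SERVICE [-s SERVICE ...]   (hypothetical helper)
# Sets EXECUTABLE to the last -e value and SERVICES to a space-separated
# list of every -s value. Returns non-zero if either one is missing or an
# unknown option is given.
parse_args() {
    EXECUTABLE=""
    SERVICES=""
    OPTIND=1
    while getopts "e:s:" opt; do
        case "${opt}" in
            e) EXECUTABLE="${OPTARG}" ;;
            s) SERVICES="${SERVICES:+${SERVICES} }${OPTARG}" ;;
            *) return 1 ;;
        esac
    done
    [ -n "${EXECUTABLE}" ] && [ -n "${SERVICES}" ]
}
```

Calling parse_args -e "/usr/local/sbin/clamd" -s "clamav-clamd" -s "dovecot" leaves EXECUTABLE set to /usr/local/sbin/clamd and SERVICES set to "clamav-clamd dovecot", preserving the order in which the services should be restarted.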
For the sake of illustration, the following example is enough to take care of my mail server (monitoring the clamav-clamd service and watching out for crashes; then restarting the clamav-clamd and dovecot services if a crash happens):
$ /usr/local/cron-scripts/daemon-keeper.sh \
-e "/usr/local/sbin/clamd" -s "clamav-clamd" -s "dovecot"
[WARNING] '/usr/local/sbin/clamd' is not running!
[INFO] Stopping the service 'clamav-clamd'...
[ERROR] Failed to stop the 'clamav-clamd' service!
[INFO] Starting the service 'clamav-clamd'...
[INFO] The 'clamav-clamd' service has been started successfully!
[INFO] Stopping the service 'dovecot'...
[INFO] The 'dovecot' service has been stopped successfully!
[INFO] Starting the service 'dovecot'...
[INFO] The 'dovecot' service has been started successfully!
$ /usr/local/cron-scripts/daemon-keeper.sh \
-e "/usr/local/sbin/clamd" -s "clamav-clamd" -s "dovecot"
[INFO] '/usr/local/sbin/clamd' is running!
[INFO] No action is required!
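The stop-then-start sequence visible in the first run above could be sketched roughly as follows. This is a simplified approximation rather than the script itself: restart_services and the overridable SERVICE variable are my own names, and the "Failed to start" message is an assumption, since that case does not appear in the output above.

```shell
#!/bin/sh
# Path to service(8); can be overridden, e.g. to try the function with a stub.
: "${SERVICE:=/usr/sbin/service}"

# restart_services SERVICE_NAME...   (hypothetical helper)
# Stops and then starts every given service, in the given order.
restart_services() {
    for name in "$@"; do
        echo "[INFO] Stopping the service '${name}'..."
        if ${SERVICE} "${name}" stop >/dev/null 2>&1; then
            echo "[INFO] The '${name}' service has been stopped successfully!"
        else
            # A crashed daemon cannot be stopped; report it and carry on.
            echo "[ERROR] Failed to stop the '${name}' service!"
        fi

        echo "[INFO] Starting the service '${name}'..."
        if ${SERVICE} "${name}" start >/dev/null 2>&1; then
            echo "[INFO] The '${name}' service has been started successfully!"
        else
            # Assumed wording; this case is not shown in the article's output.
            echo "[ERROR] Failed to start the '${name}' service!"
        fi
    done
}
```

Treating a failed stop as non-fatal is what explains the [ERROR] line for clamav-clamd in the first run: the daemon had already crashed, so there was nothing left to stop.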
Running through a Cron Job
I have already written a guide on how to properly add a cron job on *nix systems, so I won’t go through this in detail. Fire up root’s crontab file in your favorite editor by issuing:
$ sudo -u root -g wheel -H crontab -e
I prefer to detect a crash as soon as possible and then restart the service right away. Therefore, I am running the script at a 1-minute interval:
# At every minute
* * * * * /usr/local/cron-scripts/daemon-keeper.sh -e "/usr/local/sbin/clamd" -s "clamav-clamd" -s "dovecot"
If you are not familiar with the crontab syntax, crontab.guru is a great visual aid.
On another note, since this script is designed to run as a cron job, in addition to stdout and stderr, the script’s logs are also passed through to the system’s log file. On FreeBSD this file is located at /var/log/messages. This portion of the system’s log output is the result of the script running as a cron job:
$ tail -f /var/log/messages
May 9 20:49:00 3rr0r DAEMON-KEEPER[75509]: WARNING '/usr/local/sbin/clamd' is not running!
May 9 20:49:00 3rr0r DAEMON-KEEPER[78503]: INFO Stopping the service 'clamav-clamd'...
May 9 20:49:00 3rr0r DAEMON-KEEPER[11358]: ERROR Failed to stop the 'clamav-clamd' service!
May 9 20:49:00 3rr0r DAEMON-KEEPER[13204]: INFO Starting the service 'clamav-clamd'...
May 9 20:49:58 3rr0r DAEMON-KEEPER[34208]: INFO The 'clamav-clamd' service has been started successfully!
May 9 20:49:58 3rr0r DAEMON-KEEPER[36552]: INFO Stopping the service 'dovecot'...
May 9 20:49:59 3rr0r DAEMON-KEEPER[32672]: INFO The 'dovecot' service has been stopped successfully!
May 9 20:49:59 3rr0r DAEMON-KEEPER[36849]: INFO Starting the service 'dovecot'...
May 9 20:49:59 3rr0r DAEMON-KEEPER[2973]: INFO The 'dovecot' service has been started successfully!
May 9 20:50:00 3rr0r DAEMON-KEEPER[89081]: INFO '/usr/local/sbin/clamd' is running!
May 9 20:50:00 3rr0r DAEMON-KEEPER[94832]: INFO No action is required!
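A logging helper that produces lines of this shape might look like the sketch below. It is a guess at the general approach, not the script’s code: log_message and LOG_TAG are hypothetical names, and sending WARNING and ERROR messages to stderr is my assumption. It uses logger with -t to set the tag and -i to include a process ID, which would match the DAEMON-KEEPER[pid] prefix seen above.

```shell
#!/bin/sh
# Path to logger(1); can be overridden, e.g. to try the function with a stub.
: "${LOGGER:=/usr/bin/logger}"
LOG_TAG="DAEMON-KEEPER"

# log_message LEVEL MESSAGE...   (hypothetical helper)
# Prints "[LEVEL] MESSAGE" to the terminal and forwards "LEVEL MESSAGE"
# to syslog under the DAEMON-KEEPER tag.
log_message() {
    level="$1"
    shift
    case "${level}" in
        # Assumption: problems go to stderr, everything else to stdout.
        ERROR|WARNING) echo "[${level}] $*" >&2 ;;
        *)             echo "[${level}] $*" ;;
    esac
    ${LOGGER} -i -t "${LOG_TAG}" "${level} $*"
}
```

For example, log_message INFO "No action is required!" prints [INFO] No action is required! and hands the same message to syslog, where it ends up in /var/log/messages on a default FreeBSD setup.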
How it Works
Do not let the ~200 lines of shell script code fool you. In fact, there is only one line of code in the script (broken into multiple lines for the purpose of readability) that does all the work:
readonly DAEMON_PROCESS_COUNT=$(${PS} aux \
| ${GREP} -v "${GREP}" \
| ${GREP} -v "${SCRIPT}" \
| ${GREP} -c "${DAEMON}")
Technically, what it does is list all the running processes from all users on the system, filter out everything except the target daemon, and then count the remaining processes. If the daemon is not running, then the process count is simply zero. As simple as that.
Leaving out the ${GREP} -v "${SCRIPT}" part (we will be attending to this one in a moment) and the variable assignment, it basically translates to something similar to:
$ ps aux | grep -v grep | grep -c /usr/local/sbin/clamd
If clamd
is running, the result of running the above command would be a number bigger than zero; otherwise, it would be zero. Well, let’s break it down brick by brick:
$ ps aux
# SORRY!
# I WON'T BE SHARING THE OUTPUT OF THIS COMMAND AS IT IS TOO DANGEROUS TO BE
# SHARED, SINCE ONE CAN GET TO KNOW WHAT EXACTLY I AM RUNNING ON THIS SERVER FOR
# A POTENTIAL EXPLOIT.
# INSTEAD, IF YOU WOULD LIKE TO, YOU CAN RUN IT ON YOUR OWN *NIX DISTRO, AND SEE
# FOR YOURSELF WHAT IT ACTUALLY DOES.
What ps aux essentially does is show all the processes for all users (for our purpose the ax flags would suffice and the u can be omitted; nonetheless, I keep it as a habit). Please consult the ps man page for more information.
Now try the following:
$ ps aux | grep /usr/local/sbin/clamd
clam 26199 0.0 23.7 747944 323252 - Is 20:51 0:00.58 /usr/local/sbin/clamd
root 34001 0.0 0.2 11492 2768 0 S+ 22:27 0:00.00 grep /usr/local/sbin/clamd
In case clamd is running, it returns the above results. If clamd is not running (e.g. crashed or has not been started yet):
$ ps aux | grep /usr/local/sbin/clamd
root 34001 0.0 0.2 11492 2768 0 S+ 22:27 0:00.00 grep /usr/local/sbin/clamd
So, the grep command always gets counted as one line, since it is itself a running process at the moment the output of ps aux on the left side of the pipe is piped to it. Using one more pipe, we eliminate any grep processes from the results before feeding the output to the last grep. This is what grep -v grep does in the following command. So, if it finds the clamd process it returns the following output, or else nothing at all (which signifies the daemon is not running):
$ ps aux | grep -v grep | grep /usr/local/sbin/clamd
clam 26199 0.0 23.7 747944 323252 - Is 20:51 0:00.58 /usr/local/sbin/clamd
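As an aside, a commonly used alternative to the extra grep -v grep stage is to wrap one character of the pattern in brackets. This is not what the script does; it is only a sketch of the trick, run here against an illustrative two-line listing adapted from the output above (the second line shows what the grep process would look like with the bracketed pattern). The function name count_clamd is made up for the example.

```shell
#!/bin/sh
# Illustrative listing: the clamd process, plus the grep process itself as it
# would appear when started with the bracketed pattern.
sample_listing='clam 26199 0.0 23.7 747944 323252 - Is 20:51 0:00.58 /usr/local/sbin/clamd
root 34001 0.0 0.2 11492 2768 0 S+ 22:27 0:00.00 grep /usr/local/sbin/[c]lamd'

# count_clamd reads a process listing on stdin and prints the number of
# matching lines. The pattern "[c]lamd" matches the text "clamd", but not the
# literal text "[c]lamd" that appears in grep's own command line.
count_clamd() {
    grep -c "/usr/local/sbin/[c]lamd" || true
}

printf '%s\n' "${sample_listing}" | count_clamd    # prints 1
```

On a live system the equivalent would be ps aux | grep "/usr/local/sbin/[c]lamd", which needs one pipe less.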
As a final note, when we run the script as a cron job, there is one more thing that has to be taken care of: the ${GREP} -v "${SCRIPT}" part. Remember the cron job from the previous section?
# At every minute
* * * * * /usr/local/cron-scripts/daemon-keeper.sh -e "/usr/local/sbin/clamd" -s "clamav-clamd" -s "dovecot"
When we run the script from a cron job, if you haven’t noticed by now, we pass /usr/local/sbin/clamd to the script as an argument. The script’s own command line therefore contains that path, so it is caught by grep as if it were a running daemon process, always adding one more line to the output. So we have to eliminate this one, too; otherwise the script thinks the process is running, because the count is at least 1 all the time:
$ ps aux \
| grep -v "grep" \
| grep -v "/usr/local/cron-scripts/daemon-keeper.sh" \
| grep "/usr/local/sbin/clamd"
In order to count the number of running processes of the daemon (yes, it is possible for a daemon to spawn more than one process), the last thing we have to do is pass the -c argument to the last grep command:
$ ps aux \
| grep -v "grep" \
| grep -v "/usr/local/cron-scripts/daemon-keeper.sh" \
| grep -c "/usr/local/sbin/clamd"
Which either returns 0 if the daemon is not running, or any number >0 if the daemon is already running.
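Putting the pieces together, the whole pipeline can be wrapped in a small function that reads a process listing from stdin, which also makes it easy to try out on canned input. This is a sketch under my own naming (count_daemon_processes is hypothetical); the trailing || true is there only because grep -c exits with a non-zero status when the count is zero.

```shell
#!/bin/sh
# count_daemon_processes DAEMON_PATH SCRIPT_PATH   (hypothetical helper)
# Reads a process listing on stdin, drops grep processes and the keeper
# script's own process, then prints how many lines mention the daemon.
count_daemon_processes() {
    grep -v "grep" \
        | grep -v "$2" \
        | grep -c "$1" \
        || true
}
```

A caller could then decide what to do along these lines: count=$(ps aux | count_daemon_processes "/usr/local/sbin/clamd" "/usr/local/cron-scripts/daemon-keeper.sh"), followed by a test such as [ "${count}" -gt 0 ] to tell a running daemon from a crashed one.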
Obtaining the Source Code
The source code is available on both GitHub and GitLab for the sake of convenience. In order to download the script directly using a tool such as fetch, curl, aria2, or wget:
# GitHub
$ curl -fLo /path/to/daemon-keeper.sh \
--create-dirs \
https://raw.githubusercontent.com/NuLL3rr0r/freebsd-daemon-keeper/master/daemon-keeper.sh
# GitLab
$ curl -fLo /path/to/daemon-keeper.sh \
--create-dirs \
https://gitlab.com/NuLL3rr0r/freebsd-daemon-keeper/raw/master/daemon-keeper.sh
It is also possible to obtain the whole repository by cloning it with git:
# GitHub
$ git clone \
https://github.com/NuLL3rr0r/freebsd-daemon-keeper.git \
/path/to/clone/freebsd-daemon-keeper
# GitLab
$ git clone \
https://gitlab.com/NuLL3rr0r/freebsd-daemon-keeper.git \
/path/to/clone/freebsd-daemon-keeper
Alternatively, it can be copy-pasted directly from here, which is strongly discouraged due to the Pastejacking exploitation technique: