UPDATE 1 [2019/05/11]: Thanks to @mirrorbox’s suggestion, I refactored the script to use service status instead of ps aux | grep, which makes the script even simpler. As a result, the syntax has changed. Since I am keeping the article untouched, visit either the GitHub or GitLab repository for the updated code. The new syntax is as follows:
# Syntax
$ /path/to/daemon-keeper.sh
Correct usage:
daemon-keeper.sh -d {daemon} -e {extra daemon to (re)start} [-e {another extra daemon to (re)start}] [... even more -e and extra daemons to (re)start]
# Example
$ /path/to/daemon-keeper.sh -d "clamav-clamd" -e "dovecot"
# Crontab
$ sudo -u root -g wheel crontab -l
# At every minute
* * * * * /usr/local/cron-scripts/daemon-keeper.sh -d "clamav-clamd" -e "dovecot"
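To give a rough idea of how a service status based check can work, here is a minimal sketch. It is not the actual daemon-keeper.sh from the repositories. The function name keep_daemon is hypothetical, and the sketch assumes that the extra daemons given with -e are restarted only after the main daemon given with -d was found down. It relies on service {name} status exiting with a non-zero status when the daemon is not running, which is how rc.d scripts based on rc.subr behave.

```shell
#!/bin/sh
# Illustrative sketch only, not the author's daemon-keeper.sh.
# Assumption: extra daemons (-e) are restarted only after the main
# daemon (-d) was found down and has been started again.

keep_daemon() {
    daemon=""
    extras=""
    OPTIND=1    # reset so the function can be called more than once
    while getopts "d:e:" opt; do
        case "$opt" in
            d) daemon="$OPTARG" ;;
            e) extras="$extras $OPTARG" ;;
            *) return 2 ;;
        esac
    done
    [ -n "$daemon" ] || return 2

    # A zero exit status from `service <name> status` means it is running.
    if service "$daemon" status >/dev/null 2>&1; then
        return 0
    fi

    service "$daemon" start
    # Service names contain no spaces, so word splitting is intended here.
    for extra in $extras; do
        service "$extra" restart
    done
}

# Only act when arguments are given, e.g. -d "clamav-clamd" -e "dovecot"
if [ "$#" -gt 0 ]; then
    keep_daemon "$@"
fi
```

Invoked from cron with -d "clamav-clamd" -e "dovecot", this does nothing while clamav-clamd is up. Otherwise it starts clamav-clamd and then restarts dovecot.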
UPDATE 2 [2019/05/11]: Another thanks to @mirrorbox for mentioning sysutils/daemontools, which seems to be a proven solution for restarting a crashing daemon. It makes this hack redundant.
Daemontools is a small set of /very/ useful utilities, from Dan
Bernstein. They are mainly used for controlling processes, and
maintaining logfiles.
WWW: http://cr.yp.to/daemontools.html
UPDATE 3 [2019/05/11]: Thanks to @dlangille for mentioning sysutils/py-supervisor, which seems to be a viable alternative to sysutils/daemontools.
Supervisor is a client/server system that allows its users
to monitor and control a number of processes on UNIX-like
operating systems.
It shares some of the same goals of programs like launchd,
daemontools, and runit. Unlike some of these programs, it is
not meant to be run as a substitute for init as "process id 1".
Instead it is meant to be used to control processes related to
a project or a customer, and is meant to start like any
other program at boot time.
WWW: http://supervisord.org/
UPDATE 4 [2019/05/13]: Thanks to @olevole for mentioning sysutils/fsc. It is minimalistic, dependency-free, and designed for FreeBSD:
The FreeBSD Services Control software provides service
monitoring, restarting, and event logging for FreeBSD
servers. The core functionality is a daemon (fscd)
which is interfaced with using fscadm. See manual pages
for more information.
UPDATE 5 [2019/05/13]: Thanks to @jcigar for bringing daemon(8) to my attention. It is available in the base system and seems perfectly capable of doing what I set out to achieve with my script, and more.
Amidst all the chaos in the current stage of my life, I don’t know exactly what got into me that I thought it was a good idea to perform a major upgrade on a production FreeBSD server from 11.2-RELENG to 12.0-RELENG, when I did not even have enough time to go through /usr/src/UPDATING thoroughly or consult the Release Notes or the Errata properly, let alone deal with some esoteric change that effectively crippled my mail server. It took me over a week to realize that I had not been receiving any new emails.
At first, I did not take it seriously. I just rebooted the server and prayed to the gods that it would not happen again. It was a quick fix and it seemed to work, until a few days later I noticed that it had happened again. This time I prayed to the gods even harder - both the old ones and the new ones ¯\_(ツ)_/¯ - and rebuilt every installed port all over again to make sure I had not missed anything. I went for another reboot and, oops! There it was again, laughing at me. Having lost all faith in the gods, I took responsibility and decided to either investigate the issue further or ask the experts on the FreeBSD forums.
After messing around with it, it turned out that the culprit was the clamav-clamd service, which kept crashing without any apparent reason. I fired up htop after restarting clamav-clamd and found that even when idle it devours around 30% of the available memory. According to this Stack Exchange answer:
ClamAV holds the search strings using the classic string (Boyer Moore) and regular expression (Aho Corasick) algorithms. Being algorithms from the 1970s they are extremely memory efficient.
The problem is the huge number of virus signatures. This leads to the algorithms’ datastructures growing quite large.
You can’t send those datastructures to swap, as there are no parts of the algorithms’ datastructures accessed less often than other parts. If you do force pages of them to swap disk, then they’ll be referenced moments later and just swap straight back in. (Technically we say “the random access of the datastructure forces the entire datastructure to be in the process’s working set of memory”.)
The datastructures are needed if you are scanning from the command line or scanning from a daemon.
You can’t use just a portion of the virus signatures, as you don’t get to choose which viruses you will be sent, and thus can’t tell which signatures you will need.
I guess that, due to some arcane changes in 12.0-RELEASE, FreeBSD kills memory hogs such as the clamav-clamd daemon (don’t take my word for it; it is just a poor man’s guess). I even tried to lower the memory usage, without much success. In the end, there were not many choices or workarounds available:
A. Pray to the gods that it goes away by itself, which I deemed impractical
B. Put aside laziness and replace security/clamsmtp with security/amavisd-new in order to run ClamAV on demand, which has its own pros and cons
C. Write a quick POSIX shell script that scans for a running clamav-clamd process using ps aux | grep clamd, set it up as a cron job with an X-minute interval so that it starts the daemon whenever it cannot be found running, and be done with it for the time being
For the sake of slothfulness, I opted to go with option C. As a consequence, I came up with a simple, generic script that can not only monitor and restart the clamav-clamd service but also keep any other crashing service running on FreeBSD.
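The core idea of option C fits in a few lines. The following is only a sketch under a few assumptions, not the script itself: the function names is_running and keep_alive and the two-argument interface are made up for illustration. Instead of a plain ps aux | grep clamd, it matches the exact command name in the output of ps -axo comm, so that neither the grep process nor the script's own command line can be mistaken for the daemon.

```shell
#!/bin/sh
# Illustrative sketch of option C, not the author's script.
# The function names and the two-argument interface are assumptions.

# Succeed if some process has exactly this command name.
# `ps -axo comm` prints one command name per line; `grep -x` only
# matches whole lines and `-F` treats the name as a fixed string.
is_running() {
    ps -axo comm | grep -qxF -- "$1"
}

# Start the rc.d service only when its process is missing.
keep_alive() {
    process="$1"
    service_name="$2"
    if is_running "$process"; then
        return 0
    fi
    echo "$process is not running; starting $service_name"
    service "$service_name" start
}

# Run only when called as: script.sh <process> <service>
if [ "$#" -eq 2 ]; then
    keep_alive "$1" "$2"
fi
```

A crontab entry that calls such a script every minute with clamd and clamav-clamd as arguments would then bring the daemon back shortly after a crash.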