The symptom
The nginx service on Ubuntu fails to restart, or to start after being stopped, if any server is configured to listen on a Unix socket file (e.g. /var/run/nginx.sock). This happens on Ubuntu Xenial (16.04) and Bionic (18.04), and most likely on the related Debian versions as well.
The cause
Nginx bug #753 indicates that nginx does not clean up (remove) the socket file when it exits in response to a SIGQUIT signal. It does, however, clean up the file when it exits in response to SIGTERM. Unfortunately, the Ubuntu and Debian nginx service scripts use SIGQUIT to process service stop and restart requests, which leaves the socket file in place after nginx exits. This causes an 'address in use' error when nginx is (re)started later, with a message like the following appearing in the nginx error log:
bind() to unix:/var/run/nginx.sock failed (98: Address already in use)
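To confirm that this is the problem being hit, the error log can be searched for the bind failure. The following is a minimal sketch; the helper name count_bind_failures is made up for illustration, and the default log path /var/log/nginx/error.log is an assumption that depends on the error_log setting in use.

```shell
# Count log lines that report a failed bind() on a Unix socket.
# count_bind_failures is a hypothetical helper; the default path below is
# an assumption and should match the error_log directive actually in use.
count_bind_failures() {
    log="${1:-/var/log/nginx/error.log}"
    # In a basic regular expression the parentheses are literal characters.
    grep -c 'bind() to unix:.* failed (98: Address already in use)' "$log"
}

# Usage: count_bind_failures /var/log/nginx/error.log
```

A non-zero count, together with a socket file that still exists while nginx is stopped, points to the leftover socket described above.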
The solution
In short, modify the nginx service scripts to use SIGTERM instead of SIGQUIT when processing stop and restart requests.
For systems using systemd, including Ubuntu Xenial and Bionic, check the current service script by running systemctl cat nginx, which usually gives the following output:
# /lib/systemd/system/nginx.service
# Stop dance for nginx
# =======================
#
# ExecStop sends SIGSTOP (graceful stop) to the nginx process.
# If, after 5s (--retry QUIT/5) nginx is still running, systemd takes control
# and sends SIGTERM (fast shutdown) to the main process.
# After another 5s (TimeoutStopSec=5), and if nginx is alive, systemd sends
# SIGKILL to all the remaining processes in the process group (KillMode=mixed).
#
# nginx signals reference doc:
# http://nginx.org/en/docs/control.html
#
[Unit]
Description=A high performance web server and a reverse proxy server
Documentation=man:nginx(8)
After=network.target
[Service]
Type=forking
PIDFile=/run/nginx.pid
ExecStartPre=/usr/sbin/nginx -t -q -g 'daemon on; master_process on;'
ExecStart=/usr/sbin/nginx -g 'daemon on; master_process on;'
ExecReload=/usr/sbin/nginx -g 'daemon on; master_process on;' -s reload
ExecStop=-/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid
TimeoutStopSec=5
KillMode=mixed
[Install]
WantedBy=multi-user.target
Notice the ExecStop setting in the [Service] section, whose --retry parameter is set to QUIT/5; that needs to be changed to TERM/5. In theory, this can be done by directly editing /lib/systemd/system/nginx.service, the file named in the first line of the output above, and then running systemctl daemon-reload to activate the change. However, such an edit risks being overwritten by future nginx updates. The more appropriate way is to run systemctl edit nginx and create an override file in /etc/systemd/system/nginx.service.d containing the following lines:
[Service]
ExecStop=
ExecStop=-/sbin/start-stop-daemon --quiet --stop --retry TERM/5 --pidfile /run/nginx.pid
Notice that ExecStop must be cleared first with an empty assignment. Otherwise the new command is added after the original one instead of replacing it, and the SIGQUIT stop still runs first. After the override file is saved, systemctl edit automatically and silently calls systemctl daemon-reload, so a manual reload is not necessary.
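Where an interactive editor is not convenient (for example in a provisioning script), the same override can be written by hand. This is only a sketch: write_nginx_override is a made-up helper name, and it assumes the drop-in file is called override.conf, which is the name systemctl edit uses by default. Unlike systemctl edit, this route needs an explicit systemctl daemon-reload afterwards.

```shell
# Write the ExecStop override as a systemd drop-in file.
# write_nginx_override is a hypothetical helper; the directory argument
# exists so the function can be tried against a scratch directory first.
write_nginx_override() {
    dropin_dir="${1:-/etc/systemd/system/nginx.service.d}"
    mkdir -p "$dropin_dir"
    # The empty ExecStop= line clears the packaged command before the
    # replacement is set.
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
ExecStop=
ExecStop=-/sbin/start-stop-daemon --quiet --stop --retry TERM/5 --pidfile /run/nginx.pid
EOF
}

# Usage (as root): write_nginx_override && systemctl daemon-reload
```

Running systemctl cat nginx again afterwards should list the drop-in file below the packaged unit, which is a quick way to check that the override was picked up.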
For older systems using SysVinit, for example Ubuntu Trusty (14.04) running updated nginx versions from ppa:nginx/stable (the Trusty repository version does not need this fix, but is very old), a simpler fix is possible. Edit /etc/default/nginx and change the following line
STOP_SCHEDULE="QUIT/5/TERM/5/KILL/5"
to
STOP_SCHEDULE="TERM/5/KILL/5"
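The same change can be scripted with sed. This is a sketch under one assumption: the file already contains an uncommented line beginning with STOP_SCHEDULE=, as shown above. The helper name fix_stop_schedule is made up for illustration, and a .bak copy of the file is kept next to it.

```shell
# Replace the STOP_SCHEDULE line so the stop sequence starts with TERM.
# fix_stop_schedule is a hypothetical helper; it only touches a line that
# starts with STOP_SCHEDULE= and leaves commented-out lines alone.
fix_stop_schedule() {
    defaults_file="${1:-/etc/default/nginx}"
    # -i.bak edits in place and keeps the original as <file>.bak
    sed -i.bak 's|^STOP_SCHEDULE=.*|STOP_SCHEDULE="TERM/5/KILL/5"|' "$defaults_file"
}

# Usage (as root): fix_stop_schedule /etc/default/nginx
```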
Remember, in either of the above cases, any leftover socket file must be removed manually before the nginx service can be started again. After that, restart and stop/start will work normally without manual intervention.
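The one-time cleanup can be sketched as follows. The helper name remove_stale_socket is made up for illustration, and /var/run/nginx.sock is simply the example path used in this post; substitute whatever paths the listen directives actually name. As a precaution, the sketch refuses to act while an nginx process is still running and only deletes the path if it really is a socket.

```shell
# Remove a leftover socket file, but only when nginx is not running and
# the path is actually a socket. remove_stale_socket is a hypothetical helper.
remove_stale_socket() {
    sock="${1:-/var/run/nginx.sock}"
    if pgrep -x nginx >/dev/null 2>&1; then
        echo "nginx is still running; not touching $sock" >&2
        return 1
    fi
    # -S is true only for socket files, so regular files are left alone.
    if [ -S "$sock" ]; then
        rm -f -- "$sock"
    fi
}

# Usage (as root): remove_stale_socket /var/run/nginx.sock && systemctl start nginx
```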
References
[1] https://trac.nginx.org/nginx/ticket/753
[2] https://unix.stackexchange.com/questions/164866/nginx-leaves-old-socket