Nginx web server listening on a unix socket file fails to restart

The symptom

The nginx service in Ubuntu cannot restart successfully, or start again after being stopped, if any server is configured to listen on a unix socket file (e.g. /var/run/nginx.sock). This happens in Ubuntu Xenial (16.04) and Bionic (18.04), and most likely in the corresponding Debian versions as well.
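To check whether a server is affected, look for listen directives that use a unix: address. The Python sketch below is only a rough aid: `unix_socket_paths` is a hypothetical helper, and it assumes it is given the full configuration as text, for example the output of nginx -T, which dumps the configuration including included files. Comment stripping is naive and does not handle # inside quoted strings.

```python
import re
from typing import List

# A listen directive with a unix-domain address, e.g. "listen unix:/var/run/nginx.sock;"
LISTEN_UNIX = re.compile(r"\blisten\s+unix:([^\s;]+)")


def unix_socket_paths(config_text: str) -> List[str]:
    """Return the socket paths of all 'listen unix:...' directives in config_text."""
    paths: List[str] = []
    for line in config_text.splitlines():
        # Drop comments (naively: a '#' inside quotes would also cut the line).
        code = line.split("#", 1)[0]
        paths.extend(LISTEN_UNIX.findall(code))
    return paths
```

Any path this returns is a socket file that may be left behind after a SIGQUIT shutdown.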

The cause

Nginx bug #753 [1] reports that nginx does not clean up (remove) the socket file when it exits in response to the SIGQUIT signal (graceful shutdown). It does, however, remove the file when it exits in response to the SIGTERM signal (fast shutdown). Unfortunately, the Ubuntu and Debian nginx service scripts use SIGQUIT to handle service stop and restart requests, so the socket file is left in place after nginx exits. When nginx is (re)started later, binding to that path fails with an 'address already in use' error, and messages like bind() to unix:/var/run/nginx.sock failed (98: Address already in use) appear in the nginx error log.
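The failure can be reproduced without nginx. This Python sketch assumes Linux, where the file of a pathname unix socket stays on disk after the socket is closed; `rebind_errno` is a hypothetical name. It binds a socket, closes it without removing the file (as nginx does on SIGQUIT), and then binds the same path again, optionally unlinking the file first (as nginx does on SIGTERM).

```python
import errno
import os
import socket
import tempfile


def rebind_errno(path: str, unlink_first: bool) -> int:
    """Bind a unix socket at `path`, close it without removing the file,
    then bind again. Returns 0 on success or the errno of the failed bind."""
    first = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    first.bind(path)
    first.listen()
    first.close()  # like nginx exiting on SIGQUIT: the socket file stays behind

    if unlink_first:
        os.unlink(path)  # like nginx exiting on SIGTERM, or a manual cleanup

    second = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        second.bind(path)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        second.close()
        if os.path.exists(path):
            os.unlink(path)  # tidy up the demo file either way


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sock_path = os.path.join(tmp, "nginx.sock")
        code = rebind_errno(sock_path, unlink_first=False)
        print(code, errno.errorcode.get(code))  # on Linux: 98 EADDRINUSE
        print(rebind_errno(sock_path, unlink_first=True))  # 0
```

Errno 98 is the same code that appears in the nginx error log message above.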

The solution

In short, modify the nginx service scripts so that stop and restart requests send SIGTERM instead of SIGQUIT.

For systems using systemd, including Ubuntu Xenial and Bionic, check the current service definition by running systemctl cat nginx, which usually gives the following output:

# /lib/systemd/system/nginx.service
# Stop dance for nginx
# =======================
#
# ExecStop sends SIGSTOP (graceful stop) to the nginx process.
# If, after 5s (--retry QUIT/5) nginx is still running, systemd takes control
# and sends SIGTERM (fast shutdown) to the main process.
# After another 5s (TimeoutStopSec=5), and if nginx is alive, systemd sends
# SIGKILL to all the remaining processes in the process group (KillMode=mixed).
#
# nginx signals reference doc:
# http://nginx.org/en/docs/control.html
#
[Unit]
Description=A high performance web server and a reverse proxy server
Documentation=man:nginx(8)
After=network.target

[Service]
Type=forking
PIDFile=/run/nginx.pid
ExecStartPre=/usr/sbin/nginx -t -q -g 'daemon on; master_process on;'
ExecStart=/usr/sbin/nginx -g 'daemon on; master_process on;'
ExecReload=/usr/sbin/nginx -g 'daemon on; master_process on;' -s reload
ExecStop=-/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid
TimeoutStopSec=5
KillMode=mixed

[Install]
WantedBy=multi-user.target
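To confirm which signal a given ExecStop command sends first, its --retry schedule can be read programmatically. This is a hypothetical helper (`first_retry_signal` is not part of any tool); it assumes the schedule follows start-stop-daemon's signal/timeout/... format, is passed as a separate argument after --retry, and that the only systemd prefix on the command is the optional leading '-'.

```python
import shlex
from typing import Optional


def first_retry_signal(exec_stop: str) -> Optional[str]:
    """Return the first signal in the --retry schedule of an ExecStop value,
    or None if the command has no signal-based --retry schedule."""
    # In systemd a leading '-' only means "ignore failure"; it is not part of the path.
    args = shlex.split(exec_stop.lstrip("-"))
    for i, arg in enumerate(args):
        if arg == "--retry" and i + 1 < len(args):
            first = args[i + 1].split("/")[0]
            # A bare number is a plain timeout, not a signal name.
            return None if first.isdigit() else first
    return None
```

For the stock unit shown above, first_retry_signal("-/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid") returns "QUIT"; after the fix it should return "TERM".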

Notice the ExecStop setting in the [Service] section: its --retry parameter is set to QUIT/5 (so SIGQUIT is sent first, even though the header comment mentions SIGSTOP), and that needs to be changed to TERM/5. In theory, this can be done by editing /lib/systemd/system/nginx.service directly (the path shown on the first line of the output above) and then running systemctl daemon-reload to activate the change. However, such an edit may be overwritten by future nginx package updates. The more appropriate way is to run systemctl edit nginx, which creates an override file in /etc/systemd/system/nginx.service.d, and put the following lines in it:

[Service]
ExecStop=
ExecStop=-/sbin/start-stop-daemon --quiet --stop --retry TERM/5 --pidfile /run/nginx.pid

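Notice that ExecStop must be cleared first with an empty assignment. ExecStop accepts multiple commands, so without the empty line the new command would be appended after the original one instead of replacing it, and the original QUIT/5 command would still run first. After the override file is saved, systemctl edit automatically and silently runs systemctl daemon-reload, so a manual reload is not necessary.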
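Why the empty ExecStop= line matters can be illustrated with a simplified model of how systemd merges a list-type setting such as ExecStop across the unit file and its drop-ins: a non-empty assignment adds a command, while an empty assignment discards everything collected so far. This is an illustrative sketch only; `effective_exec_stop` is a hypothetical function that ignores sections, line continuations and every other directive, and it is not systemd's parser.

```python
from typing import Iterable, List


def effective_exec_stop(fragments: Iterable[str]) -> List[str]:
    """Apply ExecStop= assignments from the unit file and its drop-ins, in order."""
    commands: List[str] = []
    for text in fragments:  # main unit first, then drop-ins in load order
        for raw in text.splitlines():
            line = raw.strip()
            if not line.startswith("ExecStop="):
                continue  # comments, section headers and other settings are skipped
            value = line[len("ExecStop="):].strip()
            if value:
                commands.append(value)  # non-empty assignment: one more command
            else:
                commands.clear()  # empty assignment: reset the list


    return commands


UNIT = "[Service]\nExecStop=-/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid\n"
WITH_RESET = "[Service]\nExecStop=\nExecStop=-/sbin/start-stop-daemon --quiet --stop --retry TERM/5 --pidfile /run/nginx.pid\n"
WITHOUT_RESET = "[Service]\nExecStop=-/sbin/start-stop-daemon --quiet --stop --retry TERM/5 --pidfile /run/nginx.pid\n"

if __name__ == "__main__":
    print(effective_exec_stop([UNIT, WITH_RESET]))  # only the TERM command
    print(effective_exec_stop([UNIT, WITHOUT_RESET]))  # QUIT command first, then TERM
```

In the second case the QUIT-based command still runs first, nginx exits gracefully and leaves the socket file behind, and the TERM command has nothing left to stop.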

For older systems using SysVinit scripts, for example Ubuntu Precise (12.04) running newer nginx versions from ppa:nginx/stable (the version in the Precise repository does not need this fix, but is very old), a simpler fix is possible. Edit /etc/default/nginx and change the following line

STOP_SCHEDULE="QUIT/5/TERM/5/KILL/5"

to

STOP_SCHEDULE="TERM/5/KILL/5"
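On such systems the edit can also be scripted. This is a minimal Python sketch, assuming /etc/default/nginx is a simple KEY=value shell fragment in which the setting may also appear commented out (for example #STOP_SCHEDULE=...); the function name `set_stop_schedule` is hypothetical. It rewrites every matching line, or appends the setting if none exists.

```python
import re

NEW_SCHEDULE = 'STOP_SCHEDULE="TERM/5/KILL/5"'
# An active or commented-out STOP_SCHEDULE assignment on its own line.
PATTERN = re.compile(r"^[ \t]*#?[ \t]*STOP_SCHEDULE=.*$", re.MULTILINE)


def set_stop_schedule(text: str) -> str:
    """Return the file contents with STOP_SCHEDULE set to start with TERM."""
    new_text, count = PATTERN.subn(NEW_SCHEDULE, text)
    if count == 0:
        # Setting not present at all: append it on its own line.
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += NEW_SCHEDULE + "\n"
    return new_text


# Example use (as root):
#   from pathlib import Path
#   p = Path("/etc/default/nginx")
#   p.write_text(set_stop_schedule(p.read_text()))
```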

Remember that in either case, any leftover socket files must be removed manually before the nginx service can be started again; after that, restart and stop/start work normally without manual intervention.
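The manual removal can be made a little safer by deleting a socket file only when nothing accepts connections on it. This Python sketch assumes that a stale socket is one on which connect() is refused (on Linux, ECONNREFUSED for a pathname socket with no listener); `remove_if_stale` is a hypothetical name, and it should be run while nginx is stopped, as root if the file is owned by root.

```python
import os
import socket
import stat


def remove_if_stale(path: str) -> bool:
    """Delete `path` if it is a unix socket file nobody is listening on.
    Returns True if the file was removed."""
    try:
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return False  # nothing to clean up
    if not stat.S_ISSOCK(mode):
        return False  # never delete regular files or symlinks by mistake

    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except ConnectionRefusedError:
        os.unlink(path)  # no listener: leftover from an nginx SIGQUIT exit
        return True
    finally:
        probe.close()
    return False  # something (probably a running nginx) still accepts connections
```

For example, calling remove_if_stale("/var/run/nginx.sock") before systemctl start nginx removes only a leftover file and leaves a live socket alone.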

References

[1] https://trac.nginx.org/nginx/ticket/753

[2] https://unix.stackexchange.com/questions/164866/nginx-leaves-old-socket
