[Supervisor-users] Problems restarting supervisor

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

[Supervisor-users] Problems restarting supervisor

Florian.Brucker
Hi everyone,

I'm having troubles restarting the supervisor service itself (not one of the processes it manages) once a particular program is running under supervisor.

This is on Debian 8.10, running supervisor 3.0r1-1+deb8u1 installed via apt-get.

Let's start with what I'm trying to do. supervisor is running as normal, managing a single program (a background worker for asynchronous jobs from CKAN, see http://docs.ckan.org/en/2.8/maintaining/background-tasks.html#using-supervisor):

        # supervisorctl status
        ckan_worker                      RUNNING    pid 14473, uptime 0:00:32

Here is the configuration of the program (I've removed the configuration options recommended by CKAN during debugging, they did not change the problem):

        # cat /etc/supervisor/conf.d/ckan.conf

        [program:ckan_worker]
        command=/usr/lib/ckan/default/bin/paster --plugin=ckan jobs worker -c /etc/ckan/default/production.ini

Now I want to restart supervisor (because I've changed its configuration), so I do:

        # service supervisor restart
        Job for supervisor.service failed. See 'systemctl status supervisor.service' and 'journalctl -xn' for details.


        # systemctl status supervisor.service
        ● supervisor.service - LSB: Start/stop supervisor
           Loaded: loaded (/etc/init.d/supervisor)
           Active: failed (Result: exit-code) since Wed 2018-06-27 17:54:09 CEST; 16s ago
          Process: 14507 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS)
          Process: 14511 ExecStart=/etc/init.d/supervisor start (code=exited, status=1/FAILURE)

        Jun 27 17:54:09 test-transparenz supervisor[14511]: Starting supervisor:
        Jun 27 17:54:09 test-transparenz systemd[1]: supervisor.service: control process exited, code=exited status=1
        Jun 27 17:54:09 test-transparenz systemd[1]: Failed to start LSB: Start/stop supervisor.
        Jun 27 17:54:09 test-transparenz systemd[1]: Unit supervisor.service entered failed state.


        # journalctl -xn
        -- Logs begin at Wed 2018-06-27 13:59:24 CEST, end at Wed 2018-06-27 17:54:09 CEST. --
        Jun 27 17:52:21 test-transparenz systemd[1]: Starting LSB: Start/stop supervisor...
        -- Subject: Unit supervisor.service has begun with start-up
        -- Defined-By: systemd
        -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
        --
        -- Unit supervisor.service has begun starting up.
        Jun 27 17:52:22 test-transparenz supervisor[14464]: Starting supervisor: supervisord.
        Jun 27 17:52:22 test-transparenz systemd[1]: Started LSB: Start/stop supervisor.
        -- Subject: Unit supervisor.service has finished start-up
        -- Defined-By: systemd
        -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
        --
        -- Unit supervisor.service has finished starting up.
        --
        -- The start-up result is done.
        Jun 27 17:54:09 test-transparenz systemd[1]: Stopping LSB: Start/stop supervisor...
        -- Subject: Unit supervisor.service has begun shutting down
        -- Defined-By: systemd
        -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
        --
        -- Unit supervisor.service has begun shutting down.
        Jun 27 17:54:09 test-transparenz supervisor[14507]: Stopping supervisor: supervisord.
        Jun 27 17:54:09 test-transparenz systemd[1]: Starting LSB: Start/stop supervisor...
        -- Subject: Unit supervisor.service has begun with start-up
        -- Defined-By: systemd
        -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
        --
        -- Unit supervisor.service has begun starting up.
        Jun 27 17:54:09 test-transparenz supervisor[14511]: Starting supervisor:
        Jun 27 17:54:09 test-transparenz systemd[1]: supervisor.service: control process exited, code=exited status=1
        Jun 27 17:54:09 test-transparenz systemd[1]: Failed to start LSB: Start/stop supervisor.
        -- Subject: Unit supervisor.service has failed
        -- Defined-By: systemd
        -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
        --
        -- Unit supervisor.service has failed.
        --
        -- The result is failed.
        Jun 27 17:54:09 test-transparenz systemd[1]: Unit supervisor.service entered failed state.

The supervisor logs show:

        2018-06-27 17:52:21,094 CRIT Supervisor running as root (no user in config file)
        2018-06-27 17:52:21,094 WARN Included extra file "/etc/supervisor/conf.d/ckan.conf" during parsing
        2018-06-27 17:52:21,100 INFO RPC interface 'supervisor' initialized
        2018-06-27 17:52:21,100 CRIT Server 'unix_http_server' running without any HTTP authentication checking
        2018-06-27 17:52:21,101 INFO daemonizing the supervisord process
        2018-06-27 17:52:21,101 INFO supervisord started with pid 14469
        2018-06-27 17:52:22,129 INFO spawned: 'ckan_worker' with pid 14473
        2018-06-27 17:52:23,131 INFO success: ckan_worker entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
        2018-06-27 17:52:23,801 DEBG 'ckan_worker' stderr output:
        2018-06-27 17:52:23,801 INFO  [rq.worker] RQ worker u'rq:worker:test-transparenz.14473' started, version 0.6.0

        2018-06-27 17:52:23,802 DEBG 'ckan_worker' stderr output:
        2018-06-27 17:52:23,801 INFO  [rq.worker] Cleaning registries for queue: ckan:default:default

        2018-06-27 17:52:23,802 DEBG 'ckan_worker' stderr output:
        2018-06-27 17:52:23,802 INFO  [rq.worker]

        2018-06-27 17:52:23,802 DEBG 'ckan_worker' stderr output:
        2018-06-27 17:52:23,802 INFO  [rq.worker] *** Listening on ckan:default:default...

        2018-06-27 17:54:09,638 WARN received SIGTERM indicating exit request
        2018-06-27 17:54:09,638 DEBG killing ckan_worker (pid 14473) with signal SIGTERM
        2018-06-27 17:54:09,638 INFO waiting for ckan_worker to die
        2018-06-27 17:54:09,639 DEBG 'ckan_worker' stderr output:
        2018-06-27 17:54:09,638 WARNI [rq.worker] Warm shut down requested

        2018-06-27 17:54:09,722 DEBG fd 9 closed, stopped monitoring <POutputDispatcher at 140293577272064 for <Subprocess at 140293577518976 with name ckan_worker in state STOPPING> (stderr)>
        2018-06-27 17:54:09,723 DEBG fd 7 closed, stopped monitoring <POutputDispatcher at 140293577269616 for <Subprocess at 140293577518976 with name ckan_worker in state STOPPING> (stdout)>
        2018-06-27 17:54:09,724 INFO stopped: ckan_worker (exit status 0)
        2018-06-27 17:54:09,724 DEBG received SIGCLD indicating a child quit

Interestingly, starting the supervisor service at this point works just fine:

        # service supervisor start


        # service supervisor status
        ● supervisor.service - LSB: Start/stop supervisor
           Loaded: loaded (/etc/init.d/supervisor)
           Active: active (running) since Wed 2018-06-27 17:59:06 CEST; 3s ago
          Process: 14507 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS)
          Process: 14558 ExecStart=/etc/init.d/supervisor start (code=exited, status=0/SUCCESS)
           CGroup: /system.slice/supervisor.service
                           ├─14563 /usr/bin/python /usr/bin/supervisord -c /etc/supervisor/supervisord.conf
                           └─14569 /usr/lib/ckan/default/bin/python2 /usr/lib/ckan/default/bin/paster --plugin=ckan jobs worker -c /etc/ckan/default/production.ini

        Jun 27 17:59:06 test-transparenz supervisor[14558]: Starting supervisor: supervisord.
        Jun 27 17:59:06 test-transparenz systemd[1]: Started LSB: Start/stop supervisor.


        # supervisorctl status
        ckan_worker                      RUNNING    pid 14569, uptime 0:00:08

However, then trying to restart the service leads to the error above.

Just stopping the supervisor service (instead of restarting it) also works:

        # service supervisor stop

        # service supervisor status
        ● supervisor.service - LSB: Start/stop supervisor
           Loaded: loaded (/etc/init.d/supervisor)
           Active: inactive (dead) since Wed 2018-06-27 18:00:51 CEST; 4s ago
          Process: 14617 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS)
          Process: 14558 ExecStart=/etc/init.d/supervisor start (code=exited, status=0/SUCCESS)

        Jun 27 17:59:06 test-transparenz supervisor[14558]: Starting supervisor: supervisord.
        Jun 27 17:59:06 test-transparenz systemd[1]: Started LSB: Start/stop supervisor.
        Jun 27 18:00:51 test-transparenz systemd[1]: Stopping LSB: Start/stop supervisor...
        Jun 27 18:00:51 test-transparenz supervisor[14617]: Stopping supervisor: supervisord.
        Jun 27 18:00:51 test-transparenz systemd[1]: Stopped LSB: Start/stop supervisor.

Afterwards, the service can be started normally:

        # service supervisor start

        # service supervisor status
        ● supervisor.service - LSB: Start/stop supervisor
           Loaded: loaded (/etc/init.d/supervisor)
           Active: active (running) since Wed 2018-06-27 18:02:10 CEST; 6s ago
          Process: 14617 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS)
          Process: 14648 ExecStart=/etc/init.d/supervisor start (code=exited, status=0/SUCCESS)
           CGroup: /system.slice/supervisor.service
                           ├─14653 /usr/bin/python /usr/bin/supervisord -c /etc/supervisor/supervisord.conf
                           └─14660 /usr/lib/ckan/default/bin/python2 /usr/lib/ckan/default/bin/paster --plugin=ckan jobs worker -c /etc/ckan/default/production.ini

        Jun 27 18:02:10 test-transparenz supervisor[14648]: Starting supervisor: supervisord.
        Jun 27 18:02:10 test-transparenz systemd[1]: Started LSB: Start/stop supervisor.

However, stopping and starting without pausing inbetween (# service supervisor stop && service supervisor start) also causes the error.

To me this looks like a timing issue, where the managed program takes too long to shutdown. I've tried adding "stopwaitsecs=10" to the program's configuration, but that doesn't seem to change anything.

Right now I am not sure whether this is a problem with a) supervisor, b) the systemd service file, or c) the program I'm managing.

Any ideas or pointers?


Best regards,
Florian

_______________________________________________
Supervisor-users mailing list
[hidden email]
https://lists.supervisord.org/mailman/listinfo/supervisor-users
rod
Reply | Threaded
Open this post in threaded view
|

Re: [Supervisor-users] Problems restarting supervisor

rod
Can't help with the error as such, but incase you weren't aware, and incase the configuration you're changing is program related... you don't need to restart supervisor for program config changes, you can do a reread and update instead.

On Wed, Jun 27, 2018 at 5:19 PM <[hidden email]> wrote:
Hi everyone,

I'm having troubles restarting the supervisor service itself (not one of the processes it manages) once a particular program is running under supervisor.

This is on Debian 8.10, running supervisor 3.0r1-1+deb8u1 installed via apt-get.

Let's start with what I'm trying to do. supervisor is running as normal, managing a single program (a background worker for asynchronous jobs from CKAN, see http://docs.ckan.org/en/2.8/maintaining/background-tasks.html#using-supervisor):

        # supervisorctl status
        ckan_worker                      RUNNING    pid 14473, uptime 0:00:32

Here is the configuration of the program (I've removed the configuration options recommended by CKAN during debugging, they did not change the problem):

        # cat /etc/supervisor/conf.d/ckan.conf

        [program:ckan_worker]
        command=/usr/lib/ckan/default/bin/paster --plugin=ckan jobs worker -c /etc/ckan/default/production.ini

Now I want to restart supervisor (because I've changed its configuration), so I do:

        # service supervisor restart
        Job for supervisor.service failed. See 'systemctl status supervisor.service' and 'journalctl -xn' for details.


        # systemctl status supervisor.service
        ● supervisor.service - LSB: Start/stop supervisor
           Loaded: loaded (/etc/init.d/supervisor)
           Active: failed (Result: exit-code) since Wed 2018-06-27 17:54:09 CEST; 16s ago
          Process: 14507 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS)
          Process: 14511 ExecStart=/etc/init.d/supervisor start (code=exited, status=1/FAILURE)

        Jun 27 17:54:09 test-transparenz supervisor[14511]: Starting supervisor:
        Jun 27 17:54:09 test-transparenz systemd[1]: supervisor.service: control process exited, code=exited status=1
        Jun 27 17:54:09 test-transparenz systemd[1]: Failed to start LSB: Start/stop supervisor.
        Jun 27 17:54:09 test-transparenz systemd[1]: Unit supervisor.service entered failed state.


        # journalctl -xn
        -- Logs begin at Wed 2018-06-27 13:59:24 CEST, end at Wed 2018-06-27 17:54:09 CEST. --
        Jun 27 17:52:21 test-transparenz systemd[1]: Starting LSB: Start/stop supervisor...
        -- Subject: Unit supervisor.service has begun with start-up
        -- Defined-By: systemd
        -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
        --
        -- Unit supervisor.service has begun starting up.
        Jun 27 17:52:22 test-transparenz supervisor[14464]: Starting supervisor: supervisord.
        Jun 27 17:52:22 test-transparenz systemd[1]: Started LSB: Start/stop supervisor.
        -- Subject: Unit supervisor.service has finished start-up
        -- Defined-By: systemd
        -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
        --
        -- Unit supervisor.service has finished starting up.
        --
        -- The start-up result is done.
        Jun 27 17:54:09 test-transparenz systemd[1]: Stopping LSB: Start/stop supervisor...
        -- Subject: Unit supervisor.service has begun shutting down
        -- Defined-By: systemd
        -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
        --
        -- Unit supervisor.service has begun shutting down.
        Jun 27 17:54:09 test-transparenz supervisor[14507]: Stopping supervisor: supervisord.
        Jun 27 17:54:09 test-transparenz systemd[1]: Starting LSB: Start/stop supervisor...
        -- Subject: Unit supervisor.service has begun with start-up
        -- Defined-By: systemd
        -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
        --
        -- Unit supervisor.service has begun starting up.
        Jun 27 17:54:09 test-transparenz supervisor[14511]: Starting supervisor:
        Jun 27 17:54:09 test-transparenz systemd[1]: supervisor.service: control process exited, code=exited status=1
        Jun 27 17:54:09 test-transparenz systemd[1]: Failed to start LSB: Start/stop supervisor.
        -- Subject: Unit supervisor.service has failed
        -- Defined-By: systemd
        -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
        --
        -- Unit supervisor.service has failed.
        --
        -- The result is failed.
        Jun 27 17:54:09 test-transparenz systemd[1]: Unit supervisor.service entered failed state.

The supervisor logs show:

        2018-06-27 17:52:21,094 CRIT Supervisor running as root (no user in config file)
        2018-06-27 17:52:21,094 WARN Included extra file "/etc/supervisor/conf.d/ckan.conf" during parsing
        2018-06-27 17:52:21,100 INFO RPC interface 'supervisor' initialized
        2018-06-27 17:52:21,100 CRIT Server 'unix_http_server' running without any HTTP authentication checking
        2018-06-27 17:52:21,101 INFO daemonizing the supervisord process
        2018-06-27 17:52:21,101 INFO supervisord started with pid 14469
        2018-06-27 17:52:22,129 INFO spawned: 'ckan_worker' with pid 14473
        2018-06-27 17:52:23,131 INFO success: ckan_worker entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
        2018-06-27 17:52:23,801 DEBG 'ckan_worker' stderr output:
        2018-06-27 17:52:23,801 INFO  [rq.worker] RQ worker u'rq:worker:test-transparenz.14473' started, version 0.6.0

        2018-06-27 17:52:23,802 DEBG 'ckan_worker' stderr output:
        2018-06-27 17:52:23,801 INFO  [rq.worker] Cleaning registries for queue: ckan:default:default

        2018-06-27 17:52:23,802 DEBG 'ckan_worker' stderr output:
        2018-06-27 17:52:23,802 INFO  [rq.worker]

        2018-06-27 17:52:23,802 DEBG 'ckan_worker' stderr output:
        2018-06-27 17:52:23,802 INFO  [rq.worker] *** Listening on ckan:default:default...

        2018-06-27 17:54:09,638 WARN received SIGTERM indicating exit request
        2018-06-27 17:54:09,638 DEBG killing ckan_worker (pid 14473) with signal SIGTERM
        2018-06-27 17:54:09,638 INFO waiting for ckan_worker to die
        2018-06-27 17:54:09,639 DEBG 'ckan_worker' stderr output:
        2018-06-27 17:54:09,638 WARNI [rq.worker] Warm shut down requested

        2018-06-27 17:54:09,722 DEBG fd 9 closed, stopped monitoring <POutputDispatcher at 140293577272064 for <Subprocess at 140293577518976 with name ckan_worker in state STOPPING> (stderr)>
        2018-06-27 17:54:09,723 DEBG fd 7 closed, stopped monitoring <POutputDispatcher at 140293577269616 for <Subprocess at 140293577518976 with name ckan_worker in state STOPPING> (stdout)>
        2018-06-27 17:54:09,724 INFO stopped: ckan_worker (exit status 0)
        2018-06-27 17:54:09,724 DEBG received SIGCLD indicating a child quit

Interestingly, starting the supervisor service at this point works just fine:

        # service supervisor start


        # service supervisor status
        ● supervisor.service - LSB: Start/stop supervisor
           Loaded: loaded (/etc/init.d/supervisor)
           Active: active (running) since Wed 2018-06-27 17:59:06 CEST; 3s ago
          Process: 14507 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS)
          Process: 14558 ExecStart=/etc/init.d/supervisor start (code=exited, status=0/SUCCESS)
           CGroup: /system.slice/supervisor.service
                           ├─14563 /usr/bin/python /usr/bin/supervisord -c /etc/supervisor/supervisord.conf
                           └─14569 /usr/lib/ckan/default/bin/python2 /usr/lib/ckan/default/bin/paster --plugin=ckan jobs worker -c /etc/ckan/default/production.ini

        Jun 27 17:59:06 test-transparenz supervisor[14558]: Starting supervisor: supervisord.
        Jun 27 17:59:06 test-transparenz systemd[1]: Started LSB: Start/stop supervisor.


        # supervisorctl status
        ckan_worker                      RUNNING    pid 14569, uptime 0:00:08

However, then trying to restart the service leads to the error above.

Just stopping the supervisor service (instead of restarting it) also works:

        # service supervisor stop

        # service supervisor status
        ● supervisor.service - LSB: Start/stop supervisor
           Loaded: loaded (/etc/init.d/supervisor)
           Active: inactive (dead) since Wed 2018-06-27 18:00:51 CEST; 4s ago
          Process: 14617 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS)
          Process: 14558 ExecStart=/etc/init.d/supervisor start (code=exited, status=0/SUCCESS)

        Jun 27 17:59:06 test-transparenz supervisor[14558]: Starting supervisor: supervisord.
        Jun 27 17:59:06 test-transparenz systemd[1]: Started LSB: Start/stop supervisor.
        Jun 27 18:00:51 test-transparenz systemd[1]: Stopping LSB: Start/stop supervisor...
        Jun 27 18:00:51 test-transparenz supervisor[14617]: Stopping supervisor: supervisord.
        Jun 27 18:00:51 test-transparenz systemd[1]: Stopped LSB: Start/stop supervisor.

Afterwards, the service can be started normally:

        # service supervisor start

        # service supervisor status
        ● supervisor.service - LSB: Start/stop supervisor
           Loaded: loaded (/etc/init.d/supervisor)
           Active: active (running) since Wed 2018-06-27 18:02:10 CEST; 6s ago
          Process: 14617 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS)
          Process: 14648 ExecStart=/etc/init.d/supervisor start (code=exited, status=0/SUCCESS)
           CGroup: /system.slice/supervisor.service
                           ├─14653 /usr/bin/python /usr/bin/supervisord -c /etc/supervisor/supervisord.conf
                           └─14660 /usr/lib/ckan/default/bin/python2 /usr/lib/ckan/default/bin/paster --plugin=ckan jobs worker -c /etc/ckan/default/production.ini

        Jun 27 18:02:10 test-transparenz supervisor[14648]: Starting supervisor: supervisord.
        Jun 27 18:02:10 test-transparenz systemd[1]: Started LSB: Start/stop supervisor.

However, stopping and starting without pausing inbetween (# service supervisor stop && service supervisor start) also causes the error.

To me this looks like a timing issue, where the managed program takes too long to shutdown. I've tried adding "stopwaitsecs=10" to the program's configuration, but that doesn't seem to change anything.

Right now I am not sure whether this is a problem with a) supervisor, b) the systemd service file, or c) the program I'm managing.

Any ideas or pointers?


Best regards,
Florian
_______________________________________________
Supervisor-users mailing list
[hidden email]
https://lists.supervisord.org/mailman/listinfo/supervisor-users

_______________________________________________
Supervisor-users mailing list
[hidden email]
https://lists.supervisord.org/mailman/listinfo/supervisor-users