[Supervisor-users] lost supervisor.sock

Paul Fox
i'm running a relatively old version of supervisor, so if this has
been fixed in a later release, i'll live with it -- upgrading this
particular system is difficult at the moment.

i have a supervisord process that's been running for over 32 days,
since april 6th.  it's still running, and will happily restart
processes if i kill them manually.  the web interface (on port 9001)
is also working fine.

but if i try to talk to it with supervisorctl, i get:

    # supervisorctl status
    unix:///var/tmp/supervisor.sock no such file

the socket file is indeed not present in /var/tmp.

this happened sometime between 3:01am yesterday and 3:01am today.  i
know this because there's one process i restart every night at that
time via cron, and this morning i got a failure message, whereas
yesterday i did not.

is this a familiar, known bug?

since everything's kind of working right now, i'm happy to leave the
system in this state in case someone has a debugging idea -- i'm not
particularly skilled with python, but can follow directions well.  :-)

on the other hand, if the answer is "upgrade", then i'll just restart
it and see if it happens again.

paul

p.s. here's the fedora version info:
    # yum info supervisor
    Installed Packages
    Name        : supervisor
    Arch        : noarch
    Version     : 3.0
    Release     : 1.fc18
    Size        : 2.5 M
    Repo        : installed
    From repo   : updates
    Summary     : A System for Allowing the Control of Process State on UNIX
    URL         : http://supervisord.org/
    License     : ZPLv2.1 and BSD and MIT
    Description : The supervisor is a client/server system that allows its users to
                : control a number of processes on UNIX-like operating systems.



----------------------
 paul fox, [hidden email] (arlington, ma, where it's 56.7 degrees)
_______________________________________________
Supervisor-users mailing list
[hidden email]
https://lists.supervisord.org/mailman/listinfo/supervisor-users

Re: [Supervisor-users] lost supervisor.sock

Matt Black

I’d suggest checking your /etc/supervisord.conf to be sure that the following two config entries match up:

[unix_http_server]
file=/tmp/supervisord.sock      ; (the path to the socket file) 

[supervisorctl]
serverurl=unix:///tmp/supervisord.sock
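A minimal sketch of automating that check, assuming Python 3 and a config file that Python's standard configparser can read (supervisor uses its own parser, so this may not accept every file supervisord does). The helper name socket_paths_match is hypothetical; it compares the path in [unix_http_server] file= with the path inside the unix:// URL in [supervisorctl] serverurl=.

```python
import configparser


def socket_paths_match(conf_text):
    """Compare supervisord's socket path with the one supervisorctl will use.

    Hypothetical helper. Returns (server_path, client_path, matches), where
    client_path is None if serverurl is not a unix:// URL.
    """
    parser = configparser.ConfigParser(
        inline_comment_prefixes=(";",),  # strip trailing "; comment" text
        interpolation=None,              # leave %(here)s-style values untouched
    )
    parser.read_string(conf_text)
    server_path = parser.get("unix_http_server", "file")
    url = parser.get("supervisorctl", "serverurl")
    prefix = "unix://"
    client_path = url[len(prefix):] if url.startswith(prefix) else None
    return server_path, client_path, server_path == client_path


if __name__ == "__main__":
    # The two settings Paul quotes later in this thread.
    example = (
        "[unix_http_server]\n"
        "file=/var/tmp/supervisor.sock   ; (the path to the socket file)\n"
        "\n"
        "[supervisorctl]\n"
        "serverurl=unix:///var/tmp/supervisor.sock ; use a unix:// URL for a unix socket\n"
    )
    print(socket_paths_match(example))
    # prints ('/var/tmp/supervisor.sock', '/var/tmp/supervisor.sock', True)
```

To check a real file, pass its contents, e.g. socket_paths_match(open("/etc/supervisord.conf").read()). A False third element means supervisorctl is looking somewhere other than where supervisord created its socket.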

Matt


On 9 May 2014 07:52, Paul Fox <[hidden email]> wrote:

i'm running a relatively old version of supervisor, so if this has
been fixed in a later release, i'll live with it -- upgrading this
particular system is difficult at the moment.

i have a supervisord process that's been running for over 32 days,
since april 6th.  it's still running, and will happily restart
processes if i kill them manually.  the web interface (on port 9001)
is also working fine.

but if i try and talk to it with supervisorctl, i get:

    # supervisorctl status
    unix:///var/tmp/supervisor.sock no such file

the socket file is indeed not present in /var/tmp.

this happened sometime between 3:01am yesterday, and 3:01am today.  i
know this because there's one process i restart every night at that
time via cron, and this morning i got a failure message, whereas
yesterday i did not.

is this a familar, known, bug?

since everything's kind of working right now, i'm happy to leave the
system in this state in case someone has a debugging idea -- i'm not
particularly skilled with python, but can follow directions well.  :-)

on the other hand, if the answer is "upgrade", then i'll just restart
it and see if it happens again.

paul

p.s. here's the fedora version info:
    # yum info supervisor
    Installed Packages
    Name        : supervisor
    Arch        : noarch
    Version     : 3.0
    Release     : 1.fc18
    Size        : 2.5 M
    Repo        : installed
    From repo   : updates
    Summary     : A System for Allowing the Control of Process State on UNIX
    URL         : http://supervisord.org/
    License     : ZPLv2.1 and BSD and MIT
    Description : The supervisor is a client/server system that allows its users to
                : control a number of processes on UNIX-like operating systems.



----------------------
 paul fox, [hidden email] (arlington, ma, where it's 56.7 degrees)
_______________________________________________
Supervisor-users mailing list
[hidden email]
https://lists.supervisord.org/mailman/listinfo/supervisor-users


_______________________________________________
Supervisor-users mailing list
[hidden email]
https://lists.supervisord.org/mailman/listinfo/supervisor-users
Reply | Threaded
Open this post in threaded view
|

Re: [Supervisor-users] lost supervisor.sock

Paul Fox
matt wrote:
 > I’d suggest checking your /etc/supervisord.conf to be sure that the
 > following two config entries match up:
 >
 > [unix_http_server]
 > file=/tmp/supervisord.sock      ; (the path to the socket file)
 >
 > [supervisorctl]
 > serverurl=unix:///tmp/supervisord.sock

they match.

 [unix_http_server]
 file=/var/tmp/supervisor.sock   ; (the path to the socket file)

 [supervisorctl]
 serverurl=unix:///var/tmp/supervisor.sock ; use a unix:// URL  for a unix socket


paul

 >
 > Matt
 >
 >
 > On 9 May 2014 07:52, Paul Fox <[hidden email]> wrote:
 >
 > i'm running a relatively old version of supervisor, so if this has
 > > been fixed in a later release, i'll live with it -- upgrading this
 > > particular system is difficult at the moment.
 > >
 > > i have a supervisord process that's been running for over 32 days,
 > > since april 6th.  it's still running, and will happily restart
 > > processes if i kill them manually.  the web interface (on port 9001)
 > > is also working fine.
 > >
 > > but if i try and talk to it with supervisorctl, i get:
 > >
 > >     # supervisorctl status
 > >     unix:///var/tmp/supervisor.sock no such file
 > >
 > > the socket file is indeed not present in /var/tmp.
 > >
 > > this happened sometime between 3:01am yesterday, and 3:01am today.  i
 > > know this because there's one process i restart every night at that
 > > time via cron, and this morning i got a failure message, whereas
 > > yesterday i did not.
 > >
 > > is this a familar, known, bug?
 > >
 > > since everything's kind of working right now, i'm happy to leave the
 > > system in this state in case someone has a debugging idea -- i'm not
 > > particularly skilled with python, but can follow directions well.  :-)
 > >
 > > on the other hand, if the answer is "upgrade", then i'll just restart
 > > it and see if it happens again.
 > >
 > > paul
 > >
 > > p.s. here's the fedora version info:
 > >     # yum info supervisor
 > >     Installed Packages
 > >     Name        : supervisor
 > >     Arch        : noarch
 > >     Version     : 3.0
 > >     Release     : 1.fc18
 > >     Size        : 2.5 M
 > >     Repo        : installed
 > >     From repo   : updates
 > >     Summary     : A System for Allowing the Control of Process State on
 > > UNIX
 > >     URL         : http://supervisord.org/
 > >     License     : ZPLv2.1 and BSD and MIT
 > >     Description : The supervisor is a client/server system that allows its
 > > users to
 > >                 : control a number of processes on UNIX-like operating
 > > systems.
 > >
 > >
 > >
 > > ----------------------
 > >  paul fox, [hidden email] (arlington, ma, where it's 56.7
 > > degrees)
 > > _______________________________________________
 > > Supervisor-users mailing list
 > > [hidden email]
 > > https://lists.supervisord.org/mailman/listinfo/supervisor-users
 > >

----------------------
 paul fox, [hidden email] (arlington, ma, where it's 55.8 degrees)
_______________________________________________
Supervisor-users mailing list
[hidden email]
https://lists.supervisord.org/mailman/listinfo/supervisor-users
Reply | Threaded
Open this post in threaded view
|

Re: [Supervisor-users] lost supervisor.sock

Paul Fox
In reply to this post by Matt Black
matt's mail made me start thinking about pathnames, so i did
some more groveling.

it seems that netstat thinks the socket is being listened on:

    # netstat -an | grep supervisor
    unix  2      [ ACC ]     STREAM     LISTENING     8404     /var/tmp/supervisor.sock.567

and it seems that supervisord still thinks it has the socket open:
    # ps axf | grep '[p]ython.*supervisord'
      723 ?        Ss    83:07 /usr/bin/python /usr/bin/supervisord
    # ls -l /proc/723/fd | grep sock
    lrwx------ 1 root root 64 May  8 19:26 4 -> socket:[8247]
    lrwx------ 1 root root 64 May  8 19:26 5 -> socket:[8404]

so i guess it's possible that some unrelated process removed the
socket from /var/tmp.
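The manual steps above (netstat, then ls -l /proc/PID/fd) can be scripted. This is a Linux-only sketch with hypothetical helper names: it reads /proc/net/unix, keeps listening sockets (the __SO_ACCEPTCON bit, 0x10000, in the Flags column) whose path contains a substring, and finds which PIDs hold each socket inode. One caveat: the Path column is the address the socket was bound to, and the kernel does not update it if the file is later renamed or unlinked. The netstat output above already shows a bound name (/var/tmp/supervisor.sock.567) that differs from the configured one, so the sketch also includes is_socket_file() for checking the configured path directly. Run it as root, since other users' /proc/PID/fd entries are usually unreadable.

```python
import os
import stat

ACCEPTCON = 0x10000  # __SO_ACCEPTCON bit in the Flags column: socket is listening


def parse_proc_net_unix(text):
    """Parse /proc/net/unix contents into (inode, bound_path, listening) tuples."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split(None, 7)     # keep a Path containing spaces intact
        if len(fields) < 7:
            continue
        flags = int(fields[3], 16)
        inode = int(fields[6])
        path = fields[7] if len(fields) == 8 else ""  # unnamed sockets have no Path
        entries.append((inode, path, bool(flags & ACCEPTCON)))
    return entries


def listeners_matching(entries, needle):
    """Listening sockets whose bound path contains `needle`, with whether that path exists now."""
    return [
        (inode, path, os.path.exists(path))
        for inode, path, listening in entries
        if listening and needle in path
    ]


def pids_holding_inode(inode, proc="/proc"):
    """PIDs with an fd linking to socket:[inode], like `ls -l /proc/PID/fd | grep sock`."""
    target = "socket:[%d]" % inode
    pids = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        fd_dir = os.path.join(proc, name, "fd")
        try:
            links = [os.readlink(os.path.join(fd_dir, fd)) for fd in os.listdir(fd_dir)]
        except OSError:
            continue  # process exited or its fds are unreadable (not root)
        if target in links:
            pids.append(int(name))
    return sorted(pids)


def is_socket_file(path):
    """True if `path` currently exists and is a unix socket file."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except (FileNotFoundError, NotADirectoryError):
        return False


if __name__ == "__main__" and os.path.exists("/proc/net/unix"):
    with open("/proc/net/unix") as f:
        entries = parse_proc_net_unix(f.read())
    for inode, path, exists in listeners_matching(entries, "supervisor"):
        print(inode, path, "exists" if exists else "missing", pids_holding_inode(inode))
    print("configured path is a socket:", is_socket_file("/var/tmp/supervisor.sock"))
```

On a system in the state described here, you would expect a listening inode held by the supervisord PID while is_socket_file() on the configured path returns False.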

i guess i'll restart supervisord, and see if it happens again.

paul

Re: [Supervisor-users] lost supervisor.sock

Sergey Maslyakov
See if any of your periodic clean-up jobs could wipe out the entry from /var/tmp. If a process creates a directory entry and keeps it open, removing the entry from the directory does not affect the process that holds the socket open, but any new attempt to look up or open the deleted entry fails with an error.

This is just one possibility of what could be happening in your case.
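A minimal, self-contained Python demonstration of the behaviour described above, using a hypothetical socket path in a temporary directory rather than supervisord itself. A client that connected before the unlink keeps working, while a new connect() by path fails with FileNotFoundError (ENOENT), which appears to be the condition behind supervisorctl's "no such file" message.

```python
import os
import socket
import tempfile


def demo_unlinked_listener():
    """Unlink a listening socket's path and observe old vs. new clients."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")  # hypothetical path for the demo

        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)

        # A client that connected *before* the unlink.
        early = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        early.connect(path)
        conn, _ = server.accept()

        os.unlink(path)  # what a clean-up job deleting the file would do

        # The existing connection still carries data.
        early.sendall(b"ping")
        received = conn.recv(4)

        # A client connecting by path *after* the unlink fails.
        late = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            late.connect(path)
            late_error = None
        except FileNotFoundError as exc:
            late_error = exc
        finally:
            late.close()

        for s in (early, conn, server):
            s.close()
        return received, late_error


if __name__ == "__main__":
    data, err = demo_unlinked_listener()
    print("old client got:", data)
    print("new client error:", err)
```

The listening socket object stays open the whole time, which matches supervisord still holding socket:[8404] even though nothing can reach it by name.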


/Sergey


On Thu, May 8, 2014 at 8:25 PM, Paul Fox <[hidden email]> wrote:

Re: [Supervisor-users] lost supervisor.sock

Paul Fox
sergey wrote:
 > See if any of your periodic clean-up jobs could wipe out the entry from
 > /tmp. If a directory entry was created and opened by a process, then the
 > removal of the entry from the directory does not affect the process that
 > holds the socket open. But any new attempt to find or open the deleted
 > entry results in error.
 >
 > This is just one possibility of what could be happening in your case.

yeah, i wondered about that too.  but supervisord was running for over
a month successfully -- cleanup jobs would typically run more often.  and
i'm the only person who logs in (it's a server), and i hadn't done so
during the time period when the socket got removed.

i'll look at the cron jobs, but i'm not hopeful.  there aren't many
other possibilities though!
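One caveat to the "cleanup jobs would run more often" reasoning: cleaners that select files by age (tmpwatch and systemd-tmpfiles both work this way) can run every day and still leave a file alone until it passes the age threshold, so a long-lived file may survive many runs before it is removed. This dry-run sketch, with a hypothetical function name and threshold, lists what such a cleaner would consider stale. Using mtime alone as the age is an assumption; real tools may also look at atime or ctime.

```python
import os
import time


def stale_entries(directory, max_age_days, now=None):
    """List entries in `directory` whose mtime is older than `max_age_days`.

    Hypothetical dry-run helper: it only reports, it never deletes.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for entry in os.scandir(directory):
        try:
            st = entry.stat(follow_symlinks=False)
        except FileNotFoundError:
            continue  # entry vanished between listing and stat
        if st.st_mtime < cutoff:
            stale.append(entry.path)
    return sorted(stale)
```

For example, stale_entries("/var/tmp", 30) would list entries older than a hypothetical 30-day threshold; comparing that against the cleaner actually configured on the system is the real test.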

thanks,
paul

 >
 >
 > /Sergey
 >
 >
 > On Thu, May 8, 2014 at 8:25 PM, Paul Fox <[hidden email]> wrote:
 >
 > > matt's mail made me start thinking about pathnames, so i did
 > > some more groveling.
 > >
 > > it seems that netstat thinks the socket is being listened on:
 > >
 > >     # netstat -an | grep supervisor
 > >     unix  2      [ ACC ]     STREAM     LISTENING     8404
 > > /var/tmp/supervisor.sock.567
 > >
 > > and it seems that supervisord still thinks it has the socket open:
 > >     # ps axf | grep '[p]ython.*supervisord'
 > >       723 ?        Ss    83:07 /usr/bin/python /usr/bin/supervisord
 > >     # ls -l /proc/723/fd | grep sock
 > >     lrwx------ 1 root root 64 May  8 19:26 4 -> socket:[8247]
 > >     lrwx------ 1 root root 64 May  8 19:26 5 -> socket:[8404]
 > >
 > > so i guess it's possible that some unrelated process removed the
 > > socket from /tmp.
 > >
 > > i guess i'll restart supervisord, and see if it happens again.
 > >
 > > paul
 > > ----------------------
 > >  paul fox, [hidden email] (arlington, ma, where it's 54.9
 > > degrees)
 > > _______________________________________________
 > > Supervisor-users mailing list
 > > [hidden email]
 > > https://lists.supervisord.org/mailman/listinfo/supervisor-users
 > >

----------------------
 paul fox, [hidden email] (arlington, ma, where it's 48.4 degrees)
_______________________________________________
Supervisor-users mailing list
[hidden email]
https://lists.supervisord.org/mailman/listinfo/supervisor-users