Discussion:
[Xymon] acknowledgements does not survive xymon restart
Norbert Kriegenburg
2018-11-01 09:57:36 UTC
Experts:

I have a new, fairly large Xymon environment (4.3.28 with >10k servers), and
of course there are always a lot of acknowledged alerts.

Unfortunately, all these acks are reset when I restart the Xymon daemon
(xymon.sh restart), for example to activate config changes.
This happens whether I use "Acknowledge alert" from the menu or the
(undocumented) "xymondack" feature of Xymon.
This leads to a lot of extra effort and confusion...
Any ideas how I can save the ack state and restore it after a restart?

Norbert
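One way to approach the save-and-restore question is to capture the acked statuses before the restart and replay them afterwards. The following is only a minimal sketch under stated assumptions, not a tested procedure: the function name `build_ack_messages` and the `hostname|testname|cookie|acktime|ackmsg` input layout are hypothetical, and the `xymondack COOKIE DURATION TEXT` message form with a duration in minutes is an assumption about the undocumented feature mentioned above. It also assumes that cookies stay valid across the restart, which would need to be verified.

```shell
#!/bin/sh
# build_ack_messages NOW
# stdin:  one line per status, "hostname|testname|cookie|acktime|ackmsg"
#         (assumed layout; acktime is a Unix epoch, 0 or empty if not acked)
# stdout: one "xymondack COOKIE MINUTES TEXT" line per ack that has not expired
build_ack_messages() {
    now=$1
    while IFS='|' read -r host test cookie acktime ackmsg; do
        # Skip statuses that carry no usable ack time or cookie.
        case "$acktime" in ''|*[!0-9]*) continue ;; esac
        case "$cookie" in ''|*[!0-9]*) continue ;; esac
        [ "$acktime" -gt "$now" ] || continue
        # Remaining ack lifetime, rounded up to whole minutes.
        minutes=$(( (acktime - now + 59) / 60 ))
        printf 'xymondack %s %s %s\n' "$cookie" "$minutes" "$ackmsg"
    done
}

# Hypothetical usage (query criteria and field names are assumptions;
# check them against xymon(1) for your version):
#   xymon 127.0.0.1 "xymondboard color=red,yellow fields=hostname,testname,cookie,acktime,ackmsg" > acks.txt
#   xymon.sh restart
#   build_ack_messages "$(date +%s)" < acks.txt |
#       while IFS= read -r m; do xymon 127.0.0.1 "$m"; done
```

The function only builds the messages, so the output can be inspected before anything is sent to xymond.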
Norbert Kriegenburg
2018-11-01 12:16:06 UTC
Don't get me wrong: I don't usually restart often, but from time to
time I need a restart, or the whole server must be rebooted because of
patches.
And as I constantly have to add new checks with new NCV definitions, the
TEST2RRD and SPLITNCV settings in xymonserver.cfg change, which also needs
a restart.

Because we have such a huge number of servers, a lot of departments use Xymon
regularly (luckily) and use the ack mechanism to organize their work (adding
a ticket number to an alert, producing evaluation reports and so on).
Having >100 alerts acked is a normal situation,
and restoring them creates a lot of extra work.
In the old BB days the acks always survived downtimes and restarts, but now
there are no more ack files stored in the acks dir, so I thought they would
be restored from the info in the alert.chk file, which is not the case.

Norbert





From: ***@Hormel.com
To: ***@xymon.com
Cc: ***@de.ibm.com
Date: 11/01/2018 12:58 PM
Subject: Re: [Xymon] acknowledgements does not survive xymon restart




What config changes are you making that require such frequent restarts?

Changes to hosts.cfg, alerts.cfg, analysis.cfg, client-local.cfg; these are
the files that get changed most frequently in my environment, and none of
them requires a restart to pick up the changes. I think the only one that *does*
need a restart is xymonserver.cfg, and the only component that would
need to be restarted is xymond, not the full stack.

Erik D. Schminke | Associate Systems Programmer
Hormel Foods Corporation | One Hormel Place | Austin, MN 55912
Phone: (507) 434-6817
***@hormel.com | www.hormelfoods.com
Thomas Eckert
2018-11-02 09:45:53 UTC
Hi Norbert,

I cannot remember ever encountering this, so I just tested it (in a mini setup, Xymon 4.3.28 on Debian 9), and my 2 acks survived the Xymon restart.
As your environment is fairly large, this could be a size/load problem.

Random ideas:
- Is the restart “clean”, or does something crash during the stop phase (check the logfiles)?
- You could try to manually force writing a checkpoint file by sending SIGUSR1 to xymond before the restart.
- If you have a redundant/multi-server setup: is there any chance that another Xymon server is propagating incomplete state (`xymond_distribute` not enabled)?
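The SIGUSR1 idea can be sketched roughly as follows: signal xymond, then wait until the checkpoint file is actually newer than a marker created just before the signal, and only then restart. The helper name `checkpoint_is_fresh` is hypothetical, and the pid file and checkpoint paths in the usage comment are assumptions that have to match the local tasks.cfg.

```shell
#!/bin/sh
# checkpoint_is_fresh FILE MARKER TIMEOUT
# Succeeds as soon as FILE is newer than MARKER, polling once per second
# for at most TIMEOUT seconds; fails if FILE is missing or never updated.
checkpoint_is_fresh() {
    file=$1 marker=$2 timeout=$3
    while [ "$timeout" -ge 0 ]; do
        # find prints the path only when FILE was modified after MARKER.
        if [ -n "$(find "$file" -newer "$marker" 2>/dev/null)" ]; then
            return 0
        fi
        [ "$timeout" -gt 0 ] && sleep 1
        timeout=$((timeout - 1))
    done
    return 1
}

# Hypothetical usage (paths are assumptions; the variables must be set,
# e.g. by sourcing the server environment):
#   marker=$(mktemp)
#   kill -USR1 "$(cat "$XYMONSERVERLOGS/xymond.pid")"
#   checkpoint_is_fresh "$XYMONTMP/xymond.chk" "$marker" 30 && xymon.sh restart
```

Waiting for the file to change makes it visible whether the signal actually produced a fresh checkpoint before the daemon is stopped.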

All the best
Thomas
_______________________________________________
Xymon mailing list
http://lists.xymon.com/mailman/listinfo/xymon
Norbert Kriegenburg
2018-11-02 11:43:45 UTC
Hi Thomas,

Thanks for your suggestions, but unfortunately they did not catch the issue.
The restart runs without messages, and there is nothing suspicious in the logs.
But after the restart, and after a few minutes, all acks are gone.
The ack table at the bottom of the nongreen page is empty as well.

My alert.chk file is always up to date; a SIGUSR1 does not change anything.
But it is currently quite large (5.1 MB) due to the large number of alerts (my access
to one of the DMZs is blocked, creating a lot of conn/ssh/rdp alerts).

I wrote a script to mass-ack such alerts, otherwise the noise would be
unmanageable.
It acks all the red conn/ssh/rdp statuses for this DMZ, and I can see the acks on
the nongreen page.
Until the next restart...
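For readers who want to try something similar, a mass-ack helper could look roughly like the sketch below. This is not the script mentioned above, whose contents are not shown in the thread: the function name `mass_ack_messages`, the `hostname|testname|cookie` input layout and the `xymondack COOKIE DURATION TEXT` message form are all assumptions.

```shell
#!/bin/sh
# mass_ack_messages TESTS MINUTES REASON
# TESTS is a comma-separated list of test names, e.g. "conn,ssh,rdp".
# stdin:  one line per status, "hostname|testname|cookie" (assumed layout)
# stdout: one "xymondack COOKIE MINUTES REASON" line per matching status
mass_ack_messages() {
    tests=$1 minutes=$2 reason=$3
    while IFS='|' read -r host test cookie; do
        # Keep only the tests named in TESTS.
        case ",$tests," in *",$test,"*) ;; *) continue ;; esac
        # Skip statuses without a usable numeric cookie.
        case "$cookie" in ''|*[!0-9]*) continue ;; esac
        printf 'xymondack %s %s %s\n' "$cookie" "$minutes" "$reason"
    done
}

# Hypothetical usage (host pattern, criteria and field names are assumptions):
#   xymon 127.0.0.1 "xymondboard host=^dmz color=red fields=hostname,testname,cookie" |
#       mass_ack_messages "conn,ssh,rdp" 1440 "FW blocked" |
#       while IFS= read -r m; do xymon 127.0.0.1 "$m"; done
```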

Btw, contrary to what Erik wrote: at least new CLASS settings in
analysis.cfg need a xymon.sh restart to take effect (just checked).

Norbert




m***@tdiehl.org
2018-11-02 13:10:36 UTC
You might want to have a look at these old threads. I remember this problem
because it bit me back then.

http://lists.xymon.com/archive/2013-March/037082.html

http://lists.xymon.com/archive/2013-January/036721.html

HTH,
--
Tom ***@tdiehl.org
Norbert Kriegenburg
2018-11-02 15:27:34 UTC
Tom,

I checked the links; it looked like a similar problem at first sight, but
they do not describe my issue.
My environment is quite different:

- My Xymon runs in the xymon user's home directory, /home/xymon/server (freshly
compiled from source with configure and make).
- I have the correct xymond.chk and alert.chk in my $XYMONTMP
(/home/xymon/server/tmp), and they are still present after I stop Xymon.
- The settings in tasks.cfg for xymond and alert are correct (otherwise the
chk files wouldn't be updated regularly).

I have another Xymon installation (much smaller, but same design and
version) where I verified this behaviour.
So it does not seem related to the number of servers or the size of the chk files.

My tasks.cfg settings:

[xymond]
    ENVFILE /home/xymon/server/etc/xymonserver.cfg
    CMD xymond --pidfile=$XYMONSERVERLOGS/xymond.pid \
        --restart=$XYMONTMP/xymond.chk \
        --checkpoint-file=$XYMONTMP/xymond.chk \
        --checkpoint-interval=600 \
        --log=$XYMONSERVERLOGS/xymond.log \
        --admin-senders=127.0.0.1,$XYMONSERVERIP \
        --ack-each-color \
        --ghosts=match

[alert]
    ENVFILE /home/xymon/server/etc/xymonserver.cfg
    NEEDS xymond
    CMD xymond_channel \
        --channel=page \
        --log=$XYMONSERVERLOGS/alert.log xymond_alert \
        --checkpoint-file=$XYMONTMP/alert.chk \
        --checkpoint-interval=600
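A quick sanity check on settings like these is to confirm that the file xymond loads at startup (`--restart`) is the same one it writes (`--checkpoint-file`). The helper below is a minimal sketch for pulling option values out of a tasks.cfg-style file; the name `task_option` is hypothetical, and the parsing is deliberately simple (it expects `--option=value` tokens and `[section]` headers as in the excerpt above).

```shell
#!/bin/sh
# task_option FILE SECTION OPTION
# Prints the value of the first --OPTION=VALUE token inside [SECTION]
# of a tasks.cfg-style file; prints nothing if the option is absent.
task_option() {
    awk -v section="[$2]" -v opt="--$3=" '
        # A new section header switches the "inside wanted section" flag.
        /^[ \t]*\[/ { in_section = ($1 == section) }
        in_section {
            for (i = 1; i <= NF; i++)
                if (index($i, opt) == 1) {
                    print substr($i, length(opt) + 1)
                    exit
                }
        }
    ' "$1"
}

# Hypothetical usage (the path is the one quoted in this thread):
#   cfg=/home/xymon/server/etc/tasks.cfg
#   [ "$(task_option "$cfg" xymond restart)" = "$(task_option "$cfg" xymond checkpoint-file)" ] ||
#       echo "xymond restores from a different file than it checkpoints to"
```

For the configuration shown here both options resolve to `$XYMONTMP/xymond.chk`, so a mismatch of this kind would not explain the lost acks.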

Norbert




m***@tdiehl.org
2018-11-02 15:55:57 UTC
Hi Norbert,
If the chk files are still there after a reboot, then my idea was wrong.

In my case the chk files were getting deleted during a reboot.

Sorry, but that is the only idea I had.


Regards,

Tom
