Opened 16 years ago

Closed 16 years ago

#18 closed defect (fixed)

Roughly 50% of the time the count drops to 0

Reported by: richardvm Owned by: Richard (vm)
Priority: major Milestone: Pings
Component: component1 Version:
Keywords: Cc:

Description (last modified by richardvm)


Attachments (1)

graph_image.php.png (14.6 KB ) - added by richardvm 16 years ago.
Example of a failing test


Change History (8)

comment:1 by richardvm, 16 years ago

This one is not shown in the graphs:

/usr/local/share/cacti/log/cacti.log
03/27/2009 08:01:35 AM - CMDPHP: Poller[0] Host[7631] DS[500] CMD: /usr/bin/perl /usr/local/share/cacti/scripts/ping.pl Joost.wleiden.net 2 public 161 5000 get 2 3, output: 5
<output omitted>
03/27/2009 08:01:36 AM - POLLER: Poller[0] CACTI2RRD: /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/joost_pings_500.rrd --template Pings 1238137291:5

/var/log/cacti.log
03/27/2009 08:01:36 AM - POLLER: Poller[0] CACTI2RRD: /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/joost_pings_500.rrd --template Pings 1238137291:5
OK u:0.01 s:0.01 r:95.90
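
For anyone retracing this, the CACTI2RRD line can be replayed by hand to confirm that RRDtool accepts the sample (path, template, and timestamp copied from the log above; note that RRDtool rejects a timestamp at or before the file's last update):

# replay the update exactly as the poller issued it
/usr/local/bin/rrdtool update /usr/local/share/cacti/rra/joost_pings_500.rrd --template Pings 1238137291:5
# fetch the surrounding window to see whether the sample was stored or left as nan
/usr/local/bin/rrdtool fetch /usr/local/share/cacti/rra/joost_pings_500.rrd AVERAGE -s 1238137000 -e 1238137500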

by richardvm, 16 years ago

Attachment: graph_image.php.png added

Example of a failing test

comment:2 by richardvm, 16 years ago

This incident applies not only to the ping test, but also to Cacti's native RRD interface statistics.

Interface statistics use a "counter" data source type, which means the actual value is derived from the difference between the current reading and the previous one. If a value were simply not inserted into the RRD file, my reasoning would be that the first value after the 'no check period' should be very high, which it is not.
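
As background, a minimal sketch of those counter semantics (file and DS names are hypothetical; the 300-second step matches the 5-minute polling interval, and the 600-second heartbeat matches the value mentioned later in this ticket). RRDtool stores a rate, i.e. the difference between two counter readings divided by the elapsed time:

# create an RRD with one COUNTER data source: 300 s step, 600 s heartbeat
rrdtool create traffic_test.rrd --step 300 DS:traffic_in:COUNTER:600:0:U RRA:AVERAGE:0.5:1:600
# first counter reading
rrdtool update traffic_test.rrd N:1000000
# one step later: stored rate is roughly (1150000 - 1000000) / 300 = 500 per second
rrdtool update traffic_test.rrd N:1150000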

The 'no check period' is always the same length, so I will have to look up the get procedure of RRD (probably possible with rrdtool itself).
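
That lookup can be done without touching Cacti: rrdtool info prints a file's step, each data source's minimal_heartbeat, and the last update time (path copied from the log above):

# show step, ds[...].minimal_heartbeat, and last_update for the file
rrdtool info /usr/local/share/cacti/rra/joost_pings_500.rrd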

comment:3 by richardvm, 16 years ago

Description: modified (diff)

comment:4 by richardvm, 16 years ago

Description: modified (diff)

Data from one of the affected graphs (rrdtool fetch azc_traffic_in_1613.rrd AVERAGE -s -2h):
...
1241897400: 1.5591945207e+03 2.2444359957e+03
1241897700: nan nan
1241898000: nan nan
1241898300: 1.6361655405e+03 1.9750202703e+03
1241898600: 1.6696424325e+03 2.1054766135e+03
1241898900: 1.4608537213e+03 2.0426633878e+03
1241899200: 1.5143774415e+03 2.4638730906e+03
1241899500: 1.8531089083e+03 2.4594400432e+03
1241899800: 1.8384700158e+03 2.5726516934e+03
1241900100: 1.7522683405e+03 1.9195866798e+03
1241900400: 1.7933418267e+03 2.7530849702e+03
1241900700: 1.7369725033e+03 2.1307419069e+03
1241901000: 1.7910664136e+03 2.1220945831e+03
1241901300: nan nan
1241901600: nan nan
1241901900: 1.7926040268e+03 1.9684093960e+03
1241902200: nan nan

Started a cronjob that adds static data to an RRD file, to isolate the incident.
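
A sketch of such a cronjob (the file name and the constant value 5 are made up for illustration): if nan gaps still appear while a constant is written every five minutes, the problem lies in the update path rather than in the collection scripts.

# crontab entry: write the constant 5 to a test RRD every five minutes
*/5 * * * * /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/static_test.rrd N:5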

comment:5 by richardvm, 16 years ago

When running "ps aux | grep cactiuser" I noticed processes that had been running for more than 10 minutes. The heartbeat of all data sources was set to 600 seconds (10 minutes), which could explain the gaps in the graphs: when the heartbeat expires, the data is ignored. As a test I raised the heartbeat to 3600 seconds (1 hour). I also deleted all the existing graphs, because the heartbeat is set at RRD file creation, so the existing graphs would otherwise still be subject to the 600-second heartbeat.
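
For reference, the heartbeat of an existing RRD file can also be raised in place with rrdtool tune, which avoids recreating the files (the DS name Pings is taken from the --template argument in the log; adjust per data source):

# raise the heartbeat of data source Pings to 3600 seconds
rrdtool tune /usr/local/share/cacti/rra/joost_pings_500.rrd --heartbeat Pings:3600
# verify the new value
rrdtool info /usr/local/share/cacti/rra/joost_pings_500.rrd | grep minimal_heartbeat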


comment:6 by richardvm, 16 years ago

Seems to be fixed; I will keep monitoring it to be sure.

comment:7 by richardvm, 16 years ago

Resolution: fixed
Status: new → closed

Updating the heartbeat to a higher value helped.
