You will not find better uptime at a better price, of that I am sure. The figures I posted are httpd uptime, and come from checks performed every minute, 24 hours a day.
Matt R.
WHB Chief Ninja
Tony, in Matt's defense, your posts are wicked long and involve too much math, so I don't blame him for just going by the figures he got from the monitoring software.
I also agree that 99.6% is not such a horrible uptime for the price. If you expect 99.9% uptime then you should also expect to pay more. You can't go with one of the cheapest hosts out there and then complain that it's not perfect.
However, Matt, if 99.621% is the figure you got from Nagios, then I think you might want to look into hosting the Nagios cluster at a different datacenter than your main locations -- or at least get one outside server to use for Nagios -- so that it can test the actual remote availability of HTTP.
I say this because Alertra has you at 99.243% uptime for April. That's about 5 hours of downtime.
If your Nagios cluster lives in the same datacenter as the servers it's checking, then it will only show downtime caused by internal server issues. You'll never see downtime caused by things like routing problems or service provider failures, because Nagios never has to leave the datacenter in order to contact the servers. It can just use the internal network. That could be the source of the discrepancy.
This might also explain why in my "Downtime" thread, when I first mentioned the downtime, you said you hadn't even noticed it -- and that was after all 3 WHB sites had been going down over and over again, along with ALL the servers at FortressITX.
Now this Alertra figure is from the Alertra monitoring that's included in the FindMyHosting.com report, and I'm pretty sure what they do is just test your main company URL. However, since it seems Tony's server suffered the same downtime as the main WHB site did in that incident a few days ago, they must both be hosted at FortressITX, and this Alertra reading should be accurate for BOTH the WHB sites AND reseller6 (as well as every other server at Fortress).
At least, it should be accurate in that the actual downtime for a server at Fortress is AT LEAST what Alertra says it is. Alertra wouldn't pick up on downtime caused by maintenance on one specific server, unless it happened to be the server where the main company site resides. That means that this Alertra figure only represents the downtime caused by that routing disaster. The downtime caused by reboots and maintenance on reseller6 should be ADDED to that figure.
So I apologize for getting into math when I criticized Tony for it, but I'm going to anyway. Matt's figure from Nagios doesn't seem to include downtime caused by the routing failure, and Alertra's doesn't include the maintenance on reseller6. Therefore we should be able to put them together to get the total downtime reseller6 suffered this month, so far. The Nagios figure of 99.6% means 0.4% downtime, and Alertra's 99.25% means 0.75% downtime. The grand total is 1.15% downtime, 98.85% uptime.
That's about 8 hours of downtime by my rough estimate. Still a far cry from 15 hours as Tony claims, but 8 hours is still a lot.
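For anyone who wants to check the arithmetic, here is a rough Python sketch. It assumes a full 30-day April (720 hours) and assumes, as argued above, that the two monitors logged outages that never overlapped, so their downtime can simply be added. The function names are made up for illustration.

```python
HOURS_IN_APRIL = 30 * 24  # assumption: a full 30-day month, 720 hours


def downtime_hours(uptime_percent, period_hours=HOURS_IN_APRIL):
    """Convert an uptime percentage into hours of downtime over a period."""
    return (100.0 - uptime_percent) / 100.0 * period_hours


def combined_uptime(*uptime_percents):
    """Combine uptime figures, assuming the recorded outages never overlap."""
    total_downtime = sum(100.0 - u for u in uptime_percents)
    return 100.0 - total_downtime


nagios = 99.6    # rounded Nagios figure quoted in the thread
alertra = 99.25  # rounded Alertra figure quoted in the thread

total = combined_uptime(nagios, alertra)
print(f"combined uptime: {total:.2f}%")
print(f"downtime over the month: {downtime_hours(total):.1f} hours")
```

If the two monitors did catch some of the same outage, simple addition would overstate the total, so treat the result as an upper bound under that assumption.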
Hey now my post is wicked long and has a lot of math too. Oh well.
http://fmh.alertra.com/fmhuptime/?id1=125006
Last edited by equazcion; 04-28-2007 at 03:04 PM.
Hi All,
Our Nagios is clustered and I believe those to be accurate results. One thing that may cause the discrepancies is that we monitor at 1 minute intervals. So if a reboot takes 2 minutes, Nagios will record 2 minutes.
I suspect your Alertra monitoring checks every 5 minutes. So if the same reboot happens to occur when Alertra is scheduled to check, it's going to show much more than the 2 minutes it actually took. I know Alertra can monitor at 1 minute intervals but it's very expensive to do so.
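The effect of the check interval can be illustrated with a small Python sketch. It is a simplified model, not how Nagios or Alertra actually compute their figures: it assumes every failed check gets counted as one full interval of downtime, and the function name and parameters are hypothetical.

```python
def reported_downtime(outage_start, outage_end, interval, first_check=0, horizon=60):
    """Minutes of downtime a poller would report for a single outage.

    Assumption: each failed check is counted as a full `interval` of downtime.
    All times are in minutes; the outage covers [outage_start, outage_end).
    """
    failed_checks = 0
    t = first_check
    while t < horizon:
        if outage_start <= t < outage_end:
            failed_checks += 1
        t += interval
    return failed_checks * interval


# A 2-minute reboot starting at minute 9
print(reported_downtime(9, 11, interval=1))   # 1-minute checks
print(reported_downtime(9, 11, interval=5))   # 5-minute checks that land on the reboot
print(reported_downtime(11, 13, interval=5))  # 5-minute checks that miss it entirely
```

Under this model a coarse interval can overstate a short outage when a check lands on it, and can also miss it completely when no check does.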
Either way, the downtime this month wasn't welcomed or expected and it was ironic that it had to happen when the two most senior members of staff were travelling, and travelling back from the data center in question. They are usually more stable, and we maintain an excellent relationship with them but we have called in our SLA this month to ensure we keep them on their toes.
We are also moving to a more independent network solution within the coming weeks and we'll announce when we do this. Wayne (our senior admin) will be doing this in conjunction with datacenter personnel.
It is our intention to hit 100% uptime, or as close to it as possible, each and every month whilst still maintaining current pricing. We have always pioneered the affordability level of hosting and we will continue to do so - at a ground-breaking price, and with performance we are all happy about. At the end of the month, we'll have 9-5 phone support and 24x7 live chat support too. I do not know of another host with our pricing that will offer all of this, with the reliable service you are accustomed to.
And lastly (this is possibly the longest forum post I've ever made!), we are going to be launching "business class" hosting probably next quarter. We are looking at a number of ways to implement high availability through software and hardware load balancers to the mass market. Our goal is to have entry level pricing starting at under $10 / month. There was significant interest in this product in the recent poll we did, which reinforced our belief that a product such as this would be extremely popular.
Matt R.
WHB Chief Ninja
Clusters are great, but as I mentioned, if they are located within the same datacenter as the servers they're monitoring, then they are not picking up downtime caused by routing issues.
And it doesn't matter what interval Alertra is using. Alertra is not monitoring your individual servers, it is only monitoring webhostingbuzz.com. You're talking about an inaccuracy due to the rebooting of reseller6, and I'm saying there's no way it would have even known reseller6 was down. It's only showing downtime from the routing issue. The discrepancy is caused by the fact that Alertra and Nagios were each logging downtime from 2 separate issues occurring at 2 separate times. Neither one shows the actual total downtime from both.
I believe the Nagios cluster is intelligent (as we have 2 Nagios servers). I'll check, and if not, we'll definitely set it up this way.
Matt R.
WHB Chief Ninja
I'm sure it's intelligent...
I admit I don't know a lot about what Nagios is capable of, but I'm pretty sure that it's not possible for any software to do what you're hoping it does.
One computer trying to connect to another will always choose the shortest possible route. If both computers are within the same local network, they will end up connecting via the local router. There's just no way around that, at least for software. It would take a good many custom router configurations to somehow force the request to go out onto the internet before coming back to a local server.
This is why people use services like Alertra -- because the only way to monitor remote connectability is by actually connecting from a remote location. Nagios monitors services and tells you if they go down, as that's what it's meant for, but it can't tell you anything about server reachability from outside the datacenter.
There is another way which I've used to do this from my home network, to test my own local web server using its remote address, and that is by using a remote proxy server. If I connect through a remote proxy, then I'm making a request to the proxy, rather than to my local server, so the request actually leaves the local network. This still involves using a remote machine though. I don't think there's any way around that.
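As a rough illustration of that kind of check, here is a minimal Python sketch using only the standard library. The function name, the timeout, and the optional proxy argument are my own choices for the example, and it only means anything as a remote test if it runs on (or is routed through) a machine outside the datacenter.

```python
import http.client
import urllib.request


def http_is_up(url, proxy=None, timeout=10):
    """Return True if `url` answers an HTTP GET with a non-error status.

    `proxy` is an optional proxy URL (hypothetical, e.g. "http://proxy.example.com:3128").
    With proxy=None, any proxy settings from the environment are ignored.
    """
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status < 400
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, DNS failures and HTTP 4xx/5xx
        return False
```

Calling something like `http_is_up("http://webhostingbuzz.com/", proxy="http://proxy.example.com:3128")` would send the request through the proxy, but a failure then means either the site or the proxy is unreachable, and the sketch cannot tell which.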
If you care, I have a suggestion for you. Since you now have servers at two different locations, Texas and NJ, I would take advantage of that. Set up one Nagios server in Texas, and one in NJ. Have the Texas one monitor the servers in NJ, and have the NJ one monitor the Texas servers. That way you'll always know if anything is inaccessible for any reason whatsoever.
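The cross-monitoring idea can be sketched in a few lines of Python. This is only a planning aid that works out which site should watch which servers, not a Nagios configuration, and the site names and hostnames are hypothetical placeholders.

```python
def cross_site_assignments(servers_by_site):
    """Map each monitoring site to the servers it should watch.

    Each site watches every server hosted at the *other* sites, so every
    check has to cross the public internet to reach its target.
    """
    assignments = {}
    for monitor_site in servers_by_site:
        targets = []
        for site, servers in servers_by_site.items():
            if site != monitor_site:
                targets.extend(servers)
        assignments[monitor_site] = targets
    return assignments


# Hypothetical hostnames, for illustration only
servers_by_site = {
    "texas": ["tx-web1.example.com", "tx-web2.example.com"],
    "nj": ["nj-web1.example.com"],
}

for monitor, targets in cross_site_assignments(servers_by_site).items():
    print(f"Nagios at {monitor} monitors: {', '.join(targets)}")
```

With only two locations each site simply watches the other; the same function would cover a third location without changes.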
Anyway. I'll shut up now.
Last edited by equazcion; 04-28-2007 at 05:54 PM.
Yeah, that's what we have. Alertra uses Nagios too, just fyi. It is incredibly powerful (and complex).
Matt R.
WHB Chief Ninja
And whenever your proxy is down, downtime is reported as well.
Now we know what uptime you think is reasonable, what server load do you think is acceptable?