Musings on personal and enterprise technology (of potential interest to professional technoids and others)

Showing posts with label monitoring. Show all posts

Friday, September 25, 2009

Measurement + Monitoring of Data Centre Energy Use Immature Through 2011: Gartner

As previously posted, "cheap" servers really aren't cheap at all (InfoWorld 5/2008), since recurring server operational costs are often as large as, or larger than, the initial hardware capital investment. Below is a related update from Gartner, reminding us that "you can't manage what you can't measure". Here is Gartner's explanation of the importance of metrics for reducing data-center energy consumption [emphasis mine]:

Gartner Says Measurement and Monitoring of Data Centre Energy Use Will Remain Immature Through 2011:
“...when asked which energy management metrics they will use in the next 18 months, 48 per cent of respondents have not even considered the issue of metrics. However, without metrics it is impossible to get accurate data, which is essential to evaluating basic costs, proportioning these costs to different users and setting policies for improvement.

'These metrics form the bedrock for internal cost and efficiency programmes and will become increasingly important for external use', said Mr Kumar. 'Organisations that want to publicise their carbon usage through green accounting principles will need to have their basic energy use continuously monitored.'

Mr Kumar also urged organisations not to rely on internal metrics saying that evaluating server energy needs to be done in an open and transparent manner...”
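Gartner doesn't name specific metrics in the excerpt above, but one widely used candidate for the "bedrock" Mr Kumar describes is PUE (Power Usage Effectiveness), the ratio of total facility power to power actually reaching the IT equipment. A minimal sketch (the metric choice and example figures are my own, not Gartner's):

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power.

    1.0 is the theoretical ideal (every watt reaches IT gear); overhead
    from cooling, power distribution, and lighting pushes it higher.
    """
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment load must be positive")
    return total_facility_kw / it_equipment_kw

# Example: 1,200 kW drawn at the utility meter, 600 kW reaching
# servers/storage/network -- half the energy goes to overhead.
print(round(pue(1200, 600), 2))  # 2.0
```

Continuous monitoring, as Kumar urges, means computing this from live meter readings rather than a one-off audit.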


hat tip / source: http://twitter.com/Gartner_inc/statuses/4339604501

Tuesday, September 1, 2009

Gmail outage resolved thanks to flexible technical architecture [Official Gmail Blog]

Google's Site Reliability Czar offers a clear explanation of today's Gmail outage, quoted below.

IMHO, as painful and widespread as the outage was (a side effect of "routine upgrades"), the response was urgent and ultimately successful. And of course, the true "secret sauce" that enabled this resolution is the underlying foundation of "flexible capacity", which "is one of the advantages of Google's architecture".

Here is the full Official Gmail Blog post:

Official Gmail Blog: More on today's Gmail issue: Tuesday, September 01, 2009 6:59 PM
Posted by Ben Treynor, VP Engineering and Site Reliability Czar

"Gmail's web interface had a widespread outage earlier today, lasting about 100 minutes. We know how many people rely on Gmail for personal and professional communications, and we take it very seriously when there's a problem with the service. Thus, right up front, I'd like to apologize to all of you — today's outage was a Big Deal, and we're treating it as such. We've already thoroughly investigated what happened, and we're currently compiling a list of things we intend to fix or improve as a result of the investigation.

Here's what happened: This morning (Pacific Time) we took a small fraction of Gmail's servers offline to perform routine upgrades. This isn't in itself a problem — we do this all the time, and Gmail's web interface runs in many locations and just sends traffic to other locations when one is offline.

However, as we now know, we had slightly underestimated the load which some recent changes (ironically, some designed to improve service availability) placed on the request routers — servers which direct web queries to the appropriate Gmail server for response. At about 12:30 pm Pacific a few of the request routers became overloaded and in effect told the rest of the system 'stop sending us traffic, we're too slow!'. This transferred the load onto the remaining request routers, causing a few more of them to also become overloaded, and within minutes nearly all of the request routers were overloaded. As a result, people couldn't access Gmail via the web interface because their requests couldn't be routed to a Gmail server. IMAP/POP access and mail processing continued to work normally because these requests don't use the same routers.

The Gmail engineering team was alerted to the failures within seconds (we take monitoring very seriously). After establishing that the core problem was insufficient available capacity, the team brought a LOT of additional request routers online (flexible capacity is one of the advantages of Google's architecture), distributed the traffic across the request routers, and the Gmail web interface came back online.

What's next: We've turned our full attention to helping ensure this kind of event doesn't happen again. Some of the actions are straightforward and are already done — for example, increasing request router capacity well beyond peak demand to provide headroom. Some of the actions are more subtle — for example, we have concluded that request routers don't have sufficient failure isolation (i.e. if there's a problem in one datacenter, it shouldn't affect servers in another datacenter) and do not degrade gracefully (e.g. if many request routers are overloaded simultaneously, they all should just get slower instead of refusing to accept traffic and shifting their load). We'll be hard at work over the next few weeks implementing these and other Gmail reliability improvements — Gmail remains more than 99.9% available to all users, and we're committed to keeping events like today's notable for their rarity."
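The cascade Treynor describes has a simple dynamic worth making concrete: when an overloaded router refuses traffic instead of just slowing down, its share of the load shifts onto the survivors, which can tip them over in turn. A toy model (numbers are illustrative, not Google's):

```python
def surviving_routers(num_routers: int, capacity_each: float, total_load: float) -> int:
    """Toy cascade model: any router whose share of the load exceeds its
    capacity refuses traffic, shifting that share onto the remaining routers.
    Returns how many routers are still accepting traffic at equilibrium."""
    active = num_routers
    while active > 0 and total_load / active > capacity_each:
        active -= 1  # one more router says "stop sending us traffic"
    return active

# 10 routers, each comfortable up to 100 load units, 950 units of demand:
print(surviving_routers(10, 100, 950))  # 10 -- everyone stays up
# Take 2 offline for "routine upgrades" and the same demand collapses the rest:
print(surviving_routers(8, 100, 950))   # 0
```

This is exactly the graceful-degradation point in the post: if routers merely got slower instead of refusing traffic, demand slightly over capacity would mean uniform slowness, not total collapse.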

Thursday, August 20, 2009

HighlightCam Cuts Long Security Camera Videos Down to the Action

Mashable has a nice writeup today regarding HighlightCam.com:
HOW TO: Quickly Cut Long Videos Down to the Juicy Parts:
"...finding the important parts of these [security camera] videos is painful. That’s because there are hours of footage to sift through. Really, do you want to fast-forward through a 12-hour video just to find out what your dog was doing? Yet if you don’t, then you’ve lost the point of having the camera in the first place.

Now a new YCombinator-funded company, HighlightCam, has built the software to take hours of video and compress it into just the minutes with the important stuff – you know, when the dog starts barking, the baby falls out of the crib, or the crook turns on the lights.

So how does HighlightCam pick out the juiciest bits of long videos? The web-based software, which has both a free version and an $8.99 per month version, is able to detect movement, light changes, and any variations from the norm. You can pick out how far down the video should be cut – one minute, five minutes, ten minutes, whatever you’d like. You can even get the best parts of YouTube (YouTube) videos with the software, as was demonstrated to us today....

One hour of video, cut down to under a minute, with only the important stuff shown. It’s already caught employees stealing from cash registers, something you’d probably miss if you sifted through the full video. It’s cheap, accessible, usable, and from what we’ve seen, really accurate at pinpointing key events. And with a free version, you can start using it without spending a dime."

Also worth noting, as per http://blog.highlightcam.com/: in fact not only motion but also audio is used as a cue to detect the highlights:

"HighlightCam records your footage all the time, whether there’s motion or not. We then find the highlights using a bunch of different cues, including motion and audio.

A tiny mouse running across the floor, or a loud conversation held off-camera—if that’s the most interesting thing that happened in an hour, that’s what we’ll show you!"
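HighlightCam's actual algorithm isn't public, but the motion cue they mention is commonly built on frame differencing: compare each frame to the previous one and flag frames where the change exceeds a threshold. A minimal sketch with tiny toy "frames" (lists of pixel intensities):

```python
def highlight_frames(frames, threshold=10.0):
    """Flag frames whose mean absolute pixel difference from the previous
    frame exceeds a threshold -- a crude stand-in for a motion cue.
    (Illustrative only; not HighlightCam's real algorithm.)"""
    highlights = []
    for i in range(1, len(frames)):
        diff = sum(abs(a - b) for a, b in zip(frames[i], frames[i - 1]))
        if diff / len(frames[i]) > threshold:
            highlights.append(i)
    return highlights

# Four tiny 4-"pixel" frames: static, static, sudden change, static again
frames = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [90, 90, 90, 90],  # something moved into view
    [90, 90, 90, 90],
]
print(highlight_frames(frames))  # [2]
```

A real system would combine this with the audio cue the HighlightCam blog mentions (e.g. flagging frames where the sound level spikes) before ranking the "most interesting" segments.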



Friday, July 17, 2009

Project Portfolio Management in action: US temporarily halts 45 IT projects (budgeted @$200 million)

As per Demian Entrekin's excellent blog at http://it.toolbox.com/blogs/ppmtoday, US Federal CIO Vivek Kundra has embarked on an impressive and BIG IT project portfolio management and reporting initiative (accessible via http://www.usaspending.gov/ ).

Today (Friday July 17th), a formal announcement on the related blog http://it.usaspending.gov/?q=content/blog confirms that 45 major federal IT projects are actually being temporarily halted. So the major US federal PPM initiative is not merely a reporting tool; it is actually being used to take meaningful, consistent action when project overruns are detected:

"Friday, July 17, 2009

Evidence-based decisions
Vivek Kundra, Federal CIO

Today, the Department of Veterans Affairs (VA), under the leadership of Secretary Shinseki and VA CIO Roger Baker, announced that it will temporarily halt 45 IT projects which are either behind schedule or over budget and work to determine whether these programs should be continued. We’re not talking about a trivial sum here—the Fiscal Year 2009 combined budget for the 45 projects is approximately $200 million. The worst offender of the bunch was 110% over budget and 17 months behind schedule.

We were able to catch these contracts, in part, thanks to our new tool, the "IT Dashboard” which helped shed light on the performance of projects across the federal government.

During the next few weeks, the VA will audit these 45 projects to determine whether additional resources or new management teams can get them back on schedule.

If they can’t be fixed, the projects will be canceled.

If you are just hearing about the IT Dashboard for the first time, it allows you to see which IT projects are working and on-schedule (and which are not), offer alternative approaches, and provide direct feedback to the chief information officers at federal agencies.

Given the size and complexity of the federal IT portfolio, the challenges we face are substantial and persistent. The dashboard is not a substitute for good management. Its value comes from leaders who use the information to make tough, evidence-based decisions on the future of IT investments.

The VA’s announcement is part of a broader effort by the Administration to make the federal government more transparent and to boost accountability and drive better performance. From IT accountability to personnel and contracting reforms, the administration is committed to providing better value, efficiency, and effectiveness for taxpayers’ dollars."
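The screening the IT Dashboard enables is, at heart, a filter over a portfolio: flag anything over budget or behind schedule for review. A minimal sketch (field names and thresholds are illustrative, not the Dashboard's actual schema):

```python
def flag_projects(projects, max_over_budget_pct=0.0, max_delay_months=0):
    """Return projects exceeding budget-overrun or schedule-delay thresholds,
    i.e. the candidates for a halt-and-review decision."""
    return [
        p for p in projects
        if p["over_budget_pct"] > max_over_budget_pct
        or p["delay_months"] > max_delay_months
    ]

portfolio = [
    # Figures for the "worst offender" cited in the VA announcement:
    {"name": "Project A", "over_budget_pct": 110, "delay_months": 17},
    {"name": "Project B", "over_budget_pct": 0, "delay_months": 0},
]
print([p["name"] for p in flag_projects(portfolio)])  # ['Project A']
```

As Kundra notes, the tool only surfaces the data; the halt/fix/cancel decision still rests with leadership.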

Hat tip: http://twitter.com/nytimesbits/statuses/2690095160

Monday, March 17, 2008

VoIP gets handle on protection - Financial Post 13-Mar-2008

An interesting taxonomy of VoIP security threats (standard view; print-formatted: VoIP gets handle on protection):

"...VoIPshield Systems Inc., an Ottawa-based software firm that develops products to secure voice communications on IP networks, warns that attacks on VoIP systems can fall into one or more of five categories.


These include privacy intrusion (call eavesdropping, call recording and voice mail tampering); availability (denial of service attacks, buffer overflow attacks and malware); authenticity (registration hijacking, caller ID spoofing and sound insertion); theft (toll fraud/service theft) and data theft through masquerading data as voice and data network crossover attacks; and voice spam, known as SPIT, which includes unsolicited calling, voice mailbox stuffing, and something called vishing (voice phishing)..."
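The article notes that an attack "can fall into one or more" of the five categories, which makes the taxonomy a natural many-to-many lookup table. A sketch (category and threat names paraphrased from the excerpt):

```python
# VoIPshield's five threat categories, as described in the Financial Post piece.
VOIP_THREATS = {
    "privacy intrusion": ["call eavesdropping", "call recording", "voice mail tampering"],
    "availability": ["denial of service", "buffer overflow", "malware"],
    "authenticity": ["registration hijacking", "caller ID spoofing", "sound insertion"],
    "theft": ["toll fraud / service theft", "data masquerading as voice", "data network crossover"],
    "voice spam (SPIT)": ["unsolicited calling", "voice mailbox stuffing", "vishing (voice phishing)"],
}

def classify(threat: str):
    """Return every category a given threat falls under
    (a single attack can span several)."""
    return [cat for cat, threats in VOIP_THREATS.items() if threat in threats]

print(classify("vishing (voice phishing)"))  # ['voice spam (SPIT)']
```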

Monday, October 1, 2007

Vendors Tame Virtualization Management Complexity, eWeek 9/07

A fascinating focus on ITIL best practices (and also monitoring) for VMware virtualization. However, the overall ITIL challenge would seem to go way, way beyond just VMware issues [e.g. holistic infrastructure change management, as part of the overall scope of best practices that ought to be enabled via a framework such as HP OpenView, CA Unicenter, IBM Tivoli, etc.].

Meanwhile, this planned 4th quarter release by Opalis may be worth watching for (the monitoring tool by eG Innovations is supposedly out already):

Vendors Tame Virtualization Management Complexity: "...automation vendor Opalis Sept. 4 announced its new Opalis Process Catalog for Virtualization, aimed in part at helping IT rein in virtual machine sprawl. 'It's so easy to provision a new virtual machine, they tend to multiply like bunny rabbits. As a result, storage utilization goes through the roof,' said Charles Crouchman, chief technology officer of the Toronto-based company.

The Opalis Process Catalog for Virtualization represents a new chapter in the company's overall electronic IT process catalog book for solving specific problems. It includes automation policies based on ITIL best practices for managing virtual environments. Functions range from straightforward automation of maintenance tasks such as patch management to 'high value processes such as virtual machine life cycle management,' Crouchman said.

The virtualization process catalog includes process flows for both provisioning a new virtual machine and de-provisioning it. 'We can watch the help desk system for a ticket requesting a new VM, sense the ticket has been opened, and we can check that it is in compliance with [rules such as] who can ask for a VM, what kind of VM [is acceptable] and which software you can use on a VM. If it passes compliance, we can immediately provision that VM,' he said.

The product is due in the fourth quarter, although Opalis officials will demonstrate it at VMworld in San Francisco.

Meanwhile, eG Innovations at VMworld will take the wraps off new performance monitoring software that can provide the inside view of how applications are performing within a virtual machine guest.

The Iselin, N.J., company will introduce its new eG Monitor for VMware Infrastructures, which shows in real time the internal and external performance views of what the VMware host sees about guests and what the guests see themselves...."
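Crouchman's provisioning workflow (watch for a ticket, check who asked, what kind of VM, what software, then auto-provision) is essentially a compliance gate. A minimal sketch of that gate; the field names and rules here are illustrative, not Opalis's actual schema:

```python
def approve_vm_request(ticket, allowed_requesters, allowed_vm_types):
    """Sketch of the compliance check described by Opalis's CTO:
    validate the requester and requested VM type before auto-provisioning.
    Returns (approved, reason)."""
    if ticket["requester"] not in allowed_requesters:
        return False, "requester not authorised to ask for a VM"
    if ticket["vm_type"] not in allowed_vm_types:
        return False, "VM type not permitted"
    return True, "provision VM"

# A hypothetical help-desk ticket passing both checks:
ticket = {"requester": "alice", "vm_type": "small-linux"}
print(approve_vm_request(ticket, {"alice", "bob"}, {"small-linux"}))
# (True, 'provision VM')
```

Automating the matching de-provisioning flow is what keeps the VMs from "multiplying like bunny rabbits" in the first place.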