Community Articles

Application Delivery Around the Globe

Bill Alderson, Technology Consulting Officer, NetQoS, Inc.


Application owners typically lie awake before a rollout thinking about critical issues: improving workflow, taking care of end users, and meeting management’s reporting needs. Managing these objectives over what may be years of planning and execution is challenging, but these tasks usually remain under control. During the rollout process, stakeholders maintain insight into the activities of the people involved. Budgets are drawn up along the way, and code developed or purchased from outside sources goes through acceptance testing. Life can be stressful, but manageable.

A few months before rollout, application owners start thinking about delivering their application across the enterprise network. This is when the night sweats begin. It seems as though all their hard work must now be consigned to the great abyss. Whether an application owner is a control freak or a master diplomat able to woo support from everyone involved, they can no longer control the budget, the process, the people, or the schedules. Because there is no control, a rollout carries both real and perceived risks. If the application owner is highly critical or tries to control every aspect of the rollout, these risks may be magnified. Attempts to exercise control are futile; as you will see, even those who think they control the network really do not. This is a good time to make friends with colleagues and coworkers while pursuing common corporate goals.

Recently I was engaged by an application owner from a large organization whose network literally traverses the globe. Because of the variety of endpoints and the rugged, remote locations the network reaches, most of its communications ran over satellite links. The organization had needed to deploy a critical application across this far-flung infrastructure; the application was spawned in a lab and rushed into production. By the time I arrived, it had been in production for a couple of years, but data replication performance was suffering, and the volume of data to replicate would only grow. Clearly, something had to be done.

Armed with our “Swiss Army knife” of tactical CPR (Critical Problem Resolution) tools for troubleshooting and analyzing complex multi-tier applications, we set out to diagnose why the application was so slow. The application was one of several competing for network resources. It had many inherent challenges, but its deployment across a large network proved to be one of the greatest hurdles it had to overcome.

Between any two primary replicating nodes were no fewer than six different network control entities, each with its own metrics and Service Level Agreements (SLAs). The application owner was associated with none of these network control entities. Although problems were reported to the local control entity, even the local control entity had no control over the end-to-end network through which this application's packets traveled. In fact, the local network entity only knew how to interface with the next entity to which packets were handed off. Sound familiar?

What does one do when the application is affected by the “network” and nobody is accountable from end to end? The typical response, unfortunately, is to throw money at the problem in a hit-or-miss attempt to fix something that has never been accurately diagnosed. Customers sometimes fund large bandwidth increases in hopes of improving application performance, or pay for unwarranted “fork-lift” upgrades of network infrastructure with only a vague notion that things will improve.

After spending $600,000 on new server CPUs for its Web and SQL platforms, one such company called us in to diagnose the root cause of its performance problems. The cause turned out to be a client-side Java applet that had nothing to do with server performance. Alas, that hefty investment netted a goose egg in performance improvement. It is not uncommon to substitute money for brain cells. We tend to value hardware and software products far above the expertise available in the marketplace, and we have all become jaded about what we expect a consultant to provide, particularly when a consultant's visit so often produces only fancy reports with no real results or changes. In my experience, the more a client tries to dictate methods, the more it interferes with producing the desired outcome. True experts in technical problem diagnosis produce definitive results. Focus on results.

Back to the case at hand. This application owner was desperate because the application depended on a network with many separate controlling entities, and there appeared to be no way to establish a shared need that would prod all of those entities into addressing the performance problems. As it happened, the lack of shared consensus and urgency was not even the main issue. While working with the application architects, we asked how their application buffered its communications to the TCP stack. They had tried TCP Window Scaling, to no avail, to overcome high latency across high-speed links. Network professionals call such links “long fat pipes” because of their high latency (long) and high bandwidth (fat). Our hypothesis was that Window Scaling failed to improve replication throughput because the application itself was not letting go of enough data at the application layer: however large a window TCP could negotiate, it could only keep in flight the data the application had handed to it. This hypothesis turned out to be correct. When we increased the buffer sizes the application used to pass data to the TCP stack, throughput improved by multiple orders of magnitude. Our discovery was greeted with fanfare, and we demonstrated the resulting performance improvements in the labs.
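The arithmetic behind this diagnosis can be sketched in a few lines of Python. The bandwidth-delay product (bandwidth times round-trip time) is how much data must be in flight to fill a long fat pipe, and TCP can keep in flight no more than the smallest of the receiver’s window, the sender’s buffer, and what the application has released. The 45 Mbit/s link, 600 ms round trip, 64 KiB application buffer, and the helper names are illustrative assumptions, not measurements from this engagement.

```python
"""Sketch: why a small application buffer caps throughput on a long fat pipe.

Hypothetical numbers for illustration only: a 45 Mbit/s path with a
600 ms round-trip time, roughly what a geostationary satellite hop adds.
"""

MAX_UNSCALED_WINDOW = 65_535  # largest TCP window without scaling (bytes)
MAX_WINDOW_SHIFT = 14         # largest window-scale shift allowed by RFC 7323


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return bandwidth_bps * rtt_s / 8


def window_scale_shift(window_bytes: int) -> int:
    """Smallest window-scale shift that lets the advertised window reach window_bytes."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < window_bytes and shift < MAX_WINDOW_SHIFT:
        shift += 1
    return shift


def effective_window(app_bytes: int, send_buffer: int, receive_window: int) -> int:
    """TCP can keep in flight no more than the smallest of these limits."""
    return min(app_bytes, send_buffer, receive_window)


def throughput_bps(window_bytes: int, rtt_s: float, bandwidth_bps: float) -> float:
    """At most one window per round trip, and never more than the link rate."""
    return min(window_bytes * 8 / rtt_s, bandwidth_bps)


if __name__ == "__main__":
    bw, rtt = 45e6, 0.6
    need = bdp_bytes(bw, rtt)
    print(f"BDP: {need:,.0f} bytes, window scale shift: {window_scale_shift(int(need))}")
    # Hypothetical: scaling is negotiated and socket buffers are large, but the
    # application releases only 64 KiB to TCP per round trip.
    small = effective_window(64 * 1024, 4_000_000, MAX_UNSCALED_WINDOW << 6)
    large = effective_window(4_000_000, 4_000_000, MAX_UNSCALED_WINDOW << 6)
    print(f"64 KiB app buffer: {throughput_bps(small, rtt, bw) / 1e6:.2f} Mbit/s")
    print(f"4 MB app buffer:   {throughput_bps(large, rtt, bw) / 1e6:.2f} Mbit/s")
```

With these assumed numbers, Window Scaling alone changes nothing while the application releases only 64 KiB per round trip; once the application lets go of enough data to cover the bandwidth-delay product, the link rate becomes the limit instead. Real stacks add effects this model ignores, such as slow start and congestion control.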

In the field, these improvements held up only where the network behaved like a typical client environment, and this network was anything but typical. Within a single network control entity’s environment, the improvements were marked; across many ownership domains, however, packet loss and dynamic variations in latency continued to limit our throughput gains. Each network control entity had its own control center measuring its own SLA within its area of control, but from the application’s perspective, end-to-end packet loss and latency jitter were far higher than reasonable.
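One widely cited way to see why loss matters so much on long paths is the approximation by Mathis et al., which bounds a single Reno-style TCP flow at roughly (MSS / RTT) * (C / sqrt(p)) for packet-loss probability p. The sketch below applies it with a hypothetical 1460-byte MSS and an assumed 600 ms round trip; it is a back-of-the-envelope model, and modern congestion-control algorithms such as CUBIC behave differently.

```python
"""Sketch: the Mathis et al. approximation for loss-limited TCP throughput.

    rate <= (MSS / RTT) * (C / sqrt(p)),  with C = sqrt(3/2) for periodic loss

Assumes Reno-style congestion control and steady loss; the MSS, RTT, and
loss rates used below are hypothetical inputs, not measurements.
"""
import math

C = math.sqrt(3 / 2)  # constant for the simple periodic-loss model


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on one TCP flow's throughput at loss probability p."""
    if not 0 < loss_rate < 1:
        raise ValueError("loss_rate must be strictly between 0 and 1")
    return (mss_bytes * 8 / rtt_s) * (C / math.sqrt(loss_rate))


if __name__ == "__main__":
    for p in (0.0001, 0.001, 0.01):
        rate = mathis_throughput_bps(1460, 0.6, p)
        print(f"loss {p:.2%}: at most {rate / 1e6:.2f} Mbit/s")
```

Because the bound falls with round-trip time and with the square root of the loss rate, buffer tuning alone cannot recover the lost throughput: on this assumed path, even 0.01% loss caps a flow at roughly 2.4 Mbit/s.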

We began the arduous job of finding root causes between two or three of the six or more entities. Packet loss analysis came first and involved tracing the path between the replication nodes, end to end, down to specific router, WAN, satellite, and switch ports. Issues were found at and between the control entities’ demarcation points (demarcs), that is, in switches, routers, and satellite link demarcs. Does falling between the cracks come to mind? This explains how each entity could achieve high SLA compliance while the application still suffered: the problems lay at or between technical demarcations. Until an overarching entity or initiative is established to uncover end-to-end impacts on applications, application owners are unlikely to stop being squeaky wheels.
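A small model shows how every entity can report SLA compliance while the end-to-end path performs poorly. Assuming losses on each segment are independent, the end-to-end delivery probability is the product of the per-segment delivery probabilities. The six entities, the 0.1% in-domain loss, the 0.5% SLA, and the 0.5% loss at each unmonitored hand-off are hypothetical numbers chosen for illustration, as are the `Segment` and `build_path` names.

```python
"""Sketch: every domain meets its SLA, yet the end-to-end path does not.

Hypothetical numbers: six control entities, each with 0.1% loss inside its
own domain against a 0.5% SLA, plus 0.5% loss at each of the five hand-offs
(demarcs) between them that no entity's SLA covers. Losses are treated as
independent, so per-segment delivery probabilities multiply.
"""
from dataclasses import dataclass
from math import prod


@dataclass
class Segment:
    name: str
    loss_rate: float
    monitored: bool  # True if some entity's SLA covers this segment


def end_to_end_loss(segments: list[Segment]) -> float:
    """Probability that a packet is lost somewhere along the whole path."""
    return 1 - prod(1 - s.loss_rate for s in segments)


def build_path(entities: int, domain_loss: float, demarc_loss: float) -> list[Segment]:
    """Alternate monitored entity domains with unmonitored demarc hand-offs."""
    path: list[Segment] = []
    for i in range(entities):
        path.append(Segment(f"entity-{i + 1}", domain_loss, monitored=True))
        if i < entities - 1:
            path.append(Segment(f"demarc-{i + 1}/{i + 2}", demarc_loss, monitored=False))
    return path


if __name__ == "__main__":
    sla = 0.005
    path = build_path(entities=6, domain_loss=0.001, demarc_loss=0.005)
    for seg in path:
        if seg.monitored:
            status = "meets SLA" if seg.loss_rate <= sla else "violates SLA"
        else:
            status = "unmonitored"
        print(f"{seg.name:14} loss {seg.loss_rate:.2%}  {status}")
    print(f"end-to-end loss: {end_to_end_loss(path):.2%}")
```

Under these assumptions every entity meets its SLA, yet roughly 3% of packets are lost end to end, which on a long fat pipe is more than enough to cripple TCP throughput.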

To whom do they squeak? To executive management? The position of CIO was created to bridge this gap. As CIOs begin to solve the problem systemically, as a professional group, rather than haphazardly or individually within their own organizations, the future looks brighter. IT has grown faster than any previous technology humankind has developed, and its failure so far to manage that growth shows up as poor end-to-end performance management. Telecommunications, by contrast, evolved over more than a century, adapting and scaling as people used it. The technical complexity of IT necessitates compartmentalization, and compartmentalization inherently makes it harder to understand, and therefore to manage, the whole. Few technologists possess cross-compartmental technical, process, and management understanding. Solving the end-to-end performance problems that application owners experience today requires that understanding plus skill, persuasive communication, and organizational leadership.

We manage nuclear technology effectively through compartmentalization, for reasons of both complexity and security, and Oppenheimer and others took great care to set up a management structure that oversaw the whole. Fortunately, IT is not as dangerous as nuclear technology, though it is potentially as productive. Unlike nuclear technology, however, IT has not been architected and managed from the top since its inception.

The organization in this case, affected by the lack of end-to-end performance management, has begun a concerted effort to provide its application owners with end-to-end analysis of their application issues. It took expensive problems, and bright leaders who listened to the experts they called upon, to initiate change.

Compartmentalized IT paradigms continue to challenge application performance management. Process initiatives such as ITIL offer hope, particularly when technical experts participate in meaningful ways. As author Stephen Covey says, it’s time to “take a helicopter ride” to gain new perspectives on application performance management. Think end-to-end responsibility!



NetQoS - Network Performance Management Products and Services for the world's largest networks. © 2001-2008 NetQoS, Inc. All rights reserved.