“The network is the cause of all problems.”
“Adding bandwidth will solve all your problems.”
“The network is still just a dumb pipe.”
“Working in the NOC is not prestigious.”
Collaborating with NetQoS, we’ve scoured the Internet via Twitter, LinkedIn, and the NetworkPerformanceDaily.com blog and asked NetQoS customers, prospects, and others for their top misconceptions and conventional wisdom (CW) about network and application performance.
Here are the top answers gleaned from our informal survey and our own perception of what the current conventional wisdom is. Conventional wisdom is the set of beliefs that guide how we think about Information Technology. For example, the first edition of Performance-First Insights looked at a central piece of conventional wisdom and demonstrated that the IT function does indeed provide business value.
Like discussing the future, discussing conventional wisdom is fraught with risk, because IT organizations are highly diverse and it is difficult to say anything that is true for all of them. However, it is both possible and helpful to critique the conventional wisdom to determine whether the beliefs that guide how we think about IT are primarily fact or fiction.
Do you have a misconception or instance of conventional wisdom not listed here that you’d like to share? Please e-mail me at performancefirstinsights@netqos.com or post a comment on www.NetworkPerformanceDaily.com.
The CW inside most companies, and even within most IT organizations, is that the cause of application degradation is invariably the network. This piece of CW leads to a new management metric: the mean time to innocence (MTTI). The MTTI is how long it takes the networking organization to prove that the network is not causing the degradation. Once that task is accomplished, it is common to assume that some other component of IT, such as the servers, must be at fault. This defensive (a.k.a. CYA) approach to troubleshooting lengthens the time it takes to resolve application degradation issues. One CIO I spoke with said he is tired of the five guys who report to him coming to him with reports showing that their component of IT is running well while the company’s web site is performing so badly that they are having trouble booking sales orders. He stated, “I don’t care where the fault is, I just want them to fix it.”
Conclusion: Certainly there are times when the network is the reason the performance of an application is degrading. However, that is the minority of instances, usually less than 30%. Given the increasing importance of applications, a number of IT organizations have adopted an approach to management that focuses on the overall performance of networks and applications, and not just on the availability of the individual components of the IT infrastructure. In 2009, more IT organizations need to focus their management attention on performance. These organizations also need to move away from a CYA approach to troubleshooting that is based on assigning blame and adopt a CIO approach that is based on fixing the problem.
A common response to application degradation is to throw WAN bandwidth at the problem. The phrase throw bandwidth means that the IT organization is not sure of the cause of the problem and is hoping that adding bandwidth will make the problem go away. Hope is not a strategy.
Conclusion: There is no doubt that there are times when adding bandwidth improves application performance. There is also no doubt that there are times when it does not. For example, assume that a given application requires two hundred round trips to complete a single transaction and that the round trip delay on the WAN is 100ms. Independent of how much bandwidth the IT organization deploys, the transaction time will never be less than twenty seconds.
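The arithmetic behind that floor can be sketched in a few lines of Python. This is a minimal illustration, not a model of any real application: the function name, the 1 MB payload, and the list of link speeds are assumptions introduced here, and the model deliberately ignores protocol overhead, congestion, and server time.

```python
def min_transaction_seconds(round_trips, rtt_ms, payload_bits=0, bandwidth_bps=None):
    """Lower bound on transaction time: a latency term plus a serialization term.

    The latency term (round trips x round-trip delay) does not depend on
    bandwidth; only the serialization term shrinks as bandwidth grows.
    """
    latency_s = round_trips * rtt_ms / 1000.0
    if bandwidth_bps is None:
        # Treat bandwidth as unlimited: only the latency term remains.
        return latency_s
    return latency_s + payload_bits / bandwidth_bps


if __name__ == "__main__":
    # Figures from the text: 200 round trips at 100 ms each.
    floor = min_transaction_seconds(200, 100)
    print(f"Latency floor: {floor:.1f} s")

    # Hypothetical 1 MB (8,000,000 bit) payload over increasingly fast links.
    for mbps in (1.5, 10, 100, 1000):
        total = min_transaction_seconds(200, 100, 8_000_000, mbps * 1_000_000)
        print(f"{mbps:>6} Mbps -> {total:.2f} s")
```

Under these assumptions, every added increment of bandwidth trims only the serialization term, so the total approaches but never drops below the twenty-second latency floor.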
One impact of throwing WAN bandwidth at a problem is guaranteed to occur: it will increase the monthly recurring cost of the WAN. Given the challenging economic environment, IT organizations need to enhance their ability to identify the source of performance degradation and not just throw bandwidth at the problem and hope that the problem disappears.
For years, articles in the trade press have proclaimed that IT organizations need to align with their company’s business goals. However, few of those articles identified how to accomplish this. One component of aligning the IT organization with the company’s business goals is to create a marketing plan that includes a focus on service management. Part of the goal of service management is to identify both the services that IT offers and the customers of those services. In this context, the customer of a given service is not all of the people who use the service. The customers of the service are a small number of senior managers who are impacted by the service and who are in a position to influence IT budgets and goals.
There are also instances in which some of the groups within the IT organization need to market what they do to the rest of the IT organization. For example, while the NOC used to be responsible primarily for the availability of networks, today it is increasingly common for the NOC to also be responsible for the performance of networks and applications. That fact, however, is not widely understood by people who work in the IT organization but outside the NOC. It is difficult to believe that the NOC will be successful if its role is not well understood within the IT organization.
Conclusion: Too often this CW is correct. The IT organization in general, and the various IT subgroups in particular, typically do not market the value they provide. For the good of all – the company, the IT organization, the IT professional – this situation needs to change.
Prestige is highly subjective. To be more quantitative, a survey I conducted with NetQoS was given to approximately two hundred IT professionals, roughly half of whom work in a NOC. Fifty percent of the survey respondents who work in a NOC indicated that they believe that working in the NOC is prestigious. In contrast, only twenty-three percent of the survey respondents who work in IT but not in a NOC indicated that working in the NOC is prestigious.
Conclusion: The data indicate that this piece of CW largely holds among IT professionals outside the NOC, only twenty-three percent of whom view NOC work as prestigious, while NOC personnel themselves are evenly split. However, some marketing can increase the prestige associated with working in the NOC. For example, working in the NOC will be viewed as more prestigious by the rest of the IT organization if the actual role that the NOC plays in managing application performance is better understood.
Another way to both increase the prestige of the NOC and enhance the organization’s ability to manage IT resources is to deploy monitoring and management tools that are suitable for both network engineering and operations. This approach reduces the number of tools, which saves money and improves the communications between disparate groups. Deploying these tools also enables NOC personnel to resolve more problems within their organization and to issue the trouble ticket to the right functional team when they cannot resolve the problem. This approach requires the integration of conventional NOC management products for fault and availability management with performance management tools.
This CW seems to be obvious. The “O” in NOC stands for operations, so it seems reasonable that NOC personnel spend all of their time on operational issues. However, last year I authored a research paper (The Next Generation Network Operations Center), based on the survey mentioned above, that discussed the changing role of the network operations center (NOC). That research found that in the majority of IT organizations, NOC personnel are involved in non-operational activities such as network redesign, the selection of new networking technologies, and the choice of network service providers.
Conclusion: The aforementioned research indicates that this piece of CW is not true. The bigger question going forward is whether the management function gets involved early in initiatives to evaluate new technologies such as virtualized servers and virtualized desktops.
When IT organizations first deployed networks, those networks were intended to carry email and file transfers and to support simple inquiry-response applications. One of the characteristics of email and file transfer traffic is that it is not very delay sensitive. The phrase simple inquiry-response applications refers to applications that send a small amount of information from a server to the user in order to populate the user’s screen. The user then enters a small amount of information (e.g., name, company, billing address) that is transmitted back to the server.
Over the last few years the mix of traffic that transits the typical enterprise network has expanded dramatically. In addition to email, file transfer, and simple inquiry-response applications, networks now need to support delay-sensitive applications such as VoIP, video conferencing, video surveillance, and telepresence, as well as the massive file transfers that are associated with data replication. Networks also need to support chatty protocols such as Common Internet File System (CIFS), computationally intense protocols such as Secure Sockets Layer (SSL), and new application architectures such as Service-Oriented Architecture (SOA) with Web services. In many cases, the network is supposed to determine the identity of the user in order to control what applications the user can access.
Conclusion: There has been a lot of hype recently around phrases such as application aware or application fluent networks. In spite of the hype, there is an important concept here: the era of the dumb network is over. As such, this piece of CW is false, because successful application delivery requires that networks be designed to include the functionality that is required to support the performance and security requirements of a highly diverse set of applications.
This concept shows up periodically in a variety of contexts. It has recently appeared as part of the justification for server virtualization. In that context, the argument is often made that the CPU utilization of the typical server is roughly 15% and that by implementing server virtualization it is possible to increase the utilization to whatever level you want and hence save money. That argument sounds appealing, particularly in a challenging economic environment. However, there are risks associated with increasing utilization rates. For example, the statement that the CPU utilization rate is 15% usually refers to the average CPU utilization rate. In most instances, the peak utilization rate is notably higher. If, through techniques such as server virtualization, the average CPU utilization is increased to 60%, then for significant periods of time the actual CPU utilization rate will be at 80% or higher. As the CPU utilization rate increases from 15% to 60% to 80% and higher, performance can degrade dramatically.
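One way to see why performance degrades as utilization climbs is a textbook single-server queueing model. The sketch below assumes an M/M/1 queue, which is an assumption introduced here and not something the discussion above relies on; real virtualized servers with multiple cores and bursty workloads will behave differently, so treat the numbers as illustrative only. The function name is hypothetical.

```python
def response_time_multiplier(utilization):
    """Mean response time as a multiple of the bare service time.

    Assumes an M/M/1 queue (single server, random arrivals and service
    times), where mean response time = service time / (1 - utilization).
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    return 1.0 / (1.0 - utilization)


if __name__ == "__main__":
    # Utilization levels discussed in the text, plus 90% for comparison.
    for u in (0.15, 0.60, 0.80, 0.90):
        print(f"{u:.0%} utilization -> {response_time_multiplier(u):.2f}x service time")
```

Under this simplified model, a request takes roughly 1.2 times its service time at 15% utilization, 2.5 times at 60%, and 5 times at 80%, which is why average utilization figures alone can hide significant performance risk during peaks.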
Conclusion: Low resource utilization is a bad thing if the primary goal is to minimize cost. Conversely, low resource utilization is a good thing if the primary goal is to ensure acceptable performance.
Over the last few years, IT organizations have shown growing interest in using ITIL as a framework to improve network management processes such as configuration management. The fact that IT organizations want to improve their processes is clearly very positive. However, the current interest in ITIL is somewhat reminiscent of the total quality management (TQM) movement of the early 1990s. Just as it is difficult to argue against the concepts behind TQM, it is difficult to argue against ITIL at a conceptual level. This is, however, yet another example of a situation in which the devil is in the details. For example, IT organizations are typically staffed by people who are skilled in technology. Do those organizations have personnel who are skilled in process redesign? Does senior management set realistic goals? How difficult is it for an IT organization to take a generic framework and apply it in its specific environment? One CIO I spoke with, for instance, said his organization tried to use ITIL to improve some of its processes. While he does not dispute the benefits promised by ITIL, he finds ITIL too theoretical and lacks the resources to get deeply involved with it.
Conclusion: ITIL is definitely not a panacea. It can, however, be helpful and will likely become more helpful over time. Like any initiative, the value an IT organization receives from adopting ITIL will depend in large part upon how realistic the IT organization is both in terms of the goals it sets as well as the resources it assigns to process redesign.
Please comment on this issue of Performance-First Insights. E-mail me at performancefirstinsights@netqos.com or post a comment on NetworkPerformanceDaily.com. And follow us on Twitter at http://twitter.com/netqoslive.
CA | NetQoS - Network Performance Management products and services for the world's largest networks. © 2001-2010 CA | NetQoS, Inc. All rights reserved.