<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Demand Technology FAQ &#187; Processor</title>
	<atom:link href="http://faq.demandtech.com/tag/processor/feed/" rel="self" type="application/rss+xml" />
	<link>http://faq.demandtech.com</link>
	<description>Help and Support for the Performance Sentry Product Line</description>
	<lastBuildDate>Wed, 30 Jun 2010 19:33:21 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>How should I report on processor utilization for machines running Intel Hyper-threading (HT) technology?</title>
		<link>http://faq.demandtech.com/2009/09/29/how-should-i-report-on-processor-utilization-for-machines-running-intel-hyper-threading-ht-technology/</link>
		<comments>http://faq.demandtech.com/2009/09/29/how-should-i-report-on-processor-utilization-for-machines-running-intel-hyper-threading-ht-technology/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 15:21:38 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[CPU - Processor]]></category>
		<category><![CDATA[Windows Performance]]></category>
		<category><![CDATA[Processor]]></category>

		<guid isPermaLink="false">http://faq.demandtech.com/?p=15</guid>
		<description><![CDATA[Hyper-threading (HT) is the brand name for the technology Intel uses in many of its Xeon 32-bit processors that enables one physical processor core to execute two instruction streams (or threads) concurrently. On an HT machine, when HT is enabled, each physical processor currently presents two &#8220;logical&#8221; CPU interfaces to the operating system so that [...]]]></description>
			<content:encoded><![CDATA[<p>Hyper-threading (HT) is the brand name for the technology Intel uses in many of its Xeon 32-bit processors that enables one physical processor core to execute two instruction streams (or threads) concurrently. On an HT machine, when HT is enabled, each physical processor currently presents two &#8220;logical&#8221; CPU interfaces to the operating system so that two program threads can be dispatched at a time. The best way to report on processor utilization for an HT machine is to calculate the average utilization of the logical processors associated with the same physical processor core.</p>
<p>Determining whether HT is beneficial or detrimental for a specific workload is difficult unless you can do an apples-to-apples comparison between an HT machine and a non-HT machine running the exact same workload. On an HT machine, all the processor-level resource usage measurements, such as % Processor Time, represent utilization of a logical processor. Some authorities recommend averaging the processor utilization of the two logical processors that share a physical processor core to calculate utilization of the physical processor. To do this, you must understand which logical processors are associated with the same physical processor core.</p>
<p>Two new Processor configuration records, introduced in Performance Sentry version 2.4.7, allow you to identify HT machines definitively and determine which logical processors share a physical processor core. One DTS.CPU configuration record is written for each physical processor that is present. These records contain a counter called <strong># Logical Processors Supported</strong> that tells you whether it is an HT machine, along with a counter called <strong># Logical Processors Active</strong> that shows you whether HT is enabled. If the <strong># Logical Processors Supported</strong> counter contains a null value, then the machine is not HT-capable. If the <strong># Logical Processors Supported</strong> counter contains valid numeric data, then it is an HT-capable machine. (You should see a numeric value of 2 for current HT-ready processors. Note that Intel&#8217;s processor roadmap shows them contemplating building HT machines with more than 2 logical processors sometime in the future.) On an HT machine, if <strong># Logical Processors Active</strong> is less than <strong># Logical Processors Supported</strong>, then HT support has been disabled.</p>
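<p>The decision rules above can be sketched in a few lines of Python. This is only an illustration: the function name <code>ht_status</code> and the use of <code>None</code> to stand for a null counter value are assumptions made for the example, not part of Performance Sentry or the NTSMF record format.</p>

```python
from typing import Optional


def ht_status(supported: Optional[int], active: int) -> str:
    """Classify one physical processor from its two DTS.CPU counters.

    supported: "# Logical Processors Supported" (None models a null value)
    active:    "# Logical Processors Active"
    """
    # A null "Supported" counter means the processor is not HT-capable.
    if supported is None:
        return "not HT-capable"
    # Fewer active than supported logical processors: HT has been disabled.
    if supported > active:
        return "HT-capable, disabled"
    return "HT-capable, enabled"


if __name__ == "__main__":
    for counters in [(None, 1), (2, 1), (2, 2)]:
        print(counters, ht_status(*counters))
```

<p>How a null counter actually appears depends on how you load the NTSMF data, so adjust the <code>None</code> check to match your own reporting tool.</p>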
<p>The DTS.CPU records contain some additional CPU hardware configuration data that you might find interesting, such as the amount of L1, L2 and L3 cache memory installed, where that information is available.</p>
<p>DTS.LogicalProcessor records are also written that associate a logical processor instance name (the same instance name used in the Processor records) with a DTS.CPU physical processor core parent instance. Both sets of Processor configuration records are automatically written once to the beginning of each NTSMF data file, just before the first interval data records.</p>
<p>The core technology that Intel uses in its HT machines is known as Simultaneous Multithreading (SMT), which you can learn about at this <a href="http://www.cs.washington.edu/research/smt/">University of Washington, Computer Science department web site</a>. Much of the research published there shows SMT to be quite promising. Running multiple threads simultaneously on the same processor core works well when an instruction from one thread blocks inside the instruction pipeline, because the processor can continue to make forward progress executing instructions from another thread. In practice, however, HT is sometimes detrimental to overall performance on real-world workloads, forcing customers to disable HT in some instances. Threads executing concurrently on the same processor core must contend for shared resources inside the processor, particularly the processor cache. Multiple threads can also interfere with each other&#8217;s instruction execution progress, which leads to degraded performance levels. One suggestion is that this interference is more likely to occur when you are attempting to run a homogeneous workload and less likely to occur when the processor is executing threads from unrelated processes. In other words, on a machine dedicated to a specific application or one instance of SQL Server, HT could do more harm than good.</p>
<p>According to this <a href="http://download.microsoft.com/download/5/7/7/577a5684-8a83-43ae-9272-ff260a9c20e2/Hyper-thread_Windows.doc">white paper posted on Microsoft&#8217;s web site</a>, logical processor instance names on an HT machine are generated in sequence, one to a physical processor until all the physical processors have one logical processor, and then in sequence again until all the HT logical processors have been accounted for. For example, on an HT-enabled machine with 4 processor cores, processor instances 0 and 4 are associated with the first physical processor present, instances 1 and 5 are associated with the next, etc. Since the assignment of logical processor numbers to physical processor cores is a BIOS function, the authors of the Microsoft white paper were not entirely certain that every HT machine you ever come across will look this way, but at least every one that they have seen so far conforms to this numbering scheme.</p>
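<p>As a rough sketch of the averaging approach, the Python below groups logical processors by physical core and averages their utilization. It assumes the numbering scheme described in the white paper holds (logical instance <em>i</em> belongs to physical core <em>i</em> modulo the number of physical cores); the function names and sample figures are hypothetical. Because the numbering is a BIOS function, the DTS.LogicalProcessor records are the safer source for the mapping where they are available.</p>

```python
from collections import defaultdict
from typing import Dict


def physical_core_of(logical: int, physical_count: int) -> int:
    """Map a logical processor instance to its physical core.

    Assumes the white-paper numbering: instances 0..n-1 are the first
    logical processor on each core, n..2n-1 the second, and so on.
    """
    return logical % physical_count


def per_core_utilization(logical_busy: Dict[int, float],
                         physical_count: int) -> Dict[int, float]:
    """Average % Processor Time over the logical processors of each core."""
    grouped = defaultdict(list)
    for logical, busy in logical_busy.items():
        grouped[physical_core_of(logical, physical_count)].append(busy)
    return {core: sum(values) / len(values)
            for core, values in sorted(grouped.items())}


if __name__ == "__main__":
    # Made-up % Processor Time values for 8 logical processors on 4 cores.
    sample = {0: 80.0, 1: 20.0, 2: 50.0, 3: 10.0,
              4: 40.0, 5: 60.0, 6: 30.0, 7: 90.0}
    print(per_core_utilization(sample, physical_count=4))
```

<p>With the sample values, instances 0 and 4 are averaged together for the first core, instances 1 and 5 for the second, and so on.</p>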
]]></content:encoded>
			<wfw:commentRss>http://faq.demandtech.com/2009/09/29/how-should-i-report-on-processor-utilization-for-machines-running-intel-hyper-threading-ht-technology/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is the most reliable indicator of CPU contention?</title>
		<link>http://faq.demandtech.com/2009/09/29/what-is-the-most-reliable-indicator-of-cpu-contention/</link>
		<comments>http://faq.demandtech.com/2009/09/29/what-is-the-most-reliable-indicator-of-cpu-contention/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 15:12:57 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[CPU - Processor]]></category>
		<category><![CDATA[Windows Performance]]></category>
		<category><![CDATA[Processor]]></category>

		<guid isPermaLink="false">http://faq.demandtech.com/?p=10</guid>
		<description><![CDATA[Look at a combination of processor utilization and processor queuing.
(1) The primary indicator of processor utilization is the % Processor Time Counter in the Processor Object. Note that the _Total instance of the Processor Object is actually the average value over all processors. The System Object in NT 4.0 contains a Counter named % Total [...]]]></description>
			<content:encoded><![CDATA[<p>Look at a <em>combination</em> of processor utilization <em>and</em> processor queuing.</p>
<p>(1) The primary indicator of processor utilization is the <strong>% Processor Time</strong> Counter in the Processor Object. Note that the <strong>_Total</strong> instance of the Processor Object is actually the <em>average</em> value over all processors. The System Object in NT 4.0 contains a Counter named <strong>% Total Processor Time</strong>, which is also the <em>average</em> value over all processors.</p>
<p>The thread is the unit of execution in Windows. Each process address space that is launched has at least one thread, and many applications, of course, are multithreaded. There is an operating system function in Windows that keeps track of how much CPU time each thread consumes using a sampling technique. Samples are normally taken approximately one or two hundred times per second, which suggests that this technique is probably accurate for measurement intervals of 30 seconds or more. These samples are used to maintain the Thread <strong>% Processor Time</strong> Counter, once execution time, recorded in 100-nanosecond timer tick units, is normalized to a percentage of the measurement interval duration. Thread <strong>% Processor Time</strong> is also summarized at the Process level. The table below summarizes this overall measurement scheme:</p>
<table id="AutoNumber1" style="border-collapse: collapse;" border="1" cellspacing="0" cellpadding="0" width="88%">
<tbody>
<tr>
<td width="23%" align="center"><h4><span style="font-family: Book Antiqua,Times New Roman,Times;">Object</span></h4></td>
<td width="28%" align="center"><h4><span style="font-family: Book Antiqua,Times New Roman,Times;">Counter</span></h4></td>
<td width="49%" align="center"><h4><span style="font-family: Book Antiqua,Times New Roman,Times;">Derivation</span></h4></td>
</tr>
<tr>
<td width="23%"><span style="font-family: book antiqua,times new roman,times;"><strong>Thread</strong></span></td>
<td width="28%"><span style="font-family: book antiqua,times new roman,times;">% Processor Time</span></td>
<td width="49%"><span style="font-family: book antiqua,times new roman,times;">Dispatcher timing mechanism</span></td>
</tr>
<tr>
<td width="23%"><span style="font-family: book antiqua,times new roman,times;"><strong>Process</strong></span></td>
<td width="28%"><span style="font-family: book antiqua,times new roman,times;">% Processor Time</span></td>
<td width="49%"><span style="font-family: book antiqua,times new roman,times;">&#931; <strong>Thread</strong> % Processor Time</span></td>
</tr>
<tr>
<td width="23%"><span style="font-family: book antiqua,times new roman,times;"><strong>Processor</strong></span></td>
<td width="28%"><span style="font-family: book antiqua,times new roman,times;">% Processor Time</span></td>
<td width="49%"><span style="font-family: book antiqua,times new roman,times;">100% &#8211; Idle Thread % Processor Time</span></td>
</tr>
<tr>
<td width="23%"><span style="font-family: book antiqua,times new roman,times;"><strong>Processor</strong> (Win2K, XP)</span></td>
<td width="28%"><span style="font-family: book antiqua,times new roman,times;"><strong>Processor</strong>(_Total) % Processor Time</span></td>
<td width="49%"><span style="font-family: book antiqua,times new roman,times;">&#931; <strong>Processor</strong> % Processor Time / # processors</span></td>
</tr>
<tr>
<td width="23%"><span style="font-family: book antiqua,times new roman,times;"><strong>System</strong> (NT)</span></td>
<td width="28%"><span style="font-family: book antiqua,times new roman,times;">% Total Processor Time</span></td>
<td width="49%"><span style="font-family: book antiqua,times new roman,times;">&#931; <strong>Processor</strong> % Processor Time / # processors</span></td>
</tr>
</tbody>
</table>
<p>Processor busy is measured using an Idle thread mechanism. The operating system dispatches an Idle thread whenever there are no ready threads to run. Whenever the processor accounting routine finds the Idle thread dispatched, processor time is accumulated for the Idle thread. By the way, the Idle thread is not an actual execution thread; it is a HAL function that serves what is essentially a bookkeeping purpose.</p>
<p>At the end of a measurement interval, <strong>% Processor Time</strong> is calculated by subtracting the amount of accumulated Idle thread time from 100%. On a multiprocessor, there is a dedicated Idle thread per processor so that reliable measurements are kept. <strong>% Processor Time</strong> at the processor level can be broken down into Privileged mode execution time, User mode execution time, execution time in Interrupt mode, and execution time in Deferred Procedure Calls (DPC), as illustrated in Figure 1 below. <strong>% Interrupt Time</strong> and <strong>% DPC Time</strong> are both subsets of <strong>% Privileged Time</strong>.</p>
<p><img src="http://www.demandtech.com/FAQsCPU_files/image002.jpg" border="0" alt="" width="576" height="432" /> <strong>Figure 1.</strong> <em>Processor utilization breakdown.</em></p>
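<p>The arithmetic behind these Counters can be illustrated with a short Python sketch. It is not how Windows computes the values internally; the function names are hypothetical, and the only facts taken from the text are the 100-nanosecond tick unit, the &#8220;100% minus Idle thread time&#8221; rule, and the averaging used for the <strong>_Total</strong> instance.</p>

```python
from typing import Sequence

# Execution time is recorded in 100-nanosecond ticks: 10 million per second.
TICKS_PER_SECOND = 10_000_000


def percent_of_interval(ticks: int, interval_seconds: float) -> float:
    """Normalize accumulated ticks to a percentage of the interval."""
    return 100.0 * ticks / (interval_seconds * TICKS_PER_SECOND)


def processor_busy(idle_ticks: int, interval_seconds: float) -> float:
    """Processor % Processor Time = 100% minus Idle thread time."""
    return 100.0 - percent_of_interval(idle_ticks, interval_seconds)


def total_busy(per_processor_busy: Sequence[float]) -> float:
    """The _Total instance is the average over all processors."""
    return sum(per_processor_busy) / len(per_processor_busy)


if __name__ == "__main__":
    interval = 30.0  # seconds
    # Made-up Idle thread tick counts for a two-processor machine.
    busy = [processor_busy(75_000_000, interval),
            processor_busy(225_000_000, interval)]
    print(busy, total_busy(busy))
```

<p>The same <code>percent_of_interval</code> normalization applies to a Thread&#8217;s accumulated ticks, and Process <strong>% Processor Time</strong> is then the sum over that process&#8217;s threads.</p>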
<p><strong>% Processor Time</strong> approaches 100% as an absolute upper limit on CPU capacity. A system that is running consistently at greater than 90% busy is clearly out of capacity. However, this is not a hard and fast rule. Some workloads show signs of significant CPU contention at lower levels of processor utilization. For example, Figure 1 shows an IIS web server at a large e-commerce site where processor utilization remains consistently below 85%, except for two peak processing intervals. However, we will see that this system suffers from a serious CPU capacity constraint. So start with <strong>% Processor Time</strong>, but do not stop there.</p>
<p>(2) The System Object contains an instantaneous Counter called <a title="OLE_LINK1" name="OLE_LINK1"></a><strong>Processor Queue Length</strong>. This Counter shows the number of threads that are currently in the Ready state, but are delayed waiting for a processor to be available. Maintaining a value of no more than five Ready threads <em>per processor </em>is the usual recommendation. More than ten Ready threads per processor normally indicates a CPU resource shortage.</p>
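<p>Because <strong>Processor Queue Length</strong> is a single system-wide value, it has to be divided by the number of processors before the guidelines above are applied. The following Python sketch shows one way to do that; the function name and the three labels it returns are assumptions chosen for the example, while the thresholds of five and ten come from the recommendation above.</p>

```python
def queue_pressure(queue_length: int, processors: int) -> str:
    """Rate a Processor Queue Length sample against per-processor guidelines."""
    per_processor = queue_length / processors
    # More than ten Ready threads per processor: likely CPU shortage.
    if per_processor > 10:
        return "shortage"
    # Above the recommended ceiling of five Ready threads per processor.
    if per_processor > 5:
        return "elevated"
    return "ok"


if __name__ == "__main__":
    # Made-up samples for a 4-way multiprocessor.
    for length in (12, 24, 44):
        print(length, queue_pressure(length, processors=4))
```

<p>Since the Counter is instantaneous, a single sample says little; it is more useful to apply a check like this across many intervals and look at how often the queue is elevated.</p>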
<p>The <strong>Processor Queue Length</strong> Counter is often well-correlated with <strong>% Processor Time</strong>, even though the former is an instantaneous value obtained at the time the last processor sample was collected, while the latter is based on continuous samples during the measurement interval. Figure 2 shows the overall processor utilization from the 4-way multiprocessor system in Figure 1 with an overlay of the <strong>Processor Queue Length</strong> for the same interval (charted against the right-hand y-axis). The expected correlation between processor utilization and the number of Ready and Waiting threads is apparent.</p>
<p><img src="http://www.demandtech.com/FAQsCPU_files/image004.jpg" border="0" alt="" width="576" height="360" /> <strong>Figure 2.</strong> <em>% Processor time vs. the Processor Queue Length.</em></p>
<p>In this instance, we also saw a correlation between periods of poor Active Server Pages (ASP) response time and corresponding spikes in the size of the Processor Ready Queue. Here, <strong>% Processor Time</strong> values consistently greater than 70% appeared to cause spikes in web site response time.</p>
<p>(3) At the Thread level, there is a Counter called <strong>Thread State</strong>. Threads waiting in the Ready Queue have a Thread State code of 1 (see the Explain text for the Counter). This Counter tells you precisely which threads are waiting for service at the processor. Since Windows uses priority queuing to order the Ready Queue, knowing which threads are delayed in the queue can be quite useful to help pinpoint the impact of CPU contention on specific applications that might be experiencing performance problems.</p>
<p>Because of the number of threads on a typical NT machine, collecting thread execution state data using tools like System Monitor is normally prohibitively expensive. Consequently, we designed Performance Sentry to allow efficient collection of this potentially useful information. The <strong>Ready Threads</strong> Counter that Performance Sentry provides at the process level shows the number of Ready and Waiting threads for that process at the end of each measurement interval.</p>
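<p>To make the idea of a per-process ready-thread count concrete, here is a minimal Python sketch that tallies threads with a <strong>Thread State</strong> code of 1 by process. It is an illustration only, not a description of how Performance Sentry derives its <strong>Ready Threads</strong> Counter; the input format (pairs of process name and Thread State code) and the sample data are assumptions.</p>

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

READY = 1  # Thread State code for threads waiting in the Ready Queue


def ready_threads_by_process(
        samples: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Count Ready threads per process from (process, thread state) pairs."""
    counts = Counter()
    for process, state in samples:
        if state == READY:
            counts[process] += 1
    return dict(counts)


if __name__ == "__main__":
    # Made-up end-of-interval samples; any state other than 1 is ignored.
    samples = [("inetinfo", 1), ("inetinfo", 5),
               ("sqlservr", 1), ("inetinfo", 1)]
    print(ready_threads_by_process(samples))
```

<p>Processes with no Ready threads simply do not appear in the result, which keeps the output short on machines with many idle processes.</p>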
]]></content:encoded>
			<wfw:commentRss>http://faq.demandtech.com/2009/09/29/what-is-the-most-reliable-indicator-of-cpu-contention/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
