Saturday, April 02, 2011

Trace Windows 7 boot issues

Amplify’d from www.msfn.org

Trace Windows 7 boot/shutdown/hibernate/standby/resume issues

Read more at www.msfn.org
To get started, you need the Windows Performance Tools Kit. Installation instructions are here:



http://www.msfn.org/...howtopic=146919



Now open a command prompt with admin rights and run the following commands:



For boot tracing:

xbootmgr -trace boot -traceFlags BASE+CSWITCH+DRIVERS+POWER -resultPath C:\TEMP




For shutdown tracing:

xbootmgr -trace shutdown -noPrepReboot -traceFlags BASE+CSWITCH+DRIVERS+POWER -resultPath C:\TEMP




For Standby+Resume:

xbootmgr -trace standby -traceFlags BASE+CSWITCH+DRIVERS+POWER -resultPath C:\TEMP




For Hibernate+Resume:

xbootmgr -trace hibernate -traceFlags BASE+CSWITCH+DRIVERS+POWER -resultPath C:\TEMP




Replace C:\TEMP with any directory on your machine where the output files should be stored.
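If you run these traces often, it can help to assemble the command lines in a small script. The following is only a sketch: it uses just the switches shown in the commands above, and the function name and the D:\traces directory are made up for illustration.

```python
# Sketch: build the xbootmgr command lines listed above.
# Only switches that appear in this guide are used; nothing else is assumed.

TRACE_FLAGS = "BASE+CSWITCH+DRIVERS+POWER"

# Extra switches per trace type, exactly as in the commands above.
EXTRA_SWITCHES = {
    "boot": [],
    "shutdown": ["-noPrepReboot"],
    "standby": [],
    "hibernate": [],
}


def xbootmgr_command(trace_type, result_path=r"C:\TEMP"):
    """Return the argument list for one xbootmgr trace run."""
    if trace_type not in EXTRA_SWITCHES:
        raise ValueError(f"unsupported trace type: {trace_type}")
    return (
        ["xbootmgr", "-trace", trace_type]
        + EXTRA_SWITCHES[trace_type]
        + ["-traceFlags", TRACE_FLAGS, "-resultPath", result_path]
    )


if __name__ == "__main__":
    # D:\traces is a hypothetical output directory.
    for kind in EXTRA_SWITCHES:
        print(" ".join(xbootmgr_command(kind, r"D:\traces")))
```

The script only prints the commands. You still have to run them yourself from a command prompt with admin rights.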



Each of these commands shuts down, hibernates, or suspends your machine and then restarts or resumes it to finish tracing. Once Windows Vista/Server 2008 (R2) or Windows 7 is running again, log back in if necessary and wait for the countdown timer to finish. You should then have trace files in C:\TEMP. If asked, upload the file(s) generated in C:\TEMP (or the directory you chose) to a download share for analysis.



Analysis of the boot trace:



To start, create a summary XML file by running this command (replace the file name with the name of your etl file):



xperf /tti -i boot_BASE+CSWITCH+DRIVERS+POWER_1.etl -o summary_boot.xml -a boot




Open the resulting XML file. It looks like this:



Posted Image




Look at the timing node. All time values are in ms.



The value bootDoneViaExplorer shows the time Windows needs to boot to the desktop.



The value bootDoneViaPostBoot is the time Windows needs to boot completely, including all startup applications. It includes the 10 s of idle detection.
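To read these values without opening the XML by hand, something like the following sketch may do. It assumes the values are attributes of the timing node, as in the shutdown example later in this guide. The sample XML and its numbers are invented, and a real summary_boot.xml contains far more nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt with invented numbers; only the attribute names
# come from this guide.
SAMPLE = """
<results>
  <boot>
    <timing bootDoneViaExplorer="35000" bootDoneViaPostBoot="62000" osLoaderDuration="1500"/>
  </boot>
</results>
"""

IDLE_DETECTION_MS = 10_000  # the 10 s of idle time included in bootDoneViaPostBoot


def boot_timings(xml_text):
    """Return the main boot timings in seconds."""
    root = ET.fromstring(xml_text)
    timing = next(root.iter("timing"), None)
    if timing is None:
        raise ValueError("no <timing> node found")
    explorer_ms = int(timing.attrib["bootDoneViaExplorer"])
    postboot_ms = int(timing.attrib["bootDoneViaPostBoot"])
    return {
        "desktop_s": explorer_ms / 1000,
        "postboot_s": postboot_ms / 1000,
        # Total boot time with the idle-detection window removed.
        "total_without_idle_s": (postboot_ms - IDLE_DETECTION_MS) / 1000,
    }


if __name__ == "__main__":
    for name, seconds in boot_timings(SAMPLE).items():
        print(f"{name}: {seconds:.1f}")
```

To use it on a real trace, pass in the contents of your own summary_boot.xml instead of SAMPLE.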



Quote

During the OSLoader phase (shown in the value osLoaderDuration), the Windows loader binary (Winload.exe) loads essential system drivers that are required to read minimal data from the disk and initializes the system to the point where the Windows kernel can begin execution. When the kernel starts to run, the loader loads into memory the system registry hive and additional drivers that are marked as BOOT_START.



Visual Cues



This phase begins approximately when the BIOS splash and diagnostic screens are cleared and ends approximately when the “Loading Windows” splash screen appears.




These values give you a summary of the boot.



The MainPathBoot Phase



Quote

What Happens in This Phase

During the MainPathBoot phase, most of the operating system work occurs. This phase involves kernel initialization, Plug and Play activity, service start, logon, and Explorer (desktop) initialization. To simplify analysis, we divide the MainPathBoot phase into four subphases, as shown in the next picture. Each subphase has unique characteristics and performance vulnerabilities.



Visual Cues



Visually, the MainPathBoot phase begins when the “Starting Windows” splash screen appears and lasts until the desktop appears. If auto-logon is not enabled, the time that elapses while the logon screen is displayed affects the measured boot time in a trace.




Posted Image




PreSMSS Subphase

Quote

What Happens in This Subphase

The PreSMSS subphase begins when the kernel is invoked. During this subphase, the kernel initializes data structures and components. It also starts the PnP manager, which initializes the BOOT_START drivers that were loaded during the OSLoader phase. When the PnP manager detects a device, it loads and initializes the device’s drivers.



Visual Cues

PreSMSS begins approximately when the “Loading Windows” splash screen appears. There are no explicit visual cues for the end of PreSMSS.




So if this subphase takes too long, look inside the <PNP> node to see which driver is loading too slowly.



SMSSInit Subphase

Quote


What Happens in This Subphase

The SMSSInit subphase begins when the kernel passes control to the session manager process (Smss.exe). During this subphase, the system initializes the registry, loads and starts the devices and drivers that are not marked BOOT_START, and starts the subsystem processes. SMSSInit ends when control is passed to Winlogon.exe.



Visual Cues

There are no explicit visual cues for the start of SMSSInit, but the blank screen that appears between the splash screen and the logon screen is part of SMSSInit. It ends before the logon screen appears.



SMSSInit Performance Vulnerabilities


Video drivers are a common source of performance problems in the SMSSInit subphase. The video driver must be initialized first in the system session and then in the user session. Reduction of video driver initialization time leads to a direct wall-clock reduction in boot time.

Initialization in the user session is typically much faster than in the system session because Windows performs common initialization tasks during the system session.




So if the SMSSInit subphase takes too long, try to get a graphics card driver update.



WinLogonInit Subphase



Quote


What Happens in This Subphase

The WinLogonInit subphase begins when SMSSInit completes and starts Winlogon.exe. During WinLogonInit, the user logon screen appears, the service control manager starts services, and Group Policy scripts run. WinLogonInit ends when the Explorer process starts.



Visual Cues

WinLogonInit begins shortly before the logon screen appears. It ends just before the desktop appears for the first time.



WinLogonInit Performance Vulnerabilities

Many operations occur in parallel during WinLogonInit. On many systems, this subphase is CPU bound and has large I/O demands. Good citizenship from the services that start in this phase is critical for optimized boot times.

Services can declare dependencies or use load order groups to ensure that they start in a specific order. Windows processes load order groups in serial order. Service initialization delays in an early load order group block subsequent load order groups and can possibly block the boot process.




If WinLogonInit takes too long, open the etl file, scroll to the service graph, and look for a long delay.



Posted Image



In this example the service SavService (Sophos Anti-Virus\SavService.exe) is part of the Plug and Play group and causes a delay because it takes too long to start. Try to get an update for the hanging service or remove the software.



ExplorerInit Subphase



Quote


What Happens in This Subphase



The ExplorerInit subphase begins when Explorer.exe starts. During ExplorerInit, the system creates the desktop window manager (DWM) process, which initializes the desktop and displays it for the first time.

This phase is CPU intensive. The initialization of DWM and desktop occurs in the foreground, while in the background the service control manager (SCM) starts services and the memory manager prefetches code and data. On most systems ExplorerInit is CPU bound, and timing issues are likely the result of a simple resource bottleneck.

Visual Cues

ExplorerInit begins just before the desktop appears for the first time. There is no clear visual cue to indicate the end of ExplorerInit.



ExplorerInit Performance Analysis



Applications—such as antivirus programs or application servers—that are created during service start in this or previous phases can consume CPU resources during ExplorerInit. Some services might not be started yet when ExplorerInit is complete.




So if the ExplorerInit subphase takes too long, minimize the services that use a lot of CPU power and make sure your AV tool doesn't slow things down too much. If it does, try a different tool.



The PostBoot Phase



Quote


What Happens in This Phase

The PostBoot phase includes all background activity that occurs after the desktop is ready. The user can interact with the desktop, but the system might still be starting services, tray icons, and application code in the background.

Specifically, Xperf samples the system every 100 ms during the PostBoot phase. If the system is 80-percent or more idle (excluding low-priority CPU and disk activity) at the time of the sample, Xperf considers the system to be “idle” for that 100 ms interval. The phase persists until the system accumulates 10 seconds of idle time.

Note: When you review traces and report timing results, you should subtract the 10 second idle time that accumulated during PostBoot to determine total boot time.




Visual Cues

There are no explicit visual cues for PostBoot. The phase begins after the user’s desktop appears and ends after satisfying the 10-second metric that was explained earlier.



PostBoot Performance Vulnerabilities

During PostBoot, Windows examines the entries in the various Run and RunOnce keys (Run, RunOnce, RunOnceEx, RunServices, and so on) in the registry and the Startup folder in the file system, and then starts the listed applications.




If post boot takes too long, reduce the number of running applications at startup with the help of msconfig.exe or AutoRuns.
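The idle-detection rule quoted above (a sample every 100 ms, an interval counts as idle at 80 percent idle or more, PostBoot ends once 10 seconds of idle time have accumulated) can be illustrated with a small simulation. This is only a sketch of the rule as described, not Xperf's actual implementation, and the sample data is invented.

```python
# Sketch of the PostBoot idle-detection rule described above.
# The constants come from the quoted text; the sample data is invented.

SAMPLE_INTERVAL_MS = 100
IDLE_THRESHOLD_PERCENT = 80
REQUIRED_IDLE_MS = 10_000


def postboot_duration_ms(idle_percent_samples):
    """Return the PostBoot length in ms, or None if 10 s of idle time
    never accumulates within the given samples."""
    idle_ms = 0
    for index, idle_percent in enumerate(idle_percent_samples, start=1):
        if idle_percent >= IDLE_THRESHOLD_PERCENT:
            idle_ms += SAMPLE_INTERVAL_MS  # this interval counts as idle
        if idle_ms >= REQUIRED_IDLE_MS:
            return index * SAMPLE_INTERVAL_MS
    return None


if __name__ == "__main__":
    # 5 s of startup activity (20 % idle), then a quiet system (95 % idle).
    samples = [20] * 50 + [95] * 100
    duration = postboot_duration_ms(samples)
    print("PostBoot duration:", duration, "ms")
    print("Without idle window:", duration - REQUIRED_IDLE_MS, "ms")
```

Idle intervals do not have to be consecutive. Every busy interval caused by a startup application pushes the end of PostBoot further out, which is why trimming the autostart list shortens this phase.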



If you have an HDD (not an SSD) and want to speed up the boot, run the optimization from this guide:



http://www.msfn.org/...howtopic=140262




Analysis of the shutdown trace:



The shutdown is divided into these 3 parts:



Posted Image




To generate an XML summary of shutdown, use the -a shutdown action with Xperf:



xperf /tti -i shutdown_BASE+CSWITCH+DRIVERS+POWER_1.etl -o summary_shutdown.xml -a shutdown
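Both summary commands follow the same pattern, so the command line can be derived from the name of the etl file. The helper below is a sketch with a made-up function name. It assumes the etl file name starts with the trace type, as in the examples in this guide, and it only handles the boot and shutdown actions shown here.

```python
from pathlib import PureWindowsPath

# Only the two summary actions shown in this guide are supported.
SUMMARY_ACTIONS = ("boot", "shutdown")


def xperf_summary_command(etl_file):
    """Return the xperf argument list that writes summary_<action>.xml."""
    name = PureWindowsPath(etl_file).name
    action = name.split("_", 1)[0]  # e.g. "boot" from "boot_BASE+..._1.etl"
    if action not in SUMMARY_ACTIONS:
        raise ValueError(f"no summary action known for: {name}")
    return [
        "xperf", "/tti",
        "-i", etl_file,
        "-o", f"summary_{action}.xml",
        "-a", action,
    ]


if __name__ == "__main__":
    print(" ".join(xperf_summary_command(
        "shutdown_BASE+CSWITCH+DRIVERS+POWER_1.etl")))
```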




Open the XML and you see this:



Posted Image



It shows you the most relevant data.



<timing shutdownTime="23184" servicesShutdownDuration="1513">




In this example the shutdownTime is about 23 s. Stopping the services takes 1.5 s, which is fast.



Next you have an entry for each session. Starting with Vista, all services run in Session 0 (Session 0 Isolation) and each user gets their own session (1, 2, ..., n).



<sessionShutdown sessionID="1" duration="3321">




This shows the time it takes to stop all applications that the user is running. In this example it takes 3.3 seconds.



UserSession Phase



Quote


What Happens in UserSession

During this phase, the Client/Server Runtime Server Subsystem (Csrss.exe) shuts down all applications that are running in the user session—that is, all applications that have session ID 1.



If after 5 seconds any application blocks shut down, Windows displays the dialog box in Figure 24 so that users can choose to force or cancel shutdown.



Posted Image



UserSession Performance Vulnerabilities



Because Windows serially shuts down applications, any delay in a process’s shutdown path contributes to the total shutdown duration. To ensure a speedy shutdown, every application must respond quickly to shutdown notification messages (WM_QUERYENDSESSION and WM_ENDSESSION).

Windows uses long time-outs so that applications have sufficient time to shut down and save user data. Therefore, applications can have a significant effect on shutdown performance.









<sessionShutdown sessionID="0" duration="1513">




The sessionShutdown entry with sessionID="0" corresponds to servicesShutdownDuration. Here you can see which service takes too long to stop.



SystemSession Phase



Quote


What Happens in SystemSession



This phase includes two subphases:

• Preshutdown notification. Windows serially shuts down all services that registered to receive preshutdown notifications. Ordered services—services that have set up the shutdown order of dependent services—are shut down before non-ordered services.

• Shutdown notification. All services that registered to receive shutdown notifications are shut down in parallel.



If all services have not exited after 20 seconds (in Windows Vista) or 12 seconds (in Windows 7), the system continues the shutdown. Processes and services that do not shut down in a timely manner are left running as the system shuts down.



SystemSession Performance Vulnerabilities



In the preshutdown notification subphase, the SCM serializes the waits. Therefore, these services block system shutdown until they exit or until the wait hint time-out expires.

Services are not guaranteed to have enough time to finish all their work in the shutdown notification subphase before the system shuts down.







In both cases, expand the node and look at the shutdownDuration value.



It helps you identify a hanging application or service.



KernelShutdown Phase



Quote


What Happens in KernelShutdown

In the KernelShutdown phase, the rest of the system, including all devices and drivers, is shut down.




To calculate the time spent in KernelShutdown, subtract the time that is required to shut down the system and user sessions from shutdownTime.



In my example:



KernelShutdown = 23184 - 3321 - 1513 = 18350
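The same subtraction can be scripted against the summary XML. The sketch below uses the values from this example. How the sessionShutdown entries are nested in a real summary_shutdown.xml is an assumption, so the code searches the whole document rather than relying on a fixed layout.

```python
import xml.etree.ElementTree as ET

# Values from the example above. The nesting is an assumption, and a real
# summary_shutdown.xml contains many more nodes.
SAMPLE = """
<shutdown>
  <timing shutdownTime="23184" servicesShutdownDuration="1513">
    <sessionShutdown sessionID="1" duration="3321"/>
    <sessionShutdown sessionID="0" duration="1513"/>
  </timing>
</shutdown>
"""


def kernel_shutdown_ms(xml_text):
    """shutdownTime minus the time spent shutting down all sessions."""
    root = ET.fromstring(xml_text)
    timing = next(root.iter("timing"), None)
    if timing is None:
        raise ValueError("no <timing> node found")
    total_ms = int(timing.attrib["shutdownTime"])
    sessions_ms = sum(
        int(session.attrib["duration"])
        for session in root.iter("sessionShutdown")
    )
    return total_ms - sessions_ms


if __name__ == "__main__":
    print("KernelShutdown:", kernel_shutdown_ms(SAMPLE), "ms")
```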



In this case, 18.35 seconds is very slow. In the <interval> node you see an entry ZeroHiberFile that takes too long. In this example the user set the option ClearPageFileAtShutdown under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management to 1. This overwrites the hibernation file with zeros to delete personal data, which causes the huge slowdown. Setting this option to 0 would save 12.64 seconds of shutdown time.







That is all you need to analyze slow shutdown issues.