High Performance SQL Query Development on MSSQL Server 2008 R2



High Performance SQL Query Development on MSSQL Server 2008 R2



Basic Steps in Query Processing

1) The scanning, parsing, and validating module produces an internal representation of the query.

2) The query optimizer module devises an execution plan which is the execution strategy to retrieve the result of the query from the database files. A query typically has many possible execution strategies differing in performance, and the process of choosing a reasonably efficient one is known as query optimization. Query optimization is beyond this course

3) The code generator generates the code to execute the plan.

4) The runtime database processor runs the generated code to produce the query result.











What Happens When a Query is Submitted?
The process is split into 2 (roughly)
  1. Processes that occur in the relational engine
  1. Processes that occur in the storage engine.

In the relational engine the query is parsed and then processed by the Query Optimizer, which generates an execution plan. The plan is sent (in a binary format) to the storage engine, which it then uses to retrieve or update the underlying data. The storage engine is where processes such as locking, index maintenance and transactions occur. Since execution plans are created in the relational engine, that's where we'll be focusing our attention.



Query Parsing

Step 1 : User input Query
Step 2 : Query passed to the Parser
Parser Check is the SQL query is written correctly and verify if the query is well formed.
This process is known as query parsing. The output of the Parser process is a parse tree, or query
tree (or even sequence tree).

Step 3 : Query parser outputs a parse tree.

Step 4 : The parse tree is passed to a process called the algebrizer.

The algebrizer resolves all the names of the various objects, tables and columns, referred to within the query string. The algebrizer identifies, at the individual column level, all the types (varchar(50) versus nvarchar(25) and so on) of the objects being accessed. It also determines the location of aggregates (such as GROUP BY, and MAX) within the query, a process called aggregate binding.

This algebrizer process is important because the query may have aliases or synonyms, names that don't exist in the database, that need to be resolved, or the query may refer to objects not in the database.

Step 5 : The algebrizer outputs a binary called the query processor tree, which is then passed on to the query optimizer.

Note : Only DML statements could be Optimized.


The Query Optimizer

It uses the query processor tree and the statistics it has about the data, and applying the model, it works out what it thinks will be the optimal way to execute the query – that is, it generates an execution plan.

In other words, the optimizer figures out how best to implement the request represented by the T-SQL query you submitted. It decides if the data can be accessed through indexes, what types of joins to use and much more.

The decisions made by the optimizer are based on what it calculates to be the cost of a given execution plan, in terms of the required CPU processing and I/O, and how fast it will execute. Hence, this is known as a cost-based plan.

The optimizer will generate and evaluate many plans (unless there is already a cached plan) and, generally speaking, will choose the lowestcost plan i.e. the plan it thinks will execute the query as fast
as possible and use the least amount of resources, CPU and I/O.

For making the Execution plans the statistics are used, Statistics are collected on columns and indexes within the database, and describe the data distribution and the uniqueness, or selectivity, of the data.

If statistics exist for a relevant column or index, then the optimizer will use them in its calculations. Statistics, by default, are created and updated automatically within the system for all indexes or for any
column used as a predicate, as part of a WHERE clause or JOIN ON clause.

The optimizer takes the statistics, along with the query processor tree,
and heuristically determines the best plan.

Once the optimizer arrives at an execution plan, the actual plan is created and stored in a memory space known as the plan cache.

Query Execution

Once the execution plan is generated, the action switches to the storage engine, where the query is actually executed, according to the plan.

Estimated and Actual Execution Plans

  • The plan created by the Query Optimizer is the Estimated Execution Plan.
  • Actual Execution Plan shows what actually happened when the query executed.

When a query is submitted to the server, an estimated execution plan is created by the optimizer. Once that plan is created, and before it gets passed to the storage engine, the optimizer compares this estimated plan to actual execution plans that already exist in the plan cache. If an actual plan is found that matches the estimated one, then the optimizer will reuse the existing plan, since it's already been used before by the query engine. This reuse avoids the overhead of creating actual execution plans for large and complex queries or even simple plans for small queries called thousands of times in a minute.

Execution plans are not sacrosanct. Certain events and actions can cause a plan to be recompiled. It is important to remember this because recompiling execution plans can be a very expensive operation.

To completely clear the cache, run this:

 



To see the execution plans on our system : -

SELECT
[cp].[refcounts], [cp].[usecounts], [cp].[objtype], [st].[dbid], [st].[objectid], [st].[text], [qp].[query_plan]
FROM
sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp ;



When Parallelism is Requested

When a plan meets the threshold for parallelism two plans are created. Which plan is actually executed is up to the query engine.
So you might see a plan with, or without, parallel operators in the estimated execution plan. When the query actually executes, you may see a completely different plan if the query engine determines that it either can't support a parallel query at that time or that a parallel query is called for.

Execution Plan Formats

• Graphical Plans
• Text Plans
• XML Plans

No comments

Thanks for viewing the blog post. Follow the blog to receive the updates.. Powered by Blogger.