Database
database performance
High Performance SQL Query Development on MSSQL Server 2008 R2
High Performance SQL Query Development on MSSQL Server 2008 R2
Basic
Steps in Query Processing
1)
The scanning, parsing, and validating module produces an internal
representation of the query.
2)
The query optimizer module devises an execution plan which is the
execution strategy to retrieve the result of the query from the
database files. A query typically has many possible execution
strategies differing in performance, and the process of choosing a
reasonably efficient one is known as query optimization. Query
optimization is beyond this course
3)
The code generator generates the code to execute the plan.
4)
The runtime database processor runs the generated code to produce the
query result.
What
Happens When a Query is Submitted?
The
process is split into 2 (roughly)
- Processes that occur in the relational engine
- Processes that occur in the storage engine.
In
the relational engine the query is parsed and then processed by the
Query Optimizer, which generates an execution plan. The plan is sent
(in a binary format) to the storage engine, which it then uses to
retrieve or update the underlying data. The storage engine is where
processes such as locking, index maintenance and transactions occur.
Since execution plans are created in the relational engine, that's
where we'll be focusing our attention.
Query
Parsing
Step
1 : User input Query
Step
2 : Query passed to the Parser
Parser
Check is the SQL query is written correctly and verify if the query
is well formed.
This
process is known as query parsing.
The output of the Parser process is a parse tree, or query
tree
(or even sequence tree).
Step
3 : Query parser outputs a parse tree.
Step
4 : The parse tree is passed to a process called the
algebrizer.
The
algebrizer resolves all the names of the various objects, tables and
columns, referred to within the query string. The algebrizer
identifies, at the individual column level, all the types
(varchar(50) versus nvarchar(25) and so on) of the objects being
accessed. It also determines the location of aggregates (such as
GROUP BY, and MAX) within the query, a process called aggregate
binding.
This
algebrizer process is important because the query may have aliases or
synonyms, names that don't exist in the database, that need to be
resolved, or the query may refer to objects not in the database.
Step
5 : The algebrizer outputs a binary called the query
processor tree, which is then passed on to the
query optimizer.
Note
: Only DML statements could be Optimized.
The
Query Optimizer
It
uses the query processor tree and the statistics it has about the
data, and applying the model, it works out what it thinks will be the
optimal way to execute the query – that is, it generates an
execution plan.
In
other words, the optimizer figures out how best to implement the
request represented by the T-SQL query you submitted. It decides if
the data can be accessed through indexes, what types of joins to use
and much more.
The
decisions made by the optimizer are based on what it calculates to
be the cost of a given execution plan, in terms of the required CPU
processing and I/O, and how fast it will execute. Hence, this is
known as a cost-based plan.
The
optimizer will generate and evaluate many plans (unless there is
already a cached plan) and, generally speaking, will choose the
lowestcost plan i.e. the plan it thinks will execute the query as
fast
as
possible and use the least amount of resources, CPU and I/O.
For
making the Execution plans the statistics are used, Statistics are
collected on columns and indexes within the database, and describe
the data distribution and the uniqueness, or selectivity, of the
data.
If
statistics exist for a relevant column or index, then the optimizer
will use them in its calculations. Statistics, by default, are
created and updated automatically within the system for all indexes
or for any
column
used as a predicate, as part of a WHERE clause or JOIN ON clause.
The
optimizer takes the statistics, along with the query processor tree,
and
heuristically determines the best plan.
Once
the optimizer arrives at an execution plan, the actual plan is
created and stored in a memory space known as the plan cache.
Query
Execution
Once
the execution plan is generated, the action switches to the storage
engine, where the query is actually executed, according to the plan.
Estimated
and Actual Execution Plans
- The plan created by the Query Optimizer is the Estimated Execution Plan.
- Actual Execution Plan shows what actually happened when the query executed.
When a query is submitted to the
server, an estimated execution plan is created by the optimizer. Once
that plan is created, and before it gets passed to the storage
engine, the optimizer compares this estimated plan to actual
execution plans that already exist in the plan cache. If an actual
plan is found that matches the estimated one, then the optimizer will
reuse the existing plan, since it's already been used before by the
query engine. This reuse avoids the overhead of creating actual
execution plans for large and complex queries or even simple plans
for small queries called thousands of times in a minute.
Execution plans are not sacrosanct.
Certain events and actions can cause a plan to be recompiled. It is
important to remember this because recompiling execution plans can be
a very expensive operation.
To
completely clear the cache, run this:
To see the execution plans
on our system : -
SELECT
[cp].[refcounts],
[cp].[usecounts],
[cp].[objtype],
[st].[dbid],
[st].[objectid],
[st].[text],
[qp].[query_plan]
FROM
sys.dm_exec_cached_plans
cp
CROSS
APPLY
sys.dm_exec_sql_text(cp.plan_handle)
st
CROSS
APPLY
sys.dm_exec_query_plan(cp.plan_handle)
qp
;
When
Parallelism is Requested
When a plan meets the
threshold for parallelism two plans are created. Which plan is
actually executed is up to the query engine.
So you might see a plan with, or
without, parallel operators in the estimated execution plan. When the
query actually executes, you may see a completely different plan if
the query engine determines that it either can't support a parallel
query at that time or that a parallel query is called for.
Execution
Plan Formats
• Graphical Plans
• Text Plans
• XML Plans
No comments