Community Server for Rent a Coder

How Software Gets Done
Welcome to Community Server for Rent a Coder Sign in | Join | Help
in Search

Real Life Microsoft IT

How SQL Server 2005 "Forced Parameterization" cut ad-hoc query CPU usage by 85%

Is it my imagination or is SQL 2005 running slower?

My first impresson with SQL Server 2005 came after installing it on a test server.  The test server is a dual processor machine...which was a pretty high powered box back in it's hey-day, but these days it's a commodity server.  It runs both SQL Server and IIS,so it is more sensitive to load problems than the production server.  This make it a good "canary in a coal-mine", because it detects and helps flush out performance problems early in the process.

The first thing I noticed with 2005, was that every time I rebooted the machine, the first time I ran a particular web page (which ran queries to SQL ), SQL Server 2005 would take a really long time to return the results. In 2000, it would run in about 7 seconds, but in 2005 it would timeout the web page (meaning it took at least 10 seconds). But after that first call it would be fine..and would actually run faster than 7 seconds.   This weird problem would only occur after every reboot...and would then disappear.

Since I had hundreds of other tasks that needed to be done for the upgrade, and it only happened on bootup, I simply increased the timeout on the page to 15 seconds, and pressed on to other matters.  But it stayed in the back of my mind.

SQL Server 2005: Faster to Execute but Slower to Compile?

All versions of SQL Server first compile a query and then execute it.  I have seen many times where SQL Server 2005 executes queries faster than 2000...but I have also seen many times where it took longer to compile them in the first place.  Most likely the two are related...it takes more time to create a better query plan.

This isn't necessarily a bad thing.  In fact, if 100% of your workload is stored procedures, it is a great improvement.  That is because stored procs are precompiled, so they are only compiled once and executed many times. So if this is your situation, then you will save many multiples of time more than you are giving up.

BUT...if on the other hand if your workload has a large percentage of ad hoc queries (for example: SQL called from a programming language like ASP.NET), then the situation will be reversed.  That is becuase ad hoc queries have to be compiled each and every time they need to be run.  And if this is your case, then they will run slower in 2005 than in 2000.

Workarounds

In an ideal world this would not be a problem, becuase everything would be written in stored procedures.  However, this ideal case doesn't always exist.  Some apps use 3rd party middle tier components which preclude the abilty to implement stored procs.  Others require the ability to run on multiple vendor databases...meaning they can only use vendor-neutral SQL. 

The more typical application is homegrown and designed only for SQL Server, so it doesn't have any of the above problems.  These apps typically use stored procs when they can, but but use an adhoc query when they can't.  For example, a search screen with many paramaters creates very complex SQL that morphs dramatically from query to query.  If you choose one set of options, you might get a simple query, and if you choose others, you may have to INNER or OUTER JOIN to 10-11 additional tables, new fields, WHERE, HAVING, GROUP BY clauses, etc.  This easy to code in a declarative language like .NET.  But this sort of structure is an exponentially growing nightmare in a true TSQL stored procedure.  .NET stored procs are a leg up on TSQL and can make non-set based operations easier to do.  But they are little slower than TSQL.  More imporantly, all stored procedures (including .NET, TSQL and even dynamic TSQL), cause a fundamental programming problem. Coding them requires maintaining an interface in two places...in the stored proc and in the code.  Every change requires two changes rather than one...and every time new functionality is added, this usually requires an interface change.  This translates into double the development time, double the chances for bugs and double the work.

Is there another way?

But what if there were a way to get:
1) The performance of a TSQL stored proc 
2) The power and flexibility of a declarative language
3) That only requires one interface?

Before you say "Keep dreaming!"...I'd recommend that.you read the book "Good to Great".  In that book the author talks about "good" companies being ruled by the "tyranny of the OR" and "great" companies believing in the "genius of the and".  For example, a "good" company would believe they could either offer the cheapest price, or good quality but not both.  A "great" company would find a way to do both.

Is it possible to harness the "genuis of the AND" in our database situation?  The developers of SQL Server 2005 dared to ask that question.  And the result of that is a wonderous creation.  The name doesn't do it enough justice and has probably caused it to be one of the least heralded and unheard of features in SQL Server 2005.  (Do a search on Google and you'll find perhaps 6 mentions on the entire web...none in Microsoft's marketing.)  This little known feature is called "forced parameterization". 

That sounds familiar


You may have heard of the term "parmaterization" before. Simple parameterization was SQL 2000's first attempt at finding the "genius of the AND".   Genius solutions don't come easily... and this first attempt was a nice step in the right direction but ultimately fell short. 

Basically, SQL would analyze your non stored proc queries and figure out for itself if it could create a query plan on the fly with it.  It did this by subbing out some of the values you entered with paramaters that could be reused by similar queries.  The only problem was that it was very simple and didn't handle many real life situations (such as an IN clause for excample).  Worse, changing the query just a little bit could often "fool" it into not recognizing it as being the same as the original, and it wouldn't "kick in".  So while simple parmaterization helped a little, it wasn't enough of a solution to solve the problem.

Microsoft put the intervening 5 years between 2005 and 2005 to good use and basically revamped this code entirely and pumped it up on steriods.  Possibly the incredible increase in CPU processing power in those 5 years led them to believe that their customers would have the horse power to enable such a solution to run effectively.  They made it more robust, more dynamic and more powerful, and yet kept it lightning quick.  They enabled it to do handle a much larger variety of queries.  And most importantly...if it wasn't able to figure out a certain part of the query, it didn't just "give up" (as simple paramaterization did/does)...it just skipped that small portion and paramaterized the rest. 

They called this feature "forced paramaterization". 

What does this mean in real life? 

So what does this mean to you?   I ran this test during an offpeak time period, but during which a large # of ad hoc queries are typically run.  With simple paramaterization (which is on by default), the result was a 25% average CPU utilization:



Not bad.  So then I turned on forced parameterization.  It's done at the database level under "properties" and "options".  This was the result:



At first I thought something had to be wrong. The processor usage dropped to 1/5th of what it had been before...just 5%!   Had everyone stopped using the database all of a sudden?  I paniced for a minute and then settled down once I confirmed that there was no site-wide emergency and things were actually running normally.  I watched it very closely for 10 minutes, and "force parameterization" took on query after query like a champ and didn't even break a sweat.  The # of cached query plans rose as the CPU kept low.  Clearly the compiling of the ad hoc queries was a bottleneck before-hand and "forced parameterization" wiped out that bottleneck like a truck running over an ice cream cone!

And the best thing about it is that it tends to get better with time.  As more queries run, more get cached, and your server runs faster and faster.  By removing that compiling bottleneck your ad-hoc queries fly!

Is there a Downside?

Genius doesn't come easy.  There are some cases where even forced paramaterization doesn't work.  It may have a tendency to make your queries not use indexed views (if you are using them).  And now that your ad hoc queries have the pre-compilation powers of stored procs, they have the downsides of that power as well.  Particularly sometimes a compiled plan for a typical situation doesn't work right for an outlier situation (or vice versa) and can effect performance.  But this is a rare situation and can be handled on a case by case basis if and when it happens.  You can read the help files on the feature for more information on the pros and cons of forced parameterization and decide for yourself when it might or might not be appropriate for you.

But for us, it has been a godsend and we will continue to use it in production.  And I can't wait to see how this feature expands in the next release.

Kudos to the SQL Server 2005 team on this little talked about but great feature.

Links:
Here is a good blog entry from a Microsoft SQL Server developer named Cihangir Biyikoglu on the feature:
http://blogs.msdn.com/cbiyikoglu/archive/2005/11/03/488920.aspx

Ian Ippolito

Published Sunday, April 30, 2006 8:11 PM by admin
Filed Under:

Comments

No Comments
Anonymous comments are disabled

This Blog

Post Calendar

<April 2006>
SuMoTuWeThFrSa
2627282930311
2345678
9101112131415
16171819202122
23242526272829
30123456

Post Categories

Syndication

Powered by Community Server, by Telligent Systems