Hebtech

Index Fragmentation in Azure SQL Database

Index Fragmentation in Azure SQL Database
FEBRUARY 4, 2015

Recently, I read the articles Stop Worrying About SQL Server Fragmentation and When Does Index Fragmentation Matter?, and that got me starting to think about how index fragmentation in Azure SQL Database compares to index fragmentation in on-premise versions of SQL Server. I also started to wonder, "Does Fragmentation matter in SQL Azure"? I decided to compare index fragmentation in Azure SQL Database vs an on-premise SQL Server using the same techniques used in the articles just mentioned.

Method of Comparison
In order to compare fragmentation, I used the script written by Jeremiah Peschka in the article When Does Index Fragmentation Matter? with a few modifications. This script basically does the following:

  1. Creates a table with a Primary Key and a non-clustered index
  2. Inserts 100000 records into the table
  3. Deletes 70% of the records in order to fragment the non-clustered index
  4. Rebuilds the indexes

If you want to get more detail as to how the script works, read Jerimiah's excellent articleAlong the way, the size of the indexes is shown in pages and KB to show how each of these operations affect the size of the indexes. In order for the script to run in SQL Azure, I modified the method used to create random numbers (because master.dbo.spt_values does not exist in Azure SQL Database). I also increased the number of records that were inserted in order to have a better picture of the changes in size involved. The entire script (with my amendments) is shown below. There is a link to download the script at the end of the article.

For SQL Server, I used SQL Server 2012 Standard. For Azure SQL Database, I experimented on a variety of different Service Tiers. All of the results of the Fragmentation script came out the same, although the insert of 100000 records did happen faster at higher Service Tiers due to the increase in Database Throughput Units (DTU's).

Results of Running the Tests - SQL Server Standard

The explanation of the results is as follows:
  1. Baseline Results - This result shows the size of the indexes in pages and KB on the truncated table. As expected, both are 0.
  2. 100000 Records Inserted - This result shows that there were 100000 records inserted into the table.
  3. 100000 record index size - This result shows the size of the index after inserting the 100000 records.
  4. 70% Deleted - This result shows the size of the index after deleting 70% of the records and fragmenting the non-clustered index.
  5. After Defragmentation - This result shows the size of the index after defragmentation.
As you can see with SQL Server Standard, there is nothing really radical that can be seen here. Indexes are getting smaller once they are defragmented as expected. The results with Azure SQL Database are a bit more interesting.
 
Results of Running the Tests - Azure SQL Database
 
The results for Azure SQL Database are as follows:
  1. Baseline Results - This is the same. Empty databases have empty indexes.
  2. 100000 Records Inserted - Same. 100000 rows inserted.
  3. 100000 record Index Size - The thing to notice here is that the size of both the non-clustered and clustered indexes are much larger in Azure than the size of indexes in SQL Server Standard. The size of the non-clustered index is 129% larger while the clustered index is 37% larger than in SQL Server Standard.
  4. 70% Deleted - This result is interesting as well in that it shows that the non-clustered index dramatically grows in size as it gets fragmented. This result is completely at odds with the result from SQL Server Standard where the indexes remain the same size. It also doesn't make a lot of sense. If you think about the analogy of indexes as phone books (for those of you old enough to remember phone books), deleting the records should be removing the pages from the phone book. The size of the phone book shouldn't grow because we have removed pages. The other interesting thing here is that the size of the fragmented index is now 347% larger than in SQL Server Standard.
  5. After Defragmentation - Once the Azure non-clustered index has been defragmented, it is only 14% larger than the non-clustered index in SQL Server Standard.

Does Fragmentation matter in SQL Azure?
Pretty interesting results, but is it time to panic? I don't think so, but I am going to sit on the fence on this one. Realistically, your can pretty much assume that you will not have sequential access to the disk for your database in Azure SQL Database so having a fully defragmented index is not going to improve the performance of your application greatly. That said, it is definitely concerning that the index grew by 347% as it became fragmented, while in SQL Server Standard, the index remained the same size. Personally, I'm going to keep an eye on things on my production databases and defragment where I think its necessary. I would be interested if anyone has any theories as to why this is happening or how others are handling the problem.

References:

 

Comments

Submitted by Jeremiah Peschka (not verified) on
Great post, and thanks for doing the work! I think you are seeing some of the size difference due to RCSI being enabled in Azure. This will give you a 14-byte overhead per row and could account for at leas some of the size difference in Azure.

Submitted by paul on
Thanks for reading, Jerimiah! I do agree that the RCSI could account for the overall size of the indexes being greater - I hadn't thought of that. But would it account for the non-clustered index growing by so much when the records were deleted?

Submitted by Mo (not verified) on
Would this have anything to do with the fact that SQL Azure indexes can only have a FILLFACTOR of 100?

Submitted by paul on
I did play around with different fill factors in the on-premise version of SQL Server, but it didn't lead to the results that I saw on SQL Azure.

Submitted by Jörg Weigel (not verified) on
This is weird. Why should an index get larger at all when deleting records.

Submitted by Bob Amy (not verified) on
I hope someone can answer why azure and the local DBMS are not the same number of pages. Aren't they using the same DBMS?

Add new comment