Improving SQL Query Performance


Improving SQL Query Performance


During the past month and a half, I have identified queries using B-TREE indexes with low cardinality.  This paper is intended to show how query performance will increase dramatically by forcing the queries to use an Index with high cardinality.

Let me first explain what I mean by cardinality.  By taking one of the slow running queries I found will help highlight the meaning of cardinality, and will show the advantage of using an Index with high cardinality versus using an Index with low cardinality.

SELECT A1.`GLDetailID`,A1.`Year_Period`,A1.`GLAccount`,A1.`GLSource`,A1.`SubLedger`,A1.`GLSeq`,A1.`GLRef`,A1.`PriorPerReversal`,A1.`TransactionNo`,A1.`TransType`,A1.`SubType`,A1.`FrtSeq`,A1.`ProductionLoc`,A1.`ProductionSeq`,A1.`AdjSeq`,A1.`InterCoCode`,A1.`TaxCode`,A1.`EntityNo`,A1.`TraderCode`,A1.`OrderNo`,A1.`OrderLine`,A1.`BuySellBranch`,A1.`ShipFrom`,A1.`ShipTo`,A1.`ShipmentDate`,A1.`BrokerageInOutCd`,A1.`MaterialCode`,A1.`RegradeMaterialCode`,A1.`BranchPostTo`,A1.`InventoryCode`,A1.`YardCode`,A1.`InvProdClass`,A1.`VehicleType`,A1.`VehicleNo`,A1.`Weight`,A1.`BaleCarCount`,A1.`DebitCredit`,A1.`DebitAmount`,A1.`CreditAmount`,A1.`GLPairID`,A1.`FundsType`,A1.`VoucherNo`,A1.`InvoiceNo`,A1.`SupplierChgBk`,A1.`PaidBank`,A1.`CheckNo`,A1.`CRRefNo`,A1.`IsBalanced`,A1.`SrcProgram`,A1.`LastUpdated` FROM `GL_Detail` A1 use INDEX(TransactionNo) WHERE A1.`Year_Period`='201104' AND A1.`ProductionLoc` LIKE '81%' ORDER BY 1\G
572 rows in set (29.60 sec)

The above query will try to use an Index that is contained in the WHERE CLAUSE, eg. Year_Period AND ProductionLoc.   To see if these columns are a good match for an Index with high cardinality I will run a query to see how many distinct rows are contained in each column.

mysql> select distinct Year_Period from GL_Detail;
5 rows in set (0.00 sec)

select distinct ProductionLoc from GL_Detail
54 rows in set (2.76 sec)

Next, determine how many rows are in the table:

mysql> select count(*) from GL_Detail;
+----------+
| count(*) |
+----------+
|  1252368 |
+----------+
1 row in set (1.99 sec)
mysql>

So there are 5 distinct rows in the GL_Detail that match Year_Period, and 54 rows that match ProductionLoc, and we know there are 1252368 rows in the GL_Detail table.   High cardinality is when the distinct value of the columns closely matches the number of rows in the table.   

So, now we know the Year_Period and the ProductionLoc columns have low cardinality since they both contain few distinct rows, and are not the best use for a B-TREE Index.  Let me explain further, a query that is Indexing off the Year_Period column of the GL_Detail table must scan all values that equal the distinct value, see the below query:

mysql> select count(*) from GL_Detail where Year_Period = '201104';
+----------+
| count(*) |
+----------+
|   563728 |
+----------+
1 row in set (0.40 sec)

Our example query at the beginning was Indexing off the Year_Period column, the above query shows there are 563,728 rows that match the Year_Period equal to 201104, in other words our query has to scan ½ million rows to get the right match, it found 572 rows and took 29.60 seconds.

Forcing this query to use an Index with high cardinality will speed up the query over 5 times faster by using the TransactionNo Index instead.

mysql> select distinct TransactionNo from GL_Detail;
198034 rows in set (1.60 sec)

SELECT A1.`GLDetailID`,A1.`Year_Period`,A1.`GLAccount`,A1.`GLSource`,A1.`SubLedger`,A1.`GLSeq`,A1.`GLRef`,A1.`PriorPerReversal`,A1.`TransactionNo`,A1.`TransType`,A1.`SubType`,A1.`FrtSeq`,A1.`ProductionLoc`,A1.`ProductionSeq`,A1.`AdjSeq`,A1.`InterCoCode`,A1.`TaxCode`,A1.`EntityNo`,A1.`TraderCode`,A1.`OrderNo`,A1.`OrderLine`,A1.`BuySellBranch`,A1.`ShipFrom`,A1.`ShipTo`,A1.`ShipmentDate`,A1.`BrokerageInOutCd`,A1.`MaterialCode`,A1.`RegradeMaterialCode`,A1.`BranchPostTo`,A1.`InventoryCode`,A1.`YardCode`,A1.`InvProdClass`,A1.`VehicleType`,A1.`VehicleNo`,A1.`Weight`,A1.`BaleCarCount`,A1.`DebitCredit`,A1.`DebitAmount`,A1.`CreditAmount`,A1.`GLPairID`,A1.`FundsType`,A1.`VoucherNo`,A1.`InvoiceNo`,A1.`SupplierChgBk`,A1.`PaidBank`,A1.`CheckNo`,A1.`CRRefNo`,A1.`IsBalanced`,A1.`SrcProgram`,A1.`LastUpdated` FROM `GL_Detail` A1 use INDEX(TransactionNo) WHERE A1.`Year_Period`='201104' AND A1.`ProductionLoc` LIKE '81%' ORDER BY 1\G
572 rows in set (4.86 sec)

I have been seeing several queries using Indexes with low cardinality, like the queries below, and should be looked at to see if similar queries can be altered to use an Index with high cardinality.  

You can also Force the query to use the PRIMARY KEY by using the syntax USE KEY(PRIMARY).  On a side note, Indexes with low cardinality should be BITMAP Indexes, however MySQL does NOT support BITMAP Indexes at this time.  Also, scanning ½ million rows is better than scanning 1.2 million rows (eg. a full table scan of the current GL_Detail table). 

SELECT A1.`GLDetailID`,A1.`Year_Period`,A1.`GLAccount`,A1.`GLSource`,A1.`SubLedger`,A1.`GLSeq`,A1.`GLRef`,A1.`PriorPerReversal`,A1.`TransactionNo`,A1.`TransType`,A1.`SubType`,A1.`FrtSeq`,A1.`ProductionLoc`,A1.`ProductionSeq`,A1.`AdjSeq`,A1.`InterCoCode`,A1.`TaxCode`,A1.`EntityNo`,A1.`TraderCode`,A1.`OrderNo`,A1.`OrderLine`,A1.`BuySellBranch`,A1.`ShipFrom`,A1.`ShipTo`,A1.`ShipmentDate`,A1.`BrokerageInOutCd`,A1.`MaterialCode`,A1.`RegradeMaterialCode`,A1.`BranchPostTo`,A1.`InventoryCode`,A1.`YardCode`,A1.`InvProdClass`,A1.`VehicleType`,A1.`VehicleNo`,A1.`Weight`,A1.`BaleCarCount`,A1.`DebitCredit`,A1.`DebitAmount`,A1.`CreditAmount`,A1.`GLPairID`,A1.`FundsType`,A1.`VoucherNo`,A1.`InvoiceNo`,A1.`SupplierChgBk`,A1.`PaidBank`,A1.`CheckNo`,A1.`CRRefNo`,A1.`IsBalanced`,A1.`SrcProgram`,A1.`LastUpdated` FROM `GL_Detail` A1 WHERE A1.`Year_Period`='201104' AND A1.`GLAccount` LIKE '%13000%' ORDER BY 1

SELECT A1.`GLDetailID`,A1.`Year_Period`,A1.`GLAccount`,A1.`GLSource`,A1.`SubLedger`,A1.`GLSeq`,A1.`GLRef`,A1.`PriorPerReversal`,A1.`TransactionNo`,A1.`TransType`,A1.`SubType`,A1.`FrtSeq`,A1.`ProductionLoc`,A1.`ProductionSeq`,A1.`AdjSeq`,A1.`InterCoCode`,A1.`TaxCode`,A1.`EntityNo`,A1.`TraderCode`,A1.`OrderNo`,A1.`OrderLine`,A1.`BuySellBranch`,A1.`ShipFrom`,A1.`ShipTo`,A1.`ShipmentDate`,A1.`BrokerageInOutCd`,A1.`MaterialCode`,A1.`RegradeMaterialCode`,A1.`BranchPostTo`,A1.`InventoryCode`,A1.`YardCode`,A1.`InvProdClass`,A1.`VehicleType`,A1.`VehicleNo`,A1.`Weight`,A1.`BaleCarCount`,A1.`DebitCredit`,A1.`DebitAmount`,A1.`CreditAmount`,A1.`GLPairID`,A1.`FundsType`,A1.`VoucherNo`,A1.`InvoiceNo`,A1.`SupplierChgBk`,A1.`PaidBank`,A1.`CheckNo`,A1.`CRRefNo`,A1.`IsBalanced`,A1.`SrcProgram`,A1.`LastUpdated` FROM `GL_Detail` A1 WHERE A1.`Year_Period`='201104' AND A1.`TransactionNo` LIKE '%039066%' ORDER BY 1\G

SELECT * FROM GL_Detail_CurPeriod WHERE Year_Period='201104' AND GLAccount='200001150000' AND GLSource='P/T' AND DebitCredit='D'
SELECT `Year_Period`,`SrcProgram` FROM `GL_Detail` WHERE `Year_Period`='201103' AND `SrcProgram`='1526'

All of the above columns in the WHERE CLAUSE have low cardinality and would benefit from using an Index with high cardinality.  Furthermore, referencing the last query I’m not sure why you want to select the Year_Period and ScrProgram and not do a count(*) instead as the query will only return `Year_Period`='201103' AND `SrcProgram`='1526'?

mysql> SELECT count(*) FROM `GL_Detail` WHERE `Year_Period`='201103' AND `SrcProgram`='1526'\G
*************************** 1. row ***************************
count(*): 10746
1 row in set (10.10 sec)

mysql> SELECT count(*) FROM `GL_Detail` use key(PRIMARY) WHERE `Year_Period`='201103' AND `SrcProgram`='1526'\G
*************************** 1. row ***************************
count(*): 10746
1 row in set (2.25 sec)



Comments

Popular posts from this blog

PostgreSQL Database Version 13.4 To MySQL Database Version 8.0.20 Migration by using SQLines Tool

MariaDB Database Multi-instance implementation in single machine