Improving SQL Query Performance
Improving SQL Query Performance
During the past month and a half, I have identified queries using B-TREE
indexes with low cardinality. This paper
is intended to show how query performance will increase dramatically by forcing
the queries to use an Index with high cardinality.
Let me
first explain what I mean by cardinality.
By taking one of the slow running queries I found will help highlight
the meaning of cardinality, and will show the advantage of using an Index with
high cardinality versus using an Index with low cardinality.
SELECT
A1.`GLDetailID`,A1.`Year_Period`,A1.`GLAccount`,A1.`GLSource`,A1.`SubLedger`,A1.`GLSeq`,A1.`GLRef`,A1.`PriorPerReversal`,A1.`TransactionNo`,A1.`TransType`,A1.`SubType`,A1.`FrtSeq`,A1.`ProductionLoc`,A1.`ProductionSeq`,A1.`AdjSeq`,A1.`InterCoCode`,A1.`TaxCode`,A1.`EntityNo`,A1.`TraderCode`,A1.`OrderNo`,A1.`OrderLine`,A1.`BuySellBranch`,A1.`ShipFrom`,A1.`ShipTo`,A1.`ShipmentDate`,A1.`BrokerageInOutCd`,A1.`MaterialCode`,A1.`RegradeMaterialCode`,A1.`BranchPostTo`,A1.`InventoryCode`,A1.`YardCode`,A1.`InvProdClass`,A1.`VehicleType`,A1.`VehicleNo`,A1.`Weight`,A1.`BaleCarCount`,A1.`DebitCredit`,A1.`DebitAmount`,A1.`CreditAmount`,A1.`GLPairID`,A1.`FundsType`,A1.`VoucherNo`,A1.`InvoiceNo`,A1.`SupplierChgBk`,A1.`PaidBank`,A1.`CheckNo`,A1.`CRRefNo`,A1.`IsBalanced`,A1.`SrcProgram`,A1.`LastUpdated`
FROM `GL_Detail` A1 use INDEX(TransactionNo) WHERE A1.`Year_Period`='201104' AND
A1.`ProductionLoc` LIKE '81%' ORDER BY 1\G
572 rows
in set (29.60 sec)
The
above query will try to use an Index that is contained in the WHERE CLAUSE, eg.
Year_Period AND ProductionLoc. To see
if these columns are a good match for an Index with high cardinality I will run
a query to see how many distinct rows are contained in each column.
mysql>
select distinct Year_Period from GL_Detail;
5 rows
in set (0.00 sec)
select
distinct ProductionLoc from GL_Detail
54 rows
in set (2.76 sec)
Next,
determine how many rows are in the table:
mysql>
select count(*) from GL_Detail;
+----------+
|
count(*) |
+----------+
| 1252368 |
+----------+
1 row in
set (1.99 sec)
mysql>
So there
are 5 distinct rows in the GL_Detail that match Year_Period, and 54 rows that
match ProductionLoc, and we know there are 1252368 rows in the GL_Detail
table. High cardinality is when the
distinct value of the columns closely matches the number of rows in the
table.
So, now we know the Year_Period
and the ProductionLoc columns have low cardinality since they both contain few
distinct rows, and are not the best use for a B-TREE Index. Let me explain further, a query that is
Indexing off the Year_Period column of the GL_Detail table must scan all values
that equal the distinct value, see the below query:
mysql>
select count(*) from GL_Detail where Year_Period = '201104';
+----------+
|
count(*) |
+----------+
| 563728 |
+----------+
1 row in
set (0.40 sec)
Our
example query at the beginning was Indexing off the Year_Period column, the
above query shows there are 563,728 rows that match the Year_Period equal to
201104, in other words our query has to scan ½ million rows to get the right
match, it found 572 rows and took 29.60 seconds.
Forcing
this query to use an Index with high cardinality will speed up the query over 5
times faster by using the TransactionNo Index instead.
mysql>
select distinct TransactionNo from GL_Detail;
198034
rows in set (1.60 sec)
SELECT A1.`GLDetailID`,A1.`Year_Period`,A1.`GLAccount`,A1.`GLSource`,A1.`SubLedger`,A1.`GLSeq`,A1.`GLRef`,A1.`PriorPerReversal`,A1.`TransactionNo`,A1.`TransType`,A1.`SubType`,A1.`FrtSeq`,A1.`ProductionLoc`,A1.`ProductionSeq`,A1.`AdjSeq`,A1.`InterCoCode`,A1.`TaxCode`,A1.`EntityNo`,A1.`TraderCode`,A1.`OrderNo`,A1.`OrderLine`,A1.`BuySellBranch`,A1.`ShipFrom`,A1.`ShipTo`,A1.`ShipmentDate`,A1.`BrokerageInOutCd`,A1.`MaterialCode`,A1.`RegradeMaterialCode`,A1.`BranchPostTo`,A1.`InventoryCode`,A1.`YardCode`,A1.`InvProdClass`,A1.`VehicleType`,A1.`VehicleNo`,A1.`Weight`,A1.`BaleCarCount`,A1.`DebitCredit`,A1.`DebitAmount`,A1.`CreditAmount`,A1.`GLPairID`,A1.`FundsType`,A1.`VoucherNo`,A1.`InvoiceNo`,A1.`SupplierChgBk`,A1.`PaidBank`,A1.`CheckNo`,A1.`CRRefNo`,A1.`IsBalanced`,A1.`SrcProgram`,A1.`LastUpdated` FROM `GL_Detail` A1 use INDEX(TransactionNo) WHERE A1.`Year_Period`='201104' AND A1.`ProductionLoc` LIKE '81%' ORDER BY 1\G
572 rows
in set (4.86 sec)
I have been seeing several queries using Indexes with low cardinality, like the queries below, and should be looked at to see if similar queries can be altered to use an Index with high cardinality.
You can also Force the query to use the PRIMARY KEY by using the syntax USE KEY(PRIMARY). On a side note, Indexes with low cardinality
should be BITMAP Indexes, however MySQL does NOT support BITMAP Indexes at this
time. Also, scanning ½ million rows is
better than scanning 1.2 million rows (eg. a full table scan of the current
GL_Detail table).
SELECT
A1.`GLDetailID`,A1.`Year_Period`,A1.`GLAccount`,A1.`GLSource`,A1.`SubLedger`,A1.`GLSeq`,A1.`GLRef`,A1.`PriorPerReversal`,A1.`TransactionNo`,A1.`TransType`,A1.`SubType`,A1.`FrtSeq`,A1.`ProductionLoc`,A1.`ProductionSeq`,A1.`AdjSeq`,A1.`InterCoCode`,A1.`TaxCode`,A1.`EntityNo`,A1.`TraderCode`,A1.`OrderNo`,A1.`OrderLine`,A1.`BuySellBranch`,A1.`ShipFrom`,A1.`ShipTo`,A1.`ShipmentDate`,A1.`BrokerageInOutCd`,A1.`MaterialCode`,A1.`RegradeMaterialCode`,A1.`BranchPostTo`,A1.`InventoryCode`,A1.`YardCode`,A1.`InvProdClass`,A1.`VehicleType`,A1.`VehicleNo`,A1.`Weight`,A1.`BaleCarCount`,A1.`DebitCredit`,A1.`DebitAmount`,A1.`CreditAmount`,A1.`GLPairID`,A1.`FundsType`,A1.`VoucherNo`,A1.`InvoiceNo`,A1.`SupplierChgBk`,A1.`PaidBank`,A1.`CheckNo`,A1.`CRRefNo`,A1.`IsBalanced`,A1.`SrcProgram`,A1.`LastUpdated`
FROM `GL_Detail` A1 WHERE A1.`Year_Period`='201104' AND A1.`GLAccount` LIKE
'%13000%' ORDER BY 1
SELECT
A1.`GLDetailID`,A1.`Year_Period`,A1.`GLAccount`,A1.`GLSource`,A1.`SubLedger`,A1.`GLSeq`,A1.`GLRef`,A1.`PriorPerReversal`,A1.`TransactionNo`,A1.`TransType`,A1.`SubType`,A1.`FrtSeq`,A1.`ProductionLoc`,A1.`ProductionSeq`,A1.`AdjSeq`,A1.`InterCoCode`,A1.`TaxCode`,A1.`EntityNo`,A1.`TraderCode`,A1.`OrderNo`,A1.`OrderLine`,A1.`BuySellBranch`,A1.`ShipFrom`,A1.`ShipTo`,A1.`ShipmentDate`,A1.`BrokerageInOutCd`,A1.`MaterialCode`,A1.`RegradeMaterialCode`,A1.`BranchPostTo`,A1.`InventoryCode`,A1.`YardCode`,A1.`InvProdClass`,A1.`VehicleType`,A1.`VehicleNo`,A1.`Weight`,A1.`BaleCarCount`,A1.`DebitCredit`,A1.`DebitAmount`,A1.`CreditAmount`,A1.`GLPairID`,A1.`FundsType`,A1.`VoucherNo`,A1.`InvoiceNo`,A1.`SupplierChgBk`,A1.`PaidBank`,A1.`CheckNo`,A1.`CRRefNo`,A1.`IsBalanced`,A1.`SrcProgram`,A1.`LastUpdated`
FROM `GL_Detail` A1 WHERE A1.`Year_Period`='201104' AND A1.`TransactionNo` LIKE
'%039066%' ORDER BY 1\G
SELECT *
FROM GL_Detail_CurPeriod WHERE Year_Period='201104' AND
GLAccount='200001150000' AND GLSource='P/T' AND DebitCredit='D'
SELECT
`Year_Period`,`SrcProgram` FROM `GL_Detail` WHERE `Year_Period`='201103' AND
`SrcProgram`='1526'
All of the above columns in the WHERE CLAUSE have low cardinality and would benefit from using an Index with high cardinality. Furthermore, referencing the last query I’m not sure why you want to select the Year_Period and ScrProgram and not do a count(*) instead as the query will only return `Year_Period`='201103' AND `SrcProgram`='1526'?
mysql>
SELECT count(*) FROM `GL_Detail` WHERE `Year_Period`='201103' AND
`SrcProgram`='1526'\G
***************************
1. row ***************************
count(*):
10746
1 row in
set (10.10 sec)
mysql>
SELECT count(*) FROM `GL_Detail` use key(PRIMARY) WHERE `Year_Period`='201103' AND
`SrcProgram`='1526'\G
***************************
1. row ***************************
count(*):
10746
1 row in
set (2.25 sec)
Comments
Post a Comment