Five Ways to Rev up Your SQL Performance
Sometimes all it takes is a little tweak here or there to make your application run much faster. Ah, but the key is figuring out how to tweak it! Sooner or later you'll face a situation where a SQL query in your application isn't responding the way you intended. Either it doesn't return the data you want or it takes entirely too long to be reasonable. If it slows down a report or your enterprise application, users won't be pleased if they have to wait inordinate amounts of time. And just like your parents didn't want to hear why you were coming in past curfew, users don't want to hear why your query is taking so long. ("Sorry, Mom, I used too many LEFT JOINs.") Users want applications to respond quickly and their reports to return analytical data in a flash. I myself get impatient when I surf the Web and a page takes more than ten seconds to load (OK, more like five seconds).
To resolve these issues, it is important to get to the root of the problem. So where do you start? The root cause is usually in the database design and the queries that access it. In this month's column I'll demonstrate four techniques that can be used to improve either the performance or the scalability of your SQL Server™-based application. I'll examine retrieving an IDENTITY value, inline views, LEFT JOINs, and CROSS JOINs. Keep in mind that there is no magic solution. Tuning your database and its queries takes time, analysis, and a lot of testing. While the techniques here are proven, some may work better than others in your application.
Returning an IDENTITY From an INSERT
I figured I would start with something I get a lot of questions about: how to retrieve an IDENTITY value after performing a SQL INSERT. Often, the problem is not how to write the query to retrieve the value, but rather where and when to do it. In SQL Server, the statement to retrieve the IDENTITY value created by the most recent SQL statement run on the active database connection is as follows:
SELECT @@IDENTITY
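While this SQL is far from daunting, keep in mind that if the most recent SQL statement was not an INSERT, or if you run this SQL on a different connection than the INSERT, you will not get back the value you expect. @@IDENTITY also reflects the last identity value generated on the connection in any scope, so if the INSERT fires a trigger that inserts into another table with an IDENTITY column, you get that table's value instead; SCOPE_IDENTITY() is limited to the current scope. You must run this code to retrieve the IDENTITY immediately following the INSERT SQL and on the same connection, like this: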
INSERT INTO Products (ProductName) VALUES ('Chalk')
SELECT @@IDENTITY
Running these queries on a single connection against the Northwind database will return to you the IDENTITY value for the new product called Chalk. So in your Visual Basic® application using ADO, you could run the following statement:
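' Send all three statements as one batch on the same connection
Set oRs = oCn.Execute("SET NOCOUNT ON;" & _
    "INSERT INTO Products (ProductName) VALUES ('Chalk');" & _
    "SELECT @@IDENTITY")
' The first (and only) Recordset holds the new IDENTITY value
lProductID = oRs(0)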
This code tells SQL Server not to return a row count for the query, then executes the INSERT statement and returns the IDENTITY value just created for the new row. The SET NOCOUNT ON statement means the Recordset that is returned has one row and one column that contains the new IDENTITY value. Without this statement, an empty Recordset is returned (because the INSERT statement returns no data) and then a second Recordset is returned, which contains the IDENTITY value. This can be confusing, especially since you never intended the INSERT to return a Recordset. This situation occurs because the row count message that SQL Server sends (that is, one row affected) is surfaced by ADO as a Recordset of its own. So the true data is pushed back into a second Recordset. While you can get to this second Recordset using the NextRecordset method in ADO, it is much easier (and more efficient) if you can always count on the Recordset being the first and only one returned.
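The same-connection rule described above can be sketched in a runnable form. This is only an illustration under stated assumptions: it uses Python's standard `sqlite3` module and SQLite's `last_insert_rowid()` as a stand-in for SQL Server's `@@IDENTITY` (the two are different features of different engines), and the file name `demo.db` and the variable names are hypothetical.

```python
import os
import sqlite3
import tempfile

# Assumption: SQLite stands in for SQL Server here. last_insert_rowid()
# plays the role of @@IDENTITY and, like it, is tracked per connection.
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "demo.db")  # hypothetical file name

    writer = sqlite3.connect(db_path)
    writer.execute(
        "CREATE TABLE Products ("
        "ProductID INTEGER PRIMARY KEY AUTOINCREMENT, "
        "ProductName TEXT NOT NULL)"
    )
    cur = writer.execute("INSERT INTO Products (ProductName) VALUES ('Chalk')")
    insert_rowid = cur.lastrowid  # key reported by the cursor that ran the INSERT
    writer.commit()

    # Same connection, right after the INSERT: the new key comes back.
    same_conn_id = writer.execute("SELECT last_insert_rowid()").fetchone()[0]
    stored_id = writer.execute(
        "SELECT ProductID FROM Products WHERE ProductName = 'Chalk'"
    ).fetchone()[0]

    # A different connection has performed no INSERT, so it reports 0.
    other = sqlite3.connect(db_path)
    other_conn_id = other.execute("SELECT last_insert_rowid()").fetchone()[0]

    writer.close()
    other.close()

print(same_conn_id, stored_id, other_conn_id)
```

The point of the sketch is the last query: asking a second connection for "the last generated key" does not return the value created by the first connection.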
While this technique gets the job done, it does require extra code in the SQL statement. Another way of getting the same result is to use the SET NOCOUNT ON statement preceding the INSERT and to put the SELECT @@IDENTITY statement in a FOR INSERT trigger on the table, as shown in the following code snippet. This way, any INSERT statement into that table will automatically return the IDENTITY value.
CREATE TRIGGER trProducts_Insert ON Products FOR INSERT AS
SELECT @@IDENTITY
GO
The trigger fires only when an INSERT occurs on the Products table, so it will always return an IDENTITY value after a successful INSERT. Using this technique, you can retrieve IDENTITY values in the same manner across your application.
Inline Views Versus Temp Tables
Queries sometimes need to join data to other data that can only be gathered by performing a GROUP BY and then a standard query. For example, if you want to return information about the five most recently placed orders, you first need to know which orders they are. This can be retrieved with a SQL query that returns the orders' IDs. This data could be stored in a temporary table, a common technique, and then joined to the Products table to return the quantity of products sold on those orders:
-- Stage the five most recent orders in a temporary table
CREATE TABLE #Temp1 (OrderID INT NOT NULL,
    OrderDate DATETIME NOT NULL)
INSERT INTO #Temp1 (OrderID, OrderDate)
SELECT TOP 5 o.OrderID, o.OrderDate
FROM Orders o ORDER BY o.OrderDate DESC
-- Join the staged orders to their details and products
SELECT p.ProductName, SUM(od.Quantity) AS ProductQuantity
FROM #Temp1 t
INNER JOIN [Order Details] od ON t.OrderID = od.OrderID
INNER JOIN Products p ON od.ProductID = p.ProductID
GROUP BY p.ProductName
ORDER BY p.ProductName
-- Remove the temporary table
DROP TABLE #Temp1
This batch of SQL creates a temporary table, inserts the data into it, joins other data to it, and drops the temporary table. That is a lot of I/O for one query, which could be rewritten to use an inline view instead of a temporary table. An inline view (also called a derived table) is simply a query placed in the FROM clause that other tables can be joined to. So instead of spending a lot of I/O and disk access in tempdb on a temporary table, you could use an inline view to get the same result:
SELECT p.ProductName,
SUM(od.Quantity) AS ProductQuantity
FROM (
SELECT TOP 5 o.OrderID, o.OrderDate
FROM Orders o
ORDER BY o.OrderDate DESC
) t
INNER JOIN [Order Details] od ON t.OrderID = od.OrderID
INNER JOIN Products p ON od.ProductID = p.ProductID
GROUP BY
p.ProductName
ORDER BY
p.ProductName
This query is not only more efficient than the previous one, it's shorter. Temporary tables consume a lot of resources. If you only need the data to join to other queries, you might want to try using an inline view to conserve resources.
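To check that the two forms really return the same rows, here is a minimal sketch. It assumes SQLite (through Python's `sqlite3` module) rather than SQL Server, so `LIMIT 5` replaces `TOP 5` and `CREATE TEMP TABLE` replaces the `#Temp1` syntax; the table `OrderDetails` and the sample rows are made up for illustration and are not Northwind data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of the Northwind tables used in the article.
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderDate TEXT NOT NULL);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT NOT NULL);
CREATE TABLE OrderDetails (OrderID INTEGER, ProductID INTEGER, Quantity INTEGER);
INSERT INTO Orders VALUES (1,'2002-01-05'),(2,'2002-02-10'),(3,'2002-03-15'),
                          (4,'2002-04-20'),(5,'2002-05-25'),(6,'2002-06-30');
INSERT INTO Products VALUES (1,'Chai'),(2,'Chalk');
INSERT INTO OrderDetails VALUES (1,1,10),(2,1,5),(3,2,7),(4,1,2),(5,2,1),(6,2,4);
""")

# Approach 1: stage the five most recent orders in a temporary table.
conn.executescript("""
CREATE TEMP TABLE Temp1 (OrderID INTEGER NOT NULL, OrderDate TEXT NOT NULL);
INSERT INTO Temp1 (OrderID, OrderDate)
SELECT OrderID, OrderDate FROM Orders ORDER BY OrderDate DESC LIMIT 5;
""")
temp_rows = conn.execute("""
SELECT p.ProductName, SUM(od.Quantity) AS ProductQuantity
FROM Temp1 t
INNER JOIN OrderDetails od ON t.OrderID = od.OrderID
INNER JOIN Products p ON od.ProductID = p.ProductID
GROUP BY p.ProductName
ORDER BY p.ProductName
""").fetchall()
conn.execute("DROP TABLE Temp1")

# Approach 2: the same subquery used as an inline view in the FROM clause.
inline_rows = conn.execute("""
SELECT p.ProductName, SUM(od.Quantity) AS ProductQuantity
FROM (
    SELECT OrderID, OrderDate FROM Orders ORDER BY OrderDate DESC LIMIT 5
) t
INNER JOIN OrderDetails od ON t.OrderID = od.OrderID
INNER JOIN Products p ON od.ProductID = p.ProductID
GROUP BY p.ProductName
ORDER BY p.ProductName
""").fetchall()

print(temp_rows == inline_rows, inline_rows)
```

The sketch only demonstrates that the results match; it says nothing about relative cost on SQL Server, which should be measured on your own data.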
Avoid LEFT JOINs and NULLs
There are, of course, times when you need to perform a LEFT JOIN and use NULL values. But they are not a solution for all occasions. Changing the way you structure your SQL queries can mean the difference between a report that takes minutes to run and one that takes only seconds. Sometimes you have to morph the data in a query to look the way your application wants it to look. While the TABLE datatype reduces resource gluttony, there are still plenty of areas in a query that can be optimized. One valuable, commonly used feature of SQL is the LEFT JOIN. It retrieves all of the rows from the first table and all matching rows from the second table; rows from the first table that have no match in the second are still returned, with NULLs in the second table's columns. For example, if you wanted to return every Customer and their orders, a LEFT JOIN would show the Customers who did and did not have orders.
This tool can be overused. LEFT JOINs can be costly because every row from the first table must be preserved and padded with NULLs wherever no matching data exists. In some cases this is unavoidable, but the cost can be high. A LEFT JOIN is more costly than an INNER JOIN, so if you can rewrite a query so it doesn't use a LEFT JOIN, it could pay huge dividends (see the diagram in Figure 1).
Figure 1 Query
One technique to speed up a query that uses a LEFT JOIN involves creating a TABLE datatype and inserting all of the rows from the first table (the one on the left-hand side of the LEFT JOIN), then updating the TABLE datatype with the values from the second table. This technique is a two-step process, but could save a lot of time compared to a standard LEFT JOIN. A good rule is to try out different techniques and time each of them until you get the best performing query for your application.
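The two-step process described above can be sketched as follows. The assumptions are significant: SQLite has no TABLE datatype, so a temporary table named `CustomerSummary` (a hypothetical name) stands in for the table variable, and the customers, orders, and the choice of "last order date" as the value being filled in are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: one customer (ANATR) has no orders.
conn.executescript("""
CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT NOT NULL);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT NOT NULL,
                     OrderDate TEXT NOT NULL);
INSERT INTO Customers VALUES ('ALFKI','Alfreds'),('ANATR','Ana Trujillo'),
                             ('BONAP','Bon app');
INSERT INTO Orders VALUES (1,'ALFKI','2002-01-05'),(2,'ALFKI','2002-02-10'),
                          (3,'BONAP','2002-03-15');
""")

# Baseline: a standard LEFT JOIN keeps customers that have no orders.
left_join_rows = conn.execute("""
SELECT c.CustomerID, MAX(o.OrderDate) AS LastOrderDate
FROM Customers c
LEFT JOIN Orders o ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerID
ORDER BY c.CustomerID
""").fetchall()

# Two-step alternative: (1) insert every row from the left-hand table,
# (2) update the staged rows with values from the second table.
conn.executescript("""
CREATE TEMP TABLE CustomerSummary (
    CustomerID TEXT PRIMARY KEY,
    LastOrderDate TEXT NULL
);
INSERT INTO CustomerSummary (CustomerID)
SELECT CustomerID FROM Customers;
UPDATE CustomerSummary
SET LastOrderDate = (
    SELECT MAX(o.OrderDate) FROM Orders o
    WHERE o.CustomerID = CustomerSummary.CustomerID
);
""")
two_step_rows = conn.execute(
    "SELECT CustomerID, LastOrderDate FROM CustomerSummary ORDER BY CustomerID"
).fetchall()

print(left_join_rows == two_step_rows)
```

Whether the two-step version is actually faster depends on the engine and the data, so time both forms before adopting either.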
When you are testing your query's speed, it's important to run it several times and take an average. The execution plan for your query (or stored procedure) may be held in the procedure cache in SQL Server's memory after the first run, so the first run can appear slower and all subsequent runs faster. In addition, other queries could be running against the same tables while your query runs. This could cause your query to stand in line while other queries lock and unlock tables. For example, if you are querying while someone is updating data in that table, your query may take longer to execute while the update commits.
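A small timing harness makes the "run it several times and take an average" advice concrete. This is a sketch under assumptions: `time_query` is a hypothetical helper, the `Numbers` table is made-up test data, and SQLite is used only so the example is self-contained; against SQL Server you would time the call made through your own data access layer.

```python
import sqlite3
import statistics
import time


def time_query(conn, sql, runs=5):
    """Return the wall-clock duration in seconds of each run of `sql`."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch all rows so that work is timed too
        durations.append(time.perf_counter() - start)
    return durations


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Numbers (n INTEGER)")
conn.executemany("INSERT INTO Numbers VALUES (?)", [(i,) for i in range(10000)])

durations = time_query(
    conn, "SELECT n % 10, COUNT(*) FROM Numbers GROUP BY n % 10"
)
average = statistics.mean(durations)
print(f"runs={len(durations)} first={durations[0]:.6f}s average={average:.6f}s")
```

Reporting the first run separately from the average helps show whether a warm-up effect, such as caching, is skewing the numbers.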
One of the easiest ways to avoid slowdowns with LEFT JOINs is to design the database so that you need them as rarely as possible. For example, let's assume that a product may or may not have a category. If the product table stores the ID of its category and there is no category for a particular product, you could store a NULL value in the field. Then you would have to perform a LEFT JOIN to get all of the products and their categories. Instead, you could create a category with the value of "No Category" and define the foreign key column to disallow NULL values. By doing this, you can use an INNER JOIN to retrieve all products and their categories. While this may seem like a workaround with extra data, it can be a valuable technique because it can eliminate costly LEFT JOINs in SQL batches. Using this concept across the board in a database can save you lots of processing time. Remember, even a few seconds means a lot to your users, and those seconds really add up when you have many users accessing an online database application.
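The "No Category" design can be illustrated with a short sketch. It assumes SQLite via Python's `sqlite3` module, and the two product tables (`ProductsNullable`, `ProductsRequired`), the helper `products_with_categories`, and the sample rows are hypothetical names chosen to show the two designs side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (
    CategoryID INTEGER PRIMARY KEY,
    CategoryName TEXT NOT NULL
);
-- Design A: CategoryID may be NULL when a product has no category.
CREATE TABLE ProductsNullable (
    ProductID INTEGER PRIMARY KEY,
    ProductName TEXT NOT NULL,
    CategoryID INTEGER NULL REFERENCES Categories(CategoryID)
);
-- Design B: CategoryID is required; uncategorized products point at row 0.
CREATE TABLE ProductsRequired (
    ProductID INTEGER PRIMARY KEY,
    ProductName TEXT NOT NULL,
    CategoryID INTEGER NOT NULL REFERENCES Categories(CategoryID)
);
INSERT INTO Categories VALUES (0,'No Category'),(1,'Beverages');
INSERT INTO ProductsNullable VALUES (1,'Chai',1),(2,'Chalk',NULL);
INSERT INTO ProductsRequired VALUES (1,'Chai',1),(2,'Chalk',0);
""")


def products_with_categories(table, join_type):
    """Return (ProductName, CategoryName) rows for one table and join type."""
    sql = (
        f"SELECT p.ProductName, c.CategoryName FROM {table} p "
        f"{join_type} JOIN Categories c ON p.CategoryID = c.CategoryID "
        "ORDER BY p.ProductName"
    )
    return conn.execute(sql).fetchall()


nullable_inner = products_with_categories("ProductsNullable", "INNER")
nullable_left = products_with_categories("ProductsNullable", "LEFT")
required_inner = products_with_categories("ProductsRequired", "INNER")

print(nullable_inner)  # the uncategorized product is missing
print(nullable_left)   # LEFT JOIN is needed to keep it
print(required_inner)  # INNER JOIN returns every product
```

With the nullable design, only the LEFT JOIN returns every product; with the placeholder category, a plain INNER JOIN does.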
Use Cartesian Products Wisely
For this tip, I will go against the grain and advocate the use of Cartesian products in certain situations. For some reason, Cartesian products (CROSS JOINs) have gotten a bad rap, and developers are often cautioned not to use them at all. In many cases, they are too costly to use effectively. But like any tool in SQL, they can be valuable if used properly. For example, if you want to run a query that will return data for every month, even on customers that had no orders that particular month, you could use a Cartesian product quite handily. The SQL in Figure 2 does just that.
While this may not seem like magic, consider that if you did a standard INNER JOIN from Customers to Orders, grouped by the month and summed the sales, you would only get the months where the customer had an order. Thus, you would not get back a 0 value for the months in which the customer didn't order any products. If you wanted to plot a graph per customer showing every month and its sales, you would want the graph to include 0 month sales to identify those months visually. If you use the SQL in Figure 2, the data skips over the months that had $0 in sales because there are no rows in the Orders table for nonsales (it is assumed that you do not store what did not occur).
The code in Figure 3 is longer, but can achieve the same goal of getting all the sales data, even for months without sales. First, it grabs a list of all of the months in the past year and puts them in the first TABLE datatype table (@tblMonths). Next, the code gets a list of all customers' company names who had sales during that time period and puts them in another TABLE datatype table (@tblCustomers). These two tables store all of the basic data required to create the resultset except the actual sales numbers.
All of the months are listed in the first table (12 rows) and all of the customers who had sales in that time frame are listed in the second table (81 for me). Not every customer purchased a product in each of the past 12 months, so performing an INNER or LEFT JOIN won't return every customer for every month. These operations will only return the customers and the months when they did purchase something.
A Cartesian product can return all customers for all months. A Cartesian product basically multiplies the first table by the second table and results in a rowset that contains the number of rows in the first table times the number of rows in the second table. Thus, the Cartesian product returns 972 rows (12 × 81) into the table @tblFinal. The last steps are to update the table @tblFinal with the monthly sales totals for each customer during the date range and to select the final rowset.
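The steps just described (build every customer and month combination, then fill in the totals) can be sketched as below. This is not the code from Figure 3, which is not reproduced here. It is an illustration that assumes SQLite via Python's `sqlite3` module, ordinary and temporary tables in place of TABLE variables, and invented tables (`Months`, `Customers`, `Sales`, `Final`) with three months and two customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Months (SalesMonth TEXT PRIMARY KEY);
CREATE TABLE Customers (CompanyName TEXT PRIMARY KEY);
CREATE TABLE Sales (CompanyName TEXT, SalesMonth TEXT, Amount REAL);
INSERT INTO Months VALUES ('2002-01'),('2002-02'),('2002-03');
INSERT INTO Customers VALUES ('Alfreds'),('Bon app');
INSERT INTO Sales VALUES ('Alfreds','2002-01',100.0),
                         ('Alfreds','2002-01',50.0),
                         ('Bon app','2002-03',75.0);

-- Step 1: the Cartesian product gives one row per customer per month,
-- each starting with a total of 0.
CREATE TEMP TABLE Final (
    CompanyName TEXT NOT NULL,
    SalesMonth TEXT NOT NULL,
    Total REAL NOT NULL DEFAULT 0
);
INSERT INTO Final (CompanyName, SalesMonth)
SELECT c.CompanyName, m.SalesMonth
FROM Customers c CROSS JOIN Months m;

-- Step 2: overwrite the 0 only where sales actually exist.
UPDATE Final
SET Total = (
    SELECT SUM(s.Amount) FROM Sales s
    WHERE s.CompanyName = Final.CompanyName
      AND s.SalesMonth = Final.SalesMonth
)
WHERE EXISTS (
    SELECT 1 FROM Sales s
    WHERE s.CompanyName = Final.CompanyName
      AND s.SalesMonth = Final.SalesMonth
);
""")

final_rows = conn.execute(
    "SELECT CompanyName, SalesMonth, Total FROM Final "
    "ORDER BY CompanyName, SalesMonth"
).fetchall()

# For comparison: an INNER JOIN only returns months that had sales.
inner_rows = conn.execute("""
SELECT c.CompanyName, s.SalesMonth, SUM(s.Amount)
FROM Customers c
INNER JOIN Sales s ON s.CompanyName = c.CompanyName
GROUP BY c.CompanyName, s.SalesMonth
ORDER BY c.CompanyName, s.SalesMonth
""").fetchall()

print(len(final_rows), len(inner_rows))
```

Here the Cartesian product yields 2 × 3 = 6 rows, including the zero-sales months a graph would need, while the INNER JOIN yields only the 2 customer-month pairs that had sales.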
Avoid CROSS JOINs when you do not need a true Cartesian product, because they can be very resource intensive. For example, if you do a CROSS JOIN on products and categories and then use a WHERE clause, DISTINCT, or GROUP BY to filter out most of the rows, you could have reached the same result much more efficiently with an INNER JOIN. Cartesian products can be very useful when you need data returned for all possibilities, as when you want to load a graph with monthly sales dates. But you should not use them for other purposes, as INNER JOINs are much more efficient in most scenarios.
Odds and Ends
Here are a few other common techniques that can help improve the efficiency of your SQL querying. Let's assume you are going to group all of your salespeople by region and sum their sales, but you only want salespeople who were marked active in your database. You could group the salespeople by region and use a HAVING clause to eliminate the salespersons who are not active, or you could do this in the WHERE clause. Doing this in the WHERE clause reduces the number of rows that need to be grouped, so it is more efficient than doing it in the HAVING clause. Filtering row-based criteria in the HAVING clause forces the query to group data that could have been eliminated in the WHERE clause.
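The WHERE versus HAVING comparison can be sketched with a tiny example. It assumes SQLite via Python's `sqlite3` module, and the `Salespeople` table, its `IsActive` flag column, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Salespeople (Name TEXT, Region TEXT, IsActive INTEGER, Sales REAL);
INSERT INTO Salespeople VALUES
    ('Ann','East',1,100), ('Bob','East',0,40),
    ('Cy','West',1,70),   ('Di','West',1,30),
    ('Ed','North',0,90);
""")

# Filter first: inactive rows are discarded before any grouping happens.
where_rows = conn.execute("""
SELECT Region, SUM(Sales)
FROM Salespeople
WHERE IsActive = 1
GROUP BY Region
ORDER BY Region
""").fetchall()

# Filter last: every row is grouped (by region and flag), then the
# inactive groups are thrown away.
having_rows = conn.execute("""
SELECT Region, SUM(Sales)
FROM Salespeople
GROUP BY Region, IsActive
HAVING IsActive = 1
ORDER BY Region
""").fetchall()

print(where_rows == having_rows, where_rows)
```

Both queries return the same totals, but the HAVING version builds groups for the inactive salespeople only to discard them.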
Another efficiency trick is to use the DISTINCT keyword, rather than the GROUP BY clause, to find a distinct list of data rows. In this case, the SQL using the DISTINCT keyword can be more efficient, and it states the intent more directly. Reserve use of the GROUP BY for occasions when you need to calculate an aggregate function (SUM, COUNT, MAX, and so on). Also, avoid using the DISTINCT keyword if your query will always return a unique row on its own. In that case, the DISTINCT keyword will only add overhead.
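As a brief sketch of the distinction, assuming SQLite via Python's `sqlite3` module and a made-up `Customers` table: DISTINCT and a bare GROUP BY return the same list, and GROUP BY earns its place once an aggregate is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CompanyName TEXT, Country TEXT);
INSERT INTO Customers VALUES ('A','UK'),('B','USA'),('C','UK'),('D','France');
""")

# A distinct list of values: DISTINCT says exactly what is wanted.
distinct_rows = conn.execute(
    "SELECT DISTINCT Country FROM Customers ORDER BY Country"
).fetchall()

# GROUP BY with no aggregate returns the same list.
group_rows = conn.execute(
    "SELECT Country FROM Customers GROUP BY Country ORDER BY Country"
).fetchall()

# GROUP BY is the right tool once an aggregate such as COUNT is involved.
count_rows = conn.execute(
    "SELECT Country, COUNT(*) FROM Customers GROUP BY Country ORDER BY Country"
).fetchall()

print(distinct_rows == group_rows, count_rows)
```

The sketch shows equivalence of results only; compare execution plans or timings on your own server to see whether the two forms differ in cost.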
You've seen that numerous techniques can be employed to optimize queries and implement specific business rules; the trick is to try a few and compare their performance. Most important is to test, test, and test again. In future installments of this column, I'll continue to explore SQL Server concepts including database design, good indexing practices, and SQL Server security paradigms.