A Practical Guide to Time and Space Complexity Analysis of Algorithms
In my previous articles, I mainly focused on explaining the principles of algorithms and the thought process for solving problems, often glossing over the analysis of time and space complexity. This was primarily due to the following two reasons:
For readers who are relatively new to the field, I wanted you to concentrate on understanding the principles of algorithms. Adding too much mathematical content can easily discourage people.
Correctly understanding the underlying principles of common algorithms is a prerequisite for analyzing their complexity. Especially for recursive algorithms, you need to think and analyze from the perspective of trees to correctly assess their complexity.
Given that my previous articles have already covered the core principles of all common algorithms, I have decided to write a dedicated guide on analyzing time and space complexity. Teaching you how to fish is better than giving you a fish, so I will provide you with a set of universal methods to analyze the time and space complexity of any algorithm.
This article will be quite lengthy and will cover the following points:
Using time complexity to infer problem-solving approaches, reducing trial and error time.
Where does the time go? Common coding mistakes that can cause algorithms to time out.
Basic characteristics of Big O notation.
Time complexity analysis in non-recursive algorithms.
Methods for measuring the efficiency of data structure APIs (amortized analysis).
Analysis methods for the time/space complexity of recursive algorithms, which is the focus. I will illustrate this with examples from dynamic programming and backtracking algorithms.
Before diving into the concepts and specific calculation methods of complexity, I will first share some common techniques and pitfalls often encountered in practical scenarios.
Using Complexity to Reverse-Engineer Problem-Solving Strategies
Pay Attention to Data Size
Don't think complexity analysis is there to make things hard for you; it's actually there to help you. It subtly guides you towards the right problem-solving strategy.
You should pay attention to the data size provided in the problem before you start writing code. Complexity analysis can prevent you from wasting time on the wrong approach. Sometimes, it can even directly tell you which algorithm to use.
Why is this the case? Because most problems will tell you the size of the test cases. Based on this, we can infer the allowable time complexity range for the problem, which in turn helps us determine the appropriate algorithm.
For example, if a problem gives you an array with a length up to 10^6, you can be sure that the time complexity needs to be below O(N^2); you have to optimize it to O(NlogN) or O(N). An O(N^2) algorithm would perform on the order of (10^6)^2 = 10^12 basic operations, and since most judging systems only allow roughly 10^7 to 10^8 operations per second, it would be far too slow.
To keep the complexity within O(NlogN) or O(N), your options narrow down considerably. Possible approaches include sorting the array, prefix sums, the two-pointer technique, or one-dimensional dynamic programming (dp), while approaches like nested for loops, two-dimensional dp, or backtracking can generally be ruled out.
Here's a more direct example: if you notice that the data size given is very small, like an array length N not exceeding 20, you can deduce that the problem likely requires a brute-force enumeration algorithm.
This is because the judging platform will usually try to make things difficult by increasing the data size. If they give you such a small data size, it's probably because the optimal solution has an exponential/factorial complexity. You can confidently use a backtracking algorithm without considering other algorithms.
So, skipping the data size and diving straight into coding is not the right approach. First digest all the information the problem provides, then start writing code; this will save you from unnecessary detours. A rough mapping from data size to acceptable complexity follows below.
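As a rule of thumb (assuming a typical judge budget on the order of 10^7 to 10^8 basic operations, which varies by platform and language), the given data size roughly maps to the acceptable complexity as follows:
N up to about 20: exponential or factorial algorithms such as brute-force backtracking are acceptable.
N up to about 10^3: O(N^2) algorithms such as nested loops or two-dimensional dp are acceptable.
N up to about 10^5 or 10^6: aim for O(NlogN) or O(N), e.g., sorting, prefix sums, two pointers, one-dimensional dp.
N beyond about 10^7: usually only O(N), O(logN), or O(1) approaches will pass.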
Complexity Anomalies Due to Coding Mistakes
This section summarizes common coding mistakes, especially for beginners, that can lead to unexpected time consumption, slowing down your algorithm's performance, or even causing timeouts.
Using Standard Output
When writing algorithmic problems, you might use print/console.log functions to print states for debugging.
However, before submitting your code, remember to comment out these output statements. Standard output is an I/O operation and can significantly slow down your algorithm's execution.
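If you'd rather not delete the print statements outright, one common trick is to gate them behind a flag, as in the sketch below (the DEBUG constant and the debug helper are illustrative names, not any standard API):
class Solution {
    // Keep DEBUG false when submitting so no I/O happens on the hot path;
    // flip it to true locally when you need to trace states.
    private static final boolean DEBUG = false;

    private static void debug(String msg) {
        if (DEBUG) {
            System.out.println(msg);
        }
    }
}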
Incorrectly Passing by Value Instead of by Reference
For example, in C++, function parameters are passed by value by default. If you pass a vector parameter, the entire container is copied, incurring extra time and space overhead. This is especially problematic in recursive functions, where passing by value at every call almost inevitably leads to timeouts or memory overflows.
The correct approach is to pass by reference, such as vector<int>&, which avoids unnecessary copying. Readers using other languages should check for similar issues and understand the parameter-passing mechanism of their language.
Unclear Underlying Implementation of Interface Objects
This is a more nuanced issue, typically encountered in object-oriented languages like Java. It comes up less often, but it still warrants attention.
In Java, List is an interface with many implementations, such as ArrayList and LinkedList. If you've studied the basics in Hands-On Implementation of Dynamic Arrays and Hands-On Implementation of Doubly Linked Lists, you know that the same method can have different complexities across implementations. For instance, ArrayList's get method is O(1), while LinkedList's get method is O(N).
So, if a function parameter hands you a List type, would you dare to call list.get(i) directly for random access? Probably not.
In such cases, it's generally safer to create an ArrayList, copy all elements from the input List into it, and then access them by index.
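Here is a minimal sketch of this defensive copy (sum is a hypothetical function for illustration): the copy costs O(N) once, after which every index access is O(1) no matter which List implementation the caller passed in.
import java.util.ArrayList;
import java.util.List;

class ListAccess {
    static int sum(List<Integer> list) {
        // Copy once in O(N) so the random access below is O(1),
        // even if the caller handed us a LinkedList.
        List<Integer> arr = new ArrayList<>(list);
        int res = 0;
        for (int i = 0; i < arr.size(); i++) {
            res += arr.get(i);
        }
        return res;
    }
}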
These are the main issues, all non-algorithmic in nature. Paying attention to them should prevent most problems. Now, let's dive into the formal analysis of algorithm complexity.
Big O Notation
Mathematical Definition of Big O Notation:
O(g(n)) = { f(n) : there exist positive constants c and n_0 such that for all n ≥ n_0, 0 ≤ f(n) ≤ c*g(n) }
The symbol O (the uppercase letter O, not the number 0) that we commonly use actually represents a set of functions. For example, O(n^2) represents the set of functions derived from g(n) = n^2 according to the definition above. When we say an algorithm has a time complexity of O(n^2), we mean that the function describing the algorithm's running time belongs to this set.
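As a concrete instance of the definition: take f(n) = 2n + 100 and g(n) = n. Choosing c = 3 and n_0 = 100 gives 0 ≤ 2n + 100 ≤ 3n for all n ≥ 100, so f(n) belongs to O(n).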
Theoretically, understanding this abstract mathematical definition can answer all your questions about Big O notation.
However, considering that most people get dizzy when they see mathematical definitions, I'll list two properties used in complexity analysis. Remembering these two is enough.
1. Only keep the term with the fastest growth rate; other terms can be omitted.
First, constant factors in multiplication and addition can be ignored. For example:
O(2N + 100) = O(N)
O(2^(N+1)) = O(2 * 2^N) = O(2^N)
O(M + 3N + 99) = O(M + N)
Of course, don't eliminate constants blindly. Some constants cannot be removed, as they might involve exponential rules we learned in high school:
O(2^(2N)) = O(4^N)
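This follows from the exponent rule 2^(2N) = (2^2)^N = 4^N; dropping the constant 2 inside the exponent would incorrectly yield O(2^N), a strictly smaller function class.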
Besides constant factors, terms with slower growth rates can be ignored in the presence of terms with faster growth rates:
O(N^3 + 999 * N^2 + 999 * N) = O(N^3)
O((N + 1) * 2^N) = O(N * 2^N + 2^N) = O(N * 2^N)
All the examples listed above are the simplest and most common ones, and each can be justified directly from the definition of Big O notation. If you run into more complicated complexity expressions, you can likewise fall back on the definition to check whether your simplification is correct.
2. Big O Notation Represents the "Upper Bound" of Complexity.
In other words, as long as you provide an upper bound, using Big O notation to represent it is correct.
For example, consider the following code:
for (int i = 0; i < N; i++) {
    System.out.println("hello world");
}
If we consider this as an algorithm, its time complexity is clearly O(N). However, if you insist that its time complexity is O(N^2), that is theoretically acceptable, because the O notation represents an upper bound, and the running time of this algorithm indeed does not exceed the upper bound of N^2. Although this upper bound is not tight, it still fits the definition, so theoretically it is not incorrect.
The above example is too simple, and insisting on expanding its time complexity upper bound seems meaningless. However, for some algorithms, the complexity depends on the input data, making it impossible to provide a particularly precise time complexity in advance. In such cases, using Big O notation to expand the upper bound of time complexity becomes meaningful.
For example, in the previous article Dynamic Programming Core Framework, the brute-force recursive solution to the coin change problem is discussed, with the core code framework as follows:
// Definition: to make up the amount "amount", at least dp(coins, amount) coins are needed
int dp(int[] coins, int amount) {
    // base case
    if (amount == 0) return 0;
    if (amount < 0) return -1;
    int res = Integer.MAX_VALUE;
    // state transition: try each coin denomination and take the best subproblem
    for (int coin : coins) {
        int subProblem = dp(coins, amount - coin);
        if (subProblem == -1) continue;
        res = Math.min(res, subProblem + 1);
    }
    return res == Integer.MAX_VALUE ? -1 : res;
}
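For example, dp(new int[]{1, 2, 5}, 11) returns 3, since 11 = 5 + 5 + 1.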
When amount = 11, coins = [1,2,5], the algorithm unfolds into a recursive tree: the root corresponds to amount = 11, and each node branches into one child per coin denomination.
Later in the article, we will discuss the method to calculate the time complexity of recursive algorithms. For now, let's first determine the number of nodes in this recursive tree.
Assume the value of amount is N and the number of elements in the coins list is K. Then this recursive tree is a K-ary tree. However, how the tree grows depends directly on the denominations in the coins list, so its shape is irregular and the total number of nodes is difficult to calculate precisely.
For such cases, a simpler approach is to handle it based on the worst-case scenario:
How tall is this tree? Uncertain, so we take the worst case: assume all coins have denomination 1, which makes the tree height N.
What is the structure of this tree? Also uncertain, so we take the worst case: assume it is a full K-ary tree.
So, how many nodes are in this tree? Taking the worst case throughout, a full K-ary tree of height N has, by the geometric series sum formula, (K^N - 1)/(K - 1) nodes in total, which in Big O notation is O(K^N).
Of course, we know the actual number of nodes in the tree is not that large, but using O(K^N) as a rough upper bound is acceptable. It also makes clear that this is an exponential-time algorithm, which signals a need for optimization.
Therefore, sometimes if your estimated time complexity differs from someone else's, it doesn't necessarily mean one of you is wrong. Both could be correct, just with different levels of estimation precision.
Ideally, we want a relatively accurate and "tight" upper bound, but obtaining an accurate bound requires a certain level of mathematical skill. For interview preparation, it's acceptable to aim for a less precise but correct order of magnitude (linear/exponential/logarithmic/quadratic, etc.).
In the field of algorithms, besides using Big O to denote the asymptotic upper bound, there are also notations for asymptotic lower bounds and tight bounds. Interested readers can search for more information. However, from a practical perspective, the explanation of Big O notation provided above is sufficient.
Analysis of Non-Recursive Algorithms
The space complexity of non-recursive algorithms is generally easy to determine: you just need to check whether the algorithm allocates additional storage such as arrays. Therefore, I will mainly discuss how to analyze time complexity.
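Before moving on, a quick illustration of that space check (two hypothetical functions): an in-place reversal uses only a few scalar variables, so its extra space is O(1), while building a reversed copy allocates an auxiliary array proportional to the input, so its extra space is O(N).
class SpaceDemo {
    // O(1) extra space: only a constant number of scalar variables.
    static void reverseInPlace(int[] nums) {
        int i = 0, j = nums.length - 1;
        while (i < j) {
            int tmp = nums[i];
            nums[i++] = nums[j];
            nums[j--] = tmp;
        }
    }

    // O(N) extra space: allocates an auxiliary array of size N.
    static int[] reversedCopy(int[] nums) {
        int[] res = new int[nums.length];
        for (int i = 0; i < nums.length; i++) {
            res[i] = nums[nums.length - 1 - i];
        }
        return res;
    }
}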