数组(顺序存储)基本原理
When we talk about "arrays," there are different contexts because the array types and APIs provided by various programming languages are different. Therefore, let's first unify the terminology to facilitate the subsequent explanation.
I suggest we can temporarily classify "arrays" into two main categories: "static arrays" and "dynamic arrays."
A "static array" is a block of continuous memory space, and we can access the elements in this memory space through indexing. This is the original form of an array.
On the other hand, "dynamic arrays" are built on top of static arrays by programming languages to make them easier to use. They add commonly used APIs such as push
, insert
, remove
, etc. These APIs allow us to manipulate array elements more conveniently without writing the code to implement these operations ourselves.
In this chapter, we will use the most primitive static arrays to implement a dynamic array and create common APIs for adding, deleting, querying, and modifying elements. Once you understand this, you'll know the underlying principles of data structures provided by standard libraries.
With dynamic arrays, more complex data structures like queues, stacks, and hash tables, which will be discussed later, all rely on them for implementation.
Static Arrays
When creating a static array, you need to specify the element type and the number of elements in the array. Only languages like C++, Java, and Golang provide ways to create static arrays. Languages like Python and JavaScript do not offer definitions for static arrays.
The usage of static arrays is quite primitive and rarely used in actual software development or algorithm problems. We typically use dynamic arrays directly. However, to understand the principles, it is necessary to explain them here.
The method to define a static array is as follows:
// define a static array of size 10
int arr[10];
// initialize the array values to 0 using memset function
memset(arr, 0, sizeof(arr));
// assign values using index
arr[0] = 1;
arr[1] = 2;
// retrieve values using index
int a = arr[0];
// define a static array of size 10
int[] arr = new int[10];
// assign values using indices
arr[0] = 1;
arr[1] = 2;
// retrieve values using indices
int a = arr[0];
// define a static array of size 10
var arr [10]int
// assign values using index
arr[0] = 1
arr[1] = 2
// retrieve values using index
a := arr[0]
That's it, nothing more.
Let's take C++ as an example. What exactly does the code int arr[10]
do? Here are the main things:
It allocates a contiguous block of memory of size
10 * sizeof(int)
bytes. Anint
takes up 4 bytes in computer memory, so the total is 40 bytes.It defines an array pointer named
arr
that points to the starting address of this memory block.
So, what does the code arr[1] = 2
do? Here are the main steps:
It calculates the offset by adding
1 * sizeof(int)
bytes (4 bytes) to the starting address ofarr
, finding the address of the second element in the memory block.It writes the integer
2
into the 4-byte memory space starting at this address.
For Beginners
I remember when I first started college and had to learn the basics of C language, some fellow students were confused about pointer arrays and array pointers. If you understand the simple process explained below, everything becomes clear.
Why do array indices start from 0? It's to simplify address calculation.
arr[0]
is the address of the first element in the array. The value of the first element is stored in the 4 bytes starting from this address.arr[1]
is the address of the second element, which is the starting address plus1 * 4
bytes. This address stores the value of the second element, and so on forarr[2], arr[3]
, etc.The array name
arr
points to the starting address of the memory block, so the array name itself is a pointer. Accessing the value at this address gives you the value of the first element. In other words,*arr
is the same asarr[0]
, which is the value of the first element.If you do not use functions like
memset
to initialize the array values, the values in the array are uncertain. The statementint arr[10]
simply asks the operating system to allocate a continuous block of memory, and you don't know if this memory was previously used or what strange values it might contain. Hence, we usually use thememset
function to initialize the memory before using it.
The above points are specific to C/C++ because these are the languages most people encounter when learning computer basics. In other languages like Java or Golang, static arrays are automatically initialized to 0, so explicit initialization is not required.
Summary
To summarize the causal logic above, a static array is essentially a block of continuous memory. From the statement int arr[10]
, we can deduce:
- We know the starting address of this memory block (the array name
arr
points to this address). - We know the type of each element (e.g., int), which means we know the size of memory each element occupies (e.g., an int occupies 4 bytes, 32 bits).
- This memory block is continuous and its size is
10 * sizeof(int)
, which is 40 bytes.
Therefore, we obtain the array's superpower of "random access": given any array index, we can directly retrieve the corresponding element's value in O(1)
time.
This is because we can compute the target element's memory address using the starting address and the index. The time complexity of memory addressing in a computer can be considered O(1)
, so the time complexity for random access in an array is O(1)
.
However, the greatest strength of an array is often also its biggest weakness. The continuous memory characteristic that gives an array its random access superpower also brings some challenges, which will be discussed next.
CRUD Operations
The job of a data structure is to perform CRUD operations—Create, Read, Update, and Delete. Nothing more.
Earlier, we discussed the underlying principles of arrays, focusing mainly on the "Read" and "Update" operations, i.e., accessing and modifying elements via their indices. So, how do we handle the "Create" and "Delete" operations?
Create
Adding elements to a static array can be a bit complex and requires different approaches based on the situation.
Scenario 1: Appending Elements to the End of an Array
For example, suppose I have an array of size 10, containing 4 elements, and I want to add another element at the end. How do I do this?
It's quite simple: you just assign the new value to the corresponding index. Here's a rough idea of the code logic:
// An array of size 10 already contains 4 elements
int[] arr = new int[10];
for (int i = 0; i < 4; i++) {
arr[i] = i;
}
// Now want to append an element 4 at the end of the array
arr[4] = 4;
// Then append an element 5 at the end of the array
arr[5] = 5;
// And so on
// ...
// An array of size 10 already has 4 elements
int arr[10];
for (int i = 0; i < 4; i++) {
arr[i] = i;
}
// Now I want to add an element 4 at the end of the array
arr[4] = 4;
// Then add an element 5 at the end of the array
arr[5] = 5;
// And so on
// ...
# An array of size 10 already has 4 elements
arr = [0] * 10
for i in range(4):
arr[i] = i
# Now we want to append an element 4 to the end of the array
arr[4] = 4
# Then append an element 5 to the end of the array
arr[5] = 5
# And so on
# ...
// The array of size 10 already has 4 elements
var arr [10]int
for i := 0; i < 4; i++ {
arr[i] = i
}
// Now we want to append an element 4 at the end of the array
arr[4] = 4
// Then append an element 5 at the end of the array
arr[5] = 5
// And so on
// ...
// An array of size 10 already contains 4 elements
var arr = new Array(10);
for (var i = 0; i < 4; i++) {
arr[i] = i;
}
// Now want to add an element 4 to the end of the array
arr[4] = 4;
// Then add an element 5 to the end of the array
arr[5] = 5;
// And so on
// ...
As you can see, since it's just assigning a value to an index, the time complexity of appending an element to the end of the array is O(1)
.
Case 2: Inserting an Element in the Middle of the Array
For example, I have an array arr
of size 10, with the first 4 indices filled with elements. Now, I want to insert a new element at the 3rd position (arr[2]
). How can we achieve that?
This involves "data shifting" to make room for the new element before it can be inserted. The general code logic is as follows:
// An array of size 10 already has 4 elements
int[] arr = new int[10];
for (int i = 0; i < 4; i++) {
arr[i] = i;
}
// Insert element 666 at the 3rd position
// Need to move the elements from the 3rd position onwards one position back
// Note to traverse the array in reverse to avoid overwriting existing elements, refer to the visual panel below if unclear
for (int i = 4; i > 2; i--) {
arr[i] = arr[i - 1];
}
// Now the 3rd position is open, and we can insert the new element
arr[2] = 666;
// An array of size 10 already has 4 elements
int arr[10];
for (int i = 0; i < 4; i++) {
arr[i] = i;
}
// Insert element 666 at the 3rd position
// Need to move the elements from the 3rd position onwards one step back
// Note that we should traverse the existing elements in the array backwards to avoid overwriting, refer to the visualization panel below if unclear
for (int i = 4; i > 2; i--) {
arr[i] = arr[i - 1];
}
// Now the 3rd position is vacant, we can insert the new element
arr[2] = 666;
# An array of size 10 has already been filled with 4 elements
arr = [0] * 10
for i in range(4):
arr[i] = i
# Insert element 666 at the 3rd position
# Need to move the elements from the 3rd position and onwards one position back
# Note that we should traverse the array in reverse order to avoid overwriting, see the visualization panel below if you don't understand
for i in range(4, 2, -1):
arr[i] = arr[i - 1]
# Now the 3rd position is empty, we can insert the new element
arr[2] = 666
import "fmt"
func main() {
// The array of size 10 already contains 4 elements
var arr [10]int
for i := 0; i < 4; i++ {
arr[i] = i
}
// Insert element 666 at the 3rd position
// Need to move the elements at the 3rd position and after one position back
// Be sure to traverse the existing elements of the array backwards to avoid overwriting, if you don't understand, see the visual panel below
for i := 4; i > 2; i-- {
arr[i] = arr[i-1]
}
// Now the 3rd position is empty, and the new element can be inserted
arr[2] = 666
// Print the array result
fmt.Println(arr)
}
// An array of size 10 already has 4 elements
var arr = new Array(10);
for (var i = 0; i < 4; i++) {
arr[i] = i;
}
// Insert element 666 at the 3rd position
// Need to move the elements from the 3rd position and onwards one position back
// Note that you should iterate the array backwards to avoid overwriting existing elements, refer to the visual panel below if unclear
for (var i = 4; i > 2; i--) {
arr[i] = arr[i - 1];
}
// Now the 3rd position is free, you can insert the new element
arr[2] = 666;
In summary, the time complexity of inserting an element in the middle of an array is O(N)
because it involves shifting data to make space for the new element.
Case 3: Array is Full
When a static array is created, its size must be determined. For example, if I create an array int arr[10]
(a contiguous memory space of 40 bytes) and store 10 elements in it, what should I do if I want to insert another element? Whether you want to append it to the end or insert it in the middle, there is no space left for the new element.
Some readers might say, "This is easy, just add another 4 bytes of contiguous memory space after the 40 bytes to store the new element."
That's not possible. Contiguous memory must be allocated all at once and cannot be arbitrarily increased or decreased afterward. The memory space following your contiguous block might already be occupied by other programs, and you can't just take it because you want to.
So what should you do? You have to allocate a larger block of memory, copy the original data over, and then insert the new element. This is known as the array "expansion" operation.
For example, I can create a larger array int arr[20]
and copy the original 10 elements over. This way, there will be extra space to insert the new element.
The general logic is as follows:
// the array of size 10 is already full
int[] arr = new int[10];
for (int i = 0; i < 10; i++) {
arr[i] = i;
}
// now want to append an element 10 at the end of the array
// need to expand the array first
int[] newArr = new int[20];
// copy the original 10 elements
for (int i = 0; i < 10; i++) {
newArr[i] = arr[i];
}
// the old array's memory space will be handled by the garbage collector
// ...
// append the new element in the new larger array
newArr[10] = 10;
// The array of size 10 is already filled
int arr[10];
for (int i = 0; i < 10; i++) {
arr[i] = i;
}
// Now want to append an element 10 to the end of the array
// Need to expand the array first
int newArr[20];
// Copy the original 10 elements over
for (int i = 0; i < 10; i++) {
newArr[i] = arr[i];
}
// Release the memory space of the old array
// ...
// Append the new element in the new larger array
newArr[10] = 10;
# The array of size 10 is already full
arr = [i for i in range(10)]
# Now want to append an element 10 to the end of the array
# Need to expand the array first
newArr = [0] * 20
# Copy the original 10 elements over
for i in range(10):
newArr[i] = arr[i]
# Free the memory space of the old array
# ...
# Append the new element in the new large array
newArr[10] = 10
// The array of size 10 is already full
arr := make([]int, 10)
for i := 0; i < 10; i++ {
arr[i] = i
}
// Now want to append an element 10 to the end of the array
// Need to expand the array first
newArr := make([]int, 20)
// Copy the original 10 elements over
for i := 0; i < 10; i++ {
newArr[i] = arr[i]
}
// Release the memory space of the old array
// ...
// Append the new element to the new larger array
newArr[10] = 10
// The array of size 10 is already filled
var arr = new Array(10);
for (var i = 0; i < 10; i++) {
arr[i] = i;
}
// Now we want to append an element 10 at the end of the array
// We need to expand the array first
var newArr = new Array(20);
// Copy the original 10 elements over
for (var i = 0; i < 10; i++) {
newArr[i] = arr[i];
}
// Release the memory space of the old array
// ...
// Append the new element in the new large array
newArr[10] = 10;
In summary, the expansion operation of an array involves creating a new array and copying the data, with a time complexity of O(N)
.
Delete
The operation of deleting elements is similar to adding elements and needs to be discussed in different scenarios.
Scenario One: Deleting the Last Element
For example, suppose I have an array of size 10, which contains 5 elements, and I want to delete the last element. What should I do?
It's simple. Just mark the last element with a special value to indicate it is deleted. Here, for simplicity, we'll use -1 as the special value to represent deletion. When we later implement dynamic arrays, there will be more sophisticated methods for deleting array elements. For now, this example is to illustrate that deleting the last element of an array essentially involves a single random access, with a time complexity of O(1)
.
The basic code logic is as follows:
// An array of size 10 already contains 5 elements
int[] arr = new int[10];
for (int i = 0; i < 5; i++) {
arr[i] = i;
}
// Delete the last element, temporarily use -1 to represent the deleted element
arr[4] = -1;
// An array of size 10 already has 5 elements
int arr[10];
for (int i = 0; i < 5; i++) {
arr[i] = i;
}
// Remove the last element, temporarily use -1 to represent the element has been deleted
arr[4] = -1;
# An array of size 10 already contains 5 elements
arr = [0] * 10
for i in range(5):
arr[i] = i
# Remove the last element, temporarily use -1 to represent the deleted element
arr[4] = -1
// The array of size 10 already contains 5 elements
var arr [10]int
for i := 0; i < 5; i++ {
arr[i] = i
}
// Remove the last element, temporarily using -1 to represent the deleted element
arr[4] = -1
// An array of size 10 already contains 5 elements
var arr = new Array(10);
for (var i = 0; i < 5; i++) {
arr[i] = i;
}
// Remove the last element, temporarily use -1 to represent the deleted element
arr[4] = -1;
Case Two: Deleting a Middle Element
For example, imagine I have an array of size 10 containing 5 elements, and I want to delete the 2nd element (arr[1]
). How should I do it?
This involves "data shifting," where you move all elements after the deleted element one position forward to maintain the continuity of the array.
The basic code logic is as follows:
// An array of size 10 already contains 5 elements
int[] arr = new int[10];
for (int i = 0; i < 5; i++) {
arr[i] = i;
}
// delete arr[1]
// need to move all elements after arr[1] one position forward
// note that you should traverse the array forward to avoid overwriting, refer to the visualization panel below if you don't understand
for (int i = 1; i < 4; i++) {
arr[i] = arr[i + 1];
}
// set the last element to -1 to indicate deletion
arr[4] = -1;
// An array of size 10 already contains 5 elements
int arr[10];
for (int i = 0; i < 5; i++) {
arr[i] = i;
}
// Delete arr[1]
// Elements after arr[1] need to be shifted forward by one position
// Make sure to traverse the existing elements of the array in order to avoid overwriting. If unclear, please refer to the visualization panel below.
for (int i = 1; i < 4; i++) {
arr[i] = arr[i + 1];
}
// Set the last element to -1 to indicate deletion
arr[4] = -1;
# An array of size 10 has already been filled with 5 elements
arr = [0] * 10
for i in range(5):
arr[i] = i
# Delete arr[1]
# Need to move all elements after arr[1] one position forward
# Note that we should traverse the existing elements in the array forward to avoid overwriting, see the visualization panel below if you don't understand
for i in range(1, 4):
arr[i] = arr[i + 1]
# Set the last element to -1 to indicate deletion
arr[4] = -1
// an array of size 10 already contains 5 elements
var arr [10]int
for i := 0; i < 5; i++ {
arr[i] = i
}
// delete arr[1]
// need to move all elements after arr[1] one position forward
// be sure to traverse the array of existing elements forward to avoid overwriting, if you don't understand, please see the visualization panel below
for i := 1; i < 4; i++ {
arr[i] = arr[i + 1]
}
// set the last element to -1 to indicate deletion
arr[4] = -1
// An array of size 10 has 5 elements already
var arr = new Array(10);
for (var i = 0; i < 5; i++) {
arr[i] = i;
}
// delete arr[1]
// need to move the elements after arr[1] one position forward
// note that we should traverse the existing elements of the array from left to right to avoid overwriting. If unclear, see the visual panel below
for (var i = 1; i < 4; i++) {
arr[i] = arr[i + 1];
}
// set the last element to -1 to indicate deletion
arr[4] = -1;
In summary, the time complexity of deleting an element in the middle of an array is O(N)
because it involves shifting data.
Summary
To summarize, the time complexities for various operations on a static array are:
- Add:
- Appending an element at the end:
O(1)
. - Inserting an element in the middle (not at the end):
O(N)
.
- Appending an element at the end:
- Delete:
- Deleting the last element:
O(1)
. - Deleting an element in the middle (not at the end):
O(N)
.
- Deleting the last element:
- Access: Given a specific index, retrieving the value of the element at that index has a time complexity of
O(1)
. - Update: Given a specific index, modifying the value of the element at that index has a time complexity of
O(1)
.
Some readers might ask, didn't we discuss the array resizing operation earlier? Resizing involves allocating space for a new array and copying data, which has a time complexity of O(N)
. Why isn't this complexity included in the "Add" operation?
This is a good question. However, not every addition of an element triggers a resize. Therefore, the complexity of resizing should be analyzed using the "amortized time complexity" concept. I have explained this in detail in Methods for Analyzing Time and Space Complexity, so I won't elaborate here.
Another point for beginners to note is that we say the complexity for accessing and updating an array is O(1)
only when a specific index is given. If, on the contrary, you are given a value and asked to find its index in the array, you would need to search the entire array. This has a complexity of O(N)
.
So, it's important to understand the principles rather than just memorizing concepts. Once you understand the principles, you can derive the concepts yourself.
Dynamic Arrays
Previously, we discussed the strengths and limitations of static arrays. Now, let's talk about dynamic arrays. A dynamic array is an enhanced version of a static array and is one of the most commonly used data structures in software development and algorithm problems.
First of all, don't think that dynamic arrays can solve the efficiency issues of inserting or deleting elements in the middle of a static array. That's impossible. The superpower of arrays for random access comes from their continuous memory space, and continuous memory space inevitably faces issues of data relocation and resizing.
The underlying structure of a dynamic array is still a static array. It automatically handles resizing for us and encapsulates the operations of adding, deleting, searching, and modifying elements, making it more convenient to use.
Here are some examples of how to use dynamic arrays in various programming languages:
// create a dynamic array
// no need to explicitly specify array size, it will automatically resize based on the number of elements stored
ArrayList<Integer> arr = new ArrayList<>();
for (int i = 0; i < 10; i++) {
// append elements at the end, time complexity O(1)
arr.add(i);
}
// insert elements in the middle, time complexity O(N)
// insert element 666 at index 2
arr.add(2, 666);
// insert elements at the beginning, time complexity O(N)
arr.add(0, -1);
// remove the last element, time complexity O(1)
arr.remove(arr.size() - 1);
// remove elements in the middle, time complexity O(N)
// remove element at index 2
arr.remove(2);
// query element by index, time complexity O(1)
int a = arr.get(0);
// modify element by index, time complexity O(1)
arr.set(0, 100);
// find index by element value, time complexity O(N)
int index = arr.indexOf(666);
// create a dynamic array
// no need to specify the size explicitly, it will automatically resize based on the number of stored elements
vector<int> arr;
for (int i = 0; i < 10; i++) {
// append elements at the end, time complexity O(1)
arr.push_back(i);
}
// insert elements in the middle, time complexity O(N)
// insert element 666 at index 2
arr.insert(arr.begin() + 2, 666);
// insert elements at the beginning, time complexity O(N)
arr.insert(arr.begin(), -1);
// remove elements from the end, time complexity O(1)
arr.pop_back();
// remove elements from the middle, time complexity O(N)
// remove the element at index 2
arr.erase(arr.begin() + 2);
// query elements by index, time complexity O(1)
int a = arr[0];
// modify elements by index, time complexity O(1)
arr[0] = 100;
// find index by element value, time complexity O(N)
int index = find(arr.begin(), arr.end(), 666) - arr.begin();
# create a dynamic array
# no need to explicitly specify the array size, it will automatically expand and shrink based on the actual number of stored elements
arr = []
for i in range(10):
# append elements to the end, time complexity O(1)
arr.append(i)
# insert elements in the middle, time complexity O(N)
# insert element 666 at index 2
arr.insert(2, 666)
# insert elements at the beginning, time complexity O(N)
arr.insert(0, -1)
# delete the last element, time complexity O(1)
arr.pop()
# delete elements in the middle, time complexity O(N)
# delete the element at index 2
arr.pop(2)
# query elements by index, time complexity O(1)
a = arr[0]
# modify elements by index, time complexity O(1)
arr[0] = 100
# find index by element value, time complexity O(N)
index = arr.index(666)
import (
"fmt"
)
func main() {
// create a dynamic array
// no need to explicitly specify array size, it will auto expand/contract based on the actual number of elements
arr := make([]int, 0)
for i := 0; i < 10; i++ {
// append elements to the end, time complexity O(1)
arr = append(arr, i)
}
// insert elements in the middle, time complexity O(N)
// insert element 666 at index 2
arr = append(arr[:2], append([]int{666}, arr[2:]...)...)
// insert elements at the beginning, time complexity O(N)
arr = append([]int{-1}, arr...)
// delete the last element, time complexity O(1)
arr = arr[:len(arr)-1]
// delete elements in the middle, time complexity O(N)
// delete the element at index 2
arr = append(arr[:2], arr[3:]...)
// query element by index, time complexity O(1)
a := arr[0]
// modify element by index, time complexity O(1)
arr[0] = 100
// find index by element value, time complexity O(N)
index := -1
for i, v := range arr {
if v == 666 {
index = i
break
}
}
}
// create a dynamic array
// No need to explicitly specify the array size, it will automatically resize based on the number of elements stored
var arr = [];
for (var i = 0; i < 10; i++) {
// Append elements to the end, time complexity O(1)
arr.push(i);
}
// Insert elements in the middle, time complexity O(N)
// Insert element 666 at index 2
arr.splice(2, 0, 666);
// Insert elements at the beginning, time complexity O(N)
arr.unshift(-1);
// Remove the last element, time complexity O(1)
arr.pop();
// Remove elements in the middle, time complexity O(N)
// Remove the element at index 2
arr.splice(2, 1);
// Query elements by index, time complexity O(1)
var a = arr[0];
// Modify elements by index, time complexity O(1)
arr[0] = 100;
// Find index by element value, time complexity O(N)
var index = arr.indexOf(666);
In the following sections, I will guide you step-by-step to implement a dynamic array, helping you gain a deeper understanding of the principles behind dynamic arrays.