7 Useful Data Structures Every Java Programmer Should Know
Data Structures: Powerfully Simple, Simply Powerful
If I had to pick the single most important topic that I’ve learned whilst digging down this computer science rabbit hole, it would be data structures and algorithms (DSA). They’re super-duper important once you get lower level in terms of abstraction and are quite important if you do things at scale.
Everything is DSA. Why are you storing your data in a list instead of a dictionary? Why are you indexing your database? Why did you bail out of a string match when the first characters didn’t match?
Either you’re using DSA, or you can’t answer any of these questions about your own code. Here are some of the most useful data structures I believe every Java programmer should know and add to their toolset.
I’m gonna skip over Arrays and ArrayList (you should already know)
The LinkedList class is almost identical to the ArrayList. The LinkedList class has all of the same methods as the ArrayList class because they both implement the List interface. This means that you can add items, change items, remove items and clear the list in the same way.
The LinkedList stores its items in “containers.” The list has a link to the first container and each container has a link to the next container in the list.
LinkedList into detail here.
Hashmaps store items in key/value pairs (For the snake fanatics, it’s basically a dictionary)
HashMap is known as HashMap because it uses a technique called Hashing -
the technique of converting a large String to a small String that represents the same String. A shorter value helps in indexing and faster searches.
In ArrayLists, you learned that Arrays store items as an ordered collection, and you have to access them with an index number. A Hashmap, however, stores items in “key/value” pairs where you can access the ‘value’ if you provide the ‘key’. Hashmaps can store different types:
- (String keys & Integer vals) or the same type (String keys & String vals)
HashMap into detail here.
3. Multidimensional Arrays
Don’t be alarmed by the name, they are simply arrays that contain other arrays; so like a nested array (much like nested conditionals)
The simplest of the multi-dimensional array is a two-dimensional array; an array of one-dimensional arrays. In a matrix, the two dimensions are represented by rows and columns. (ie. int nums[#rows][#columns])
Multidimensional arrays are an extension of 2-D matrices and use additional subscripts for indexing. A 3-D array, for example, uses three subscripts. The first two are just like a matrix, but the third dimension represents pages or sheets of elements.
(import java.util.Queue;) and (import java.util.LinkedList;)
Java Queues order elements in a FIFO (First In First Out) discipline. In FIFO, the first element is removed first and the last element is removed at last. By convention, we name the queue insert operation enqueue and the remove operation dequeue
To help visualize Queues, think about a line at backyard bbq (or anything where people line up for something); you give bbq to the person in front of the line. After they’ve been served, they leave the line. Queues work in just that way!
Queues into detail here.
Stacks are an abstract data type with a bounded- or predefined- capacity.
To help visualize Stacks, think about it as a stack of video games; let’s say 5 games: Pokemon Red, Blue, Yellow, Diamond, and Pearl, and you want to organize them in the form of a stack (i.e. putting them one above the other), then how would you do it?
Well, you would take the first game, Red, and on top of it, you will put the second game, Blue, and then the next game, and so on, until you put the last game on, i.e. Pearl. This whole thing forms a stack of games which looks kinda sorta something like this:
A stack is a dynamic data set in which elements use the FILO (First-In-Last-Out) principle to create data structure — this structure limits data in the way that it can only be added to or removed from the top.
The delete operation is known as POP, in relation to the physical idea of popping something from a stack — the FILO structure also means we can only remove the item that was most recently added. In order to add items to a stack, the PUSH operation can be used, which refers to the physical idea of pushing an item onto the top of a stack.
Stacks into detail here.
A Tree is a non-linear data structure that organizes data in a hierarchical way. You can liken them to a family tree with many generations; grandparents, parents, children, etc. Family trees are organized hierarchically.
The tree data structure is a collection of nodes. These nodes are connected to each other by edges. Each node contains a value (data) and may or may not have a child node. In every (non-empty) tree, the first node of the tree is called the root.
If this root node is connected to one or more nodes, it is the parent node and the connected nodes are its children. Nodes with no children are called leaves or external nodes. Nodes that are not leaves are called internal nodes. Internal nodes have at least one child. Nodes with the same parent are said to be siblings.
In the above tree:
- A is a parent of B and C.
- B is called a child of A.
- B and C are siblings.
- E, F, H, I, and J are leaves.
The depth of a node is the number of edges from the root to the node. (The length of the path to its root). In the tree above, the depth of node I is 3.
The height of a node is the number of edges from the node to the deepest leaf. (The length of the longest path to a leaf). The height of node B is 2.
Trees into detail here.
A Heap is a special Tree-based data structure in which the tree is a complete binary tree. Generally, Heaps can be of two types:
- Min-Heap: every single parent node, including the root, is less than or equal to the value of its children nodes. The node with the smallest, or minimum value, will always be the root node.
- Max-Heap: essentially the opposite of a min-heap; every parent node, including the root, is greater than or equal to the value of its children nodes. The node with the largest, or maximum value will always be at the root node.
This wasn’t supposed to be a very in-depth technical overview of each DSA cuz it would just take too long and it’d be like a 30-minute read. If I’ve made any mistakes above- or perhaps overlooked an integral data structure, just shoot me an email or drop a comment!