In the third part of my blog series on the C# programming language, I will go through the various types of collections. I use the term “collections” because it is a broad term and the different types of collections are explained below. One advantage of C# is not just the varying types of collections but also the number of ways to work with them.
Built-in Classes
Below are the different namespaces and types of collections that are built into C#. Because there are so many of them, I’m not going into a lot of details with them. There are some notes about them, including their performance and inner workings. If you want to learn more details about them, I suggest looking at the Microsoft Developer Network (MSDN), which contains extensive information on them.
System.Collections Namespace
The collections that are defined in the System.Collections namespace were superseded in C# v2.0 (with the introduction of generic types) by the System.Collections.Generic namespace and are not recommended for a couple different reasons. The first is they are not type safe, meaning the data stored in them can be any type and you will need to take additional measures to make sure it is the right type when retrieving them. For these to work with LINQ you will need to cast them all to a specific type first, which will slow down performance.
- ArrayList
- Equivalent of System.Collections.Generic.List<Object>
- Hashtable
- Equivalent of System.Collections.Generic.Dictionary<Object,Object>
- Queue
- Equivalent of System.Collections.Generic.Queue<Object>
- Stack
- Equivalent of System.Collections.Generic.Stack<Object>
System.Collections.Generic Namespace
If you’re wondering about Big O Notation for the collections in this namespace, I included them below. Out of all of them, the worst is O(n) and average is O(1). None of these are thread safe, meaning you shouldn’t be accessing them from multiple threads at the same time. Instead, use a lock
or (if possible) a System.Collections.Concurrent class.
- List<T>
- Uses fixed array where the size (or capacity) doubles when it is full
- Capacity is set to 0 (if not set in constructor) initially and then to 4 when the first item is added
- Worst O notation is O(n), average is O(1)
- Dictionary<TKey,TValue>
- Uses hash table as data structure
- Determines “bucket” to store value using GetHashCode method of the key (every Object has GetHashCode method)
- Worst O notation is O(n), average is O(1)
- SortedList<TKey,TValue>
- Uses binary search (which is O(log n)) for retrieval
- Insert and delete is O(n) which is slower than O(log n) in SortedDictionary<TKey,TValue>
- Uses less memory than SortedDictionary<TKey,TValue>
- Maintains a sorted array, meaning, the internal array is rearranged each time something is added.
- SortedDictionary<TKey,TValue>
- Insert and delete is O(log n) which is faster than O(n) in SortedList<TKey,TValue>
- Utilizes binary search tree as data structure
- Retrieval is also O(log n)
- Slower than SortedList<TKey,TValue> when list is populated all at once from sorted array
- LinkedList<T>
- Represents a doubly linked list
- Each node is represented by LinkedListNode<T> class
- Worst O notation is O(n), average is O(1)
- Queue<T>
- Stores data just like in a List or ArrayList
- Peek: Gets object at front of the queue but doesn’t remove it
- Dequeue: Gets object at front of the queue and removes it
- Enqueue: Adds object to the end of the queue
- Worst O notation is O(n), average is O(1)
- Stack<T>
- Stores data just like in a List or ArrayList
- Peek: Gets object in front of stack but doesn’t remove it
- Pop: Gets object in front of stack and removes it
- Push: Adds object to top of stack
- Worst O notation is O(n), average is O(1)
Fixed length arrays
Fixed length arrays are simple. They are zero indexed (meaning the index starts at 0) and can be multi-dimensional (meaning you can have arrays inside of another array). C# contains a few built-in methods for manipulating arrays. There is no real restriction on the type of data that an array can hold.
Fixed length arrays are much like objects, in that they’re passed to methods as references and if the array is modified in the called method, it will also be modified in the calling method. Also, the data in an array can be set when it is declared by enclosing the initial data in curly braces. For example:
[highlight lang=”cs”]using System;
public class Test
{
public static void Main()
{
int[] ary = new int[] { 1, 2, 3, 4 };
Console.WriteLine(ary[0]); // Outputs 1
Method(ary);
Console.WriteLine(ary[0]); // Outputs 5
}
private static void Method(int[] ary)
{
ary[0] = 5;
}
}
[/highlight]
Indexer
It’s not just arrays that can you can use square brackets ([]
) on to get an element at a certain position. Defining an indexer (as shown below) allows a class to be used as a array. You are also not limited to using just integers as the index. If you look below, you will see that they are defined just like properties and the visibility can be set at the property or get/set level.
[highlight lang=”cs”]using System;
using System.Collections.Generic;
public class HockeyTeam
{
private readonly Dictionary<string, int> _players = new Dictionary<string, int>();
public int this[string str]
{
get { return _players[str]; }
private set { _players[str] = value; }
}
public HockeyTeam()
{
this[“Wayne Gretzkey”] = 99;
this[“Mario Lemieux”] = 66;
this[“Jarome Iginla”] = 88;
this[“Bobby Orr”] = 4;
this[“Bugs Bunny”] = 22;
this[“Donald Duck”] = 11;
this[“Daffy Duck”] = 54;
this[“Marvin The Martian”] = 26;
}
public static void Main()
{
HockeyTeam hockeyTeam = new HockeyTeam();
string player = “Bobby Orr”;
int number = hockeyTeam[player];
// Outputs “The number for Bobby Orr is 4.”
Console.WriteLine($”The number for {player} is {number}.”);
}
}
[/highlight]
The “yield return” keyword
When you have a method that returns a collection (specifically, one that implements IEnumerable
), you would probably do something like this:
[highlight lang=”cs”]public IEnumerable<int> TrafficLights() {
int[] ary = new int[5];
for (int i = 0; i < 5; i++) {
ary[i] = i * 2;
}
return ary;
}
[/highlight]
With the “yield return”, the number of lines of code can be reduced and performance improved by having the elements returned one at a time (rather than all at once). The method below does the same as the one above.
[highlight lang=”cs”]public IEnumerable<int> Roundabout()
{
for (int i = 0; i < 5; i++)
{
yield return i * 2;
}
}
[/highlight]
I called one method “TrafficLights” and the other “Roundabout” because that’s the best way to compare them. When there’s a line of cars stopped at traffic lights and it turns green, the cars will all go at once until it turns red. Roundabouts are more effience because cars can continously enter it and if a car wants to enter a roundabout but another car is in it, it must yield to that car. A couple restriction to note is:
- None of parameters for the method can use
ref
orout
when usingyield
. - You can’t use
yield
inside of anonymous methods. - A try-catch block cannot contain a
yield return
, with the exception of a try-finally block where it can only be used in the try block. - A try or catch block can contain a
yield break
but not a finally block.
One other concept to add to this is yield break
. This tells the method to exit and go back to the calling method. The calling method receives anything that the called added using yield return
. Below is an example that will stop after 2 iterations.
[highlight lang=”cs”]public IEnumerable<int> Roundabout()
{
for (int i = 0; i < 5; i++)
{
if (i == 2)
yield break;
yield return i * 2;
}
}
[/highlight]
Language-Integrated Query (LINQ)
LINQ was introduced in C# v3.0 and is also a great way of working with collections using less lines of code. If you’re familiar with SQL, LINQ shouldn’t be much harder to figure out. In order to use LINQ, you will need to add the namespaces with using System.Linq;
at the top of the file.
Anonymous methods
Before going into LINQ, it’s best to have an understanding of anonymous methods first. It’ll help you understand the LINQ methods a bit better. If you want to learn more about it, the best examples I could find were in the answers of this question on StackOverflow.
Methods
There are many extensible methods that can be used with collections. They’re refered to as “extensible” because the methods are added to classes that implement IEnumerable<T>
from another class. Most of the methods will return a new collection (which allows them to be chained). In some cases a bool
, int
, long
, or single object/element will be returned.
I’ve included a list of some of the extensible methods and what they do. There are so many that I couldn’t list them all. Besides this list, the MSDN contains a complete list of all the methods. Don’t get turned off by the big long list of methods and their parameters.
- Any: If the filter specified is ever true for the collection, this returns true. When there’s no filter, it returns true if there are elements in the collection (or false if there isn’t).
- All: Similar to Any but only true if the filter evaluates to true for every element in the collection.
- Contains: Returns true if the specified element exists in the collection
- Count/LongCount: A filter can optionally be specified. If one is specified, the number of elements where the filter is true is returned, otherwise, the total number of elements is returned.
LongCount
returns the count aslong
whileCount
returns the count asint
. UseAny
to check if there are (or aren’t) more than 0 elements instead! - Distinct: Returns a new collection of elements without any duplicates. By default, an element is determined to be a duplicate if
obj1.Equals(obj2)
evaluates to true. - DefaultIfEmpty: If a collection is empty, a new collection is returned with one element containing the default of the type of value. You also can specify the object to be in the list if it’s empty.
- ElementAt/ElementAtOrDefault: Gets the element at the specified index and throws an exception if the index is out of range and ElementAtOrDefault is not used (in which the specified object will be returned instead).
- First/Last: Gets the first or last element from the collection (respectively). If a filter is specified the first or last element from the filtered collection is returned. First and Last will return the same element if there’s only 1 element. An exception is thrown if there’s 0 elements.
- FirstOrDefault/LastOrDefault: Same as First/Last but returns default value if there’s 0 elements.
- Single/SingleOrDefault: Gets one element from the collection ONLY if there’s one element in it. Single throws an exception while SingleOrDefault returns the specified default value if there’s less than or greater than 1 element in the collection.
- GroupBy: Goes through collection to find duplicates (which can be using a filter you specify) and returns a collection with a key and value. The key is the object that was used to find duplicates of (for example, a persons age) and the value is a list of objects that are unique (for example, a persons name).
- Where: Runs through collection once and returns collection of elements where filter is true
- TakeWhile/SkipWhile: Useful for when the filter can change while a
foreach
is executing. The same as havingif (condition) break;
at the beginning or end of the loop. - Average/Min/Max/Sum: Goes through list of numbers to find average, lowest, highest, or sum of all the numbers.
- OrderBy/OrderByDescending: Uses the member specified from each element and then
CompareTo
on each of those members to sort the collection in ascending or descending order. The quicksort sorting algorithm is used.
Queries
Everyone has their own taste and my preference is methods. I find queries more complicated and anything you do using queries, can also be done using extensible methods. Like SELECT
statements in SQL, you have a from
and select
. However, the from
goes first and the select
goes last in a LINQ query. When deciding whether to use methods or queries, others have suggested to go with what one is more readable but in my opinion, both are readable. The next thing to do is see which one would be easier to maintain.
Deferred Execution
Pretty much all of the LINQ functions (extensible or query) use what is called “deferred execution”. What this means is nothing is actually done until the return value is accessed (creating a improvement in performance). In the same example below, the first use takes 0 milliseconds while the next takes about 1 millisecond. If the list was made bigger, the latter would take longer while the former would still take 0 milliseconds.
[highlight lang=”cs”]using System;
using System.Linq;
using System.Diagnostics;
public class Program
{
public static void Main()
{
var names = new[] {
“John Smith”,
“Bugs Bunny”,
“Jane Doe”,
“Donald Duck”,
“Marvin The Martian”,
“Scooby Doo”,
“Daffy Duck”,
“Mickey Mouse”,
“Goofy”,
“Popeye”,
“Bart Simpson”,
“Superman”,
“Batman”
};
var stopWatch = new Stopwatch();
stopWatch.Start();
names.OrderBy(name => name); // First use
stopWatch.Stop();
Console.WriteLine(“Time took: {0} ms”, stopWatch.ElapsedMilliseconds);
stopWatch.Start();
foreach (var name in names.OrderBy(name => name)) // Second use
Console.WriteLine(name);
stopWatch.Stop();
Console.WriteLine(“Time took: {0} ms”, stopWatch.ElapsedMilliseconds);
}
}
[/highlight]
Further Reading
Since there’s so much to learn about LINQ, there a few resources you can look at. The first is this article on MSDN. Tutorials Point has a walkthrough as well for LINQ. ReSharper is an interesting tool for Visual Studio that can convert loops into a LINQ statement (sometimes making multiple lines of code into just one).
Stay tuned for my final blog post on the C# programming language!