Evolution of Collection Types in Swifts

Understand the protocols that make Arrays, Dictionaries, and Sets work

13 min readDec 17, 2020

The three primary collection types in Swift — Array, Dictionary and Set — need no introduction. But if you’ve ever explored protocol conformance hierarchy for any of them, you may have noticed that it may be quite long and seemingly overcomplicated. For example, for Array, it may look like following (due to multiple protocol conformances, there can be multiple paths, and this is not the longest one):

Sequence <- Collection <- MutableCollection <- Array

Most of the methods and properties available for Array are spread out over those protocols. But what is the point of having so many stages of ripening of Array’s functionality? Let’s start from the beginning and figure out which new semantics and new API are brought in with each of those stages.

Sequence

Being the root protocol for all collection types,Sequence is, so to speak, a proto-collection, which contains only basic rudiments to grow into something more. Pretty much as its name suggests, Sequence models a list of values that you can sequentially step through, one at a time. The only method any Sequence must implement is makeIterator(), which returns an instance of IteratorProtocol — the associated type of Sequence protocol.

IteratorProtocol in turn has the only required method, next(), returning the next element each time it gets called or nil if no next element exists. By the way, you can make your custom type conform to both Sequence and IteratorProtocol. Then you only have to implement next(), while makeIterator() would be inferred automatically. This really simple mechanism is exactly what a for-in loop exploits under the hood. So if you are for-looping over something, this something is a Sequence. for-in syntax is not necessary, of course; you can well use iterators explicitly.

So, strictly speaking, Sequence does not necessarily contain some list of elements; rather, it represents some abstract ability to generate them sequentially on the fly. For example, you can create the sequence OddNumbers whose Element is Int. The first element would be 1, and each call of next() would increment the previous one by 2 and return it. Working with this basic sequence also reminds me of reading from a file in some old-fashioned environment. You read it bite by bite until you stumble upon the end-of-file symbol, which is an analogue of calling iterator.next() until it returnsnil.

While these basics of Sequence seem very simple, they are just enough to develop into a rich and flourishing API, available for any sequence for free. The standard library provides default implementations for a large number of useful operations, starting from searching (contains(), first(where:), etc.) and ending with filter, map and reduce— those pillars of functional programming. You can also shuffle, reverse, join, split your sequence, drop its elements, extract prefixes and suffixes, and many many more. And of course now with Combine, you can get a Publisher for your sequence for free.

There is an important disclaimer about the Sequence protocol: There is no requirement for whether it will be destructively consumed by iteration. Meaning, if you’ve already performed a for-in loop over an instance of the sequence, and you then start another one over the same instance, you can’t be sure whether it will resume iterating, or start it over, or do nothing more — the behaviour you get is just undefined. If you do need such non-destructive iteration, Collection is here to serve you, and we’ll get back to this topic soon.

Collection

Collection protocol inherits from Sequence, adding the requirement that any element can be accessed directly by its position, which secures more elaborate and controlled management of its elements. Semantically, collection models a pre-defined, finite list of elements, each stuck to a certain position. It doesn’t just have an ability to iterate, as Sequence does.

To give you another illustration, Sequence can be viewed as a stack of paperwork on your desk that you can only go through from top to bottom, checking out each item until you reach what you’re looking for. In contrast, Collection is something that reflects some expanding file folder, with labels on each pocket, bookmarks, and even a table of contents for faster reference.

For that purpose, Collection introduces the concept of indices. It declares startIndex and endIndex properties to define the indices bounds, and the index(after:) method for advancing the index. Note that those methods make the indexing work for and arbitrary type, not necessarily for integers. Index is the associated type of the Collection protocol, and the only restriction for type taking its place is to be Comparable. Finally, there is at least a read-only subscript used for accessing the element at a given index. And it can be considered the cornerstone of Collection’s functionality.

In addition, Collection declares and provides default implementations for the next tranche of cool out-of-the-box APIs that take advantage of indexing. It’s not only accessing by index but also prefixing, suffixing, slicing, determining the distance between indices, etc. Even count first appears in Collection. For Sequence, we have only underestimatedCount, which is by definition less than or equal to the estimated number of elements, and by default it is zero. ’Cause you never know when your iterator’s next() is going to return its final nil.

Having indexed access naturally implies another requirement that brings Collection to the next level as compared to Sequence. Unlike what we have for Sequence, types conforming to Collection must guarantee that each time you traverse it with a for-in loop (or using an iterator), iteration will start over and you’ll be getting the same results as the previous time (of course, unless you didn’t modify it in between these two loops).

In this example, the first for-in loop has not consumed the first two elements of the array. So when you relaunch the iterating, it will start from the beginning. Remember that for Sequence, the behaviour of the second loop is just not defined. Also notice another way of iterating over Collection with the use of index and subscript, in addition to the basic one inherited from Sequence.

Let me stay a little more on this deep connection between the ability for direct access by index and the requirement of nondestructive, repeatable traversing. Technically, some basic indexed access is applicable not only to Collection but to Sequence as well. You can declare a separate counter that you increment and check on each iteration of for-in and trigger logic you need once it reached your desired index value.

But not to mention how awkward it is, this is hardly usable for real cases at all, and that’s just because Sequence by default can’t be repeatedly traversed. Normally you use an index either if it was saved at some previous point, resorting to your Collection, or to save the index for future access to the same element. So this is at least a two-step process — save the index and use the index, in any order — which presumes that your list of objects is durable and stable enough to be iterated over repeatedly. Otherwise, there is no reason for indexing it, even if technically you can.

Collection Types

So far we mostly talked about protocol structure and meaning, but let’s finally turn to concrete collection types. Now with all that sacred knowledge about collection protocols, you should be able to create your custom type, but you probably won’t. The standard ones (Array, Dictionary, and Set) are pretty exhaustive, well optimised, and just enough in 99.9 of cases. Let’s see how exactly they are inscribed into the complete collection protocols hierarchy.

Collection types family tree

As you can see now and as mentioned in the intro, Array is connected with the Collection protocol through the number of transitional forms. In this article, we’re not going to focus on all of them, but if you wish to go into it deeper, you may check out this great article by Ole Begemann. But here let’s highlight at least some interesting things.

As you can see from the diagram above, one of the conformances of Array is the MutableCollection protocol. Well, it doesn’t seem counterintuitive at all; of course Array is mutable! But if your first association was methods like append(), or insert(), or remove(), this is not the case. The thing is, documentation of the MutableCollection protocol postulates that changing values of its elements should not affect the length of Collection. But obviously, append(), insert(), and remove() do, so they don’t fall within this protocol. They’re declared and implemented in Array itself. What about MutableCollection? It is mostly about having write access for subscript, in addition to the read-only access inherited from Collection.

That is also why Set, being actually mutable, bypasses MutableCollection and immediately conforms to Collection. But what about Dictionary? It does have a subscript that sets the value for the key, replacing the previous one and thus not affecting the size. What is wrong with it, then?

Well, here comes another subtlety, and it’s time for a more careful look at the Index of the dictionary. This is a bit tangled at first sight, but Key in Dictionary has nothing to do with Collection.Index, just like Value is completely unrelated to Sequence.Element. So if you declare, let’s say, Dictionary<Int, String>, its Key would be Int, its Value is String, and its Element is the tuple (key: Int, value: String).

And what about Index? It is a separate struct, internally declared in the dictionary, in our caseDictionary<Int, String>.Index. This struct basically holds all that hashing-related stuff that the internal implementation of Dictionary relies on in order to be a hash table. By the way, something similar also applies to Set.Index, which we also rarely use directly.

So, key and value are something specific to Dictionary, and they belong to its implementation, not to that of the Collection protocol. Hence, the widely used get-set subscript(Key)->Value? of Dictionary is not in any way related to the read-only subscript(Index) -> Element, required by the Collection protocol. Of course, the latter is also still in play for Dictionar, and you can well use it (as well as all other operations with Index inherited from Collection). But that’s not what we normally do, because Key and Value are here to make our lives easier. Here’s a short code snippet to summarise all this:

By the way, if you were ever wondering why the subscript of Dictionary is optional while the apparently equivalent one of Array is not, you’ve got your answer — they are not equivalent.

Sequence vs Collection — Some More Variations on a Theme

If you’re not yet overburdened with the above, let me share some more speculations on similarities and differences between the Sequence and Collection protocols. Hopefully you’ll have some practical takeaways even from the vague allegations in the section that follows.

Being different semantically, these protocols also provide operations of a somewhat different nature. This creates a certain division of roles between them among the APIs of each concrete collection type that conforms to both protocols. Feeling which protocol some method or property is more inherent to may be helpful when it comes to designing your own API that involves collections. Although in most real-life cases you are just okay with specifying a concrete type, sometimes you do have to be less restrictive and put Collection, or even less restrictive and use Sequence, depending on which behaviour or portion of their API is expected or necessary in this particular scope. After all, keeping your API as generic as possible is a good practice, and it would never harm if used wisely.

So the general rule of thumb, as you may have already guessed, would be that Collection is primarily used when you have some knowledge of certain elements’ positions or need to gain this knowledge for future use, whereas Sequence is just enough if you’re okay to flip through all of them from the beginning.

But here are some further reflections. Notice that the most typical operations over Sequence cover the cases where you deal with the multitude of its elements as a whole, like converting it all with map, digesting it all with reduce, boiling it down with filter, etc. Even when you search for an individual element of Sequence (e.g., with first(where:)), internally it still tests all of them one by one, so it generally still interacts with them as a list.

To put it mathematically (just in case you also like making simple things complicated), such operations are effectively the functions in which input is a list of elements and output is also a list of elements. It’s just a particular case that the result may happen to be an empty list, or that it is forcibly trimmed to be just one element. For example, first(where:), the primary means of accessing a certain element in the sequence, is just a trimmed version of filter(where:). It drops all but the first element of the results of filtering or returns nil, if filtering resulted in an empty list. And, let’s say, contains(where:) is just another way of saying whether the size of the list of elements retrieved is more than zero.

By contrast, Collection is fundamentally aimed at giving an unambiguous and guaranteed result. The indexed subscript, the heart of the API of Collection, provides one and only one element. If there is no such single element, it crashes. So it is not optional, unlike most of the semantically corresponding methods of Sequence. This represents the mathematical idea of projection from a list of elements to a single and obligatory one. Again, this is not a strict rule or definition but more like trying to find some patterns and regularities.

Another distinction to be mentioned again is that only with Collection do you first get the ability to know the number of its elements. It’s not possible with Sequence because you never know beforehand when the iteration over it will reach the last element. For the same reason, be aware that Sequence is conceptually infinite, so both traversing it and using most of its methods (that internally still involve traversing) are not quite safe. The next() method of its iterator may keep returning a non-nil value till the end of time. This is not something you’re likely to run into when using sequences from a standard library, but when dealing with an unknown sequence, you can’t be 100% sure that you won’t get stuck inside of some method.

It looks like the only methods you can rely on are those dealing with a finite prefix (like dropFirst(Int) and prefix(Int)). By the way, not knowing the size of a sequence also prevents you from accessing the last element and any logic that involves a suffix. (Well, frankly speaking, there are still suffix() and dropLast() methods in Sequence, but they require deliberate restriction of the receiver to be finite and are somewhat dissonant here theoretically).

And the last thing to mention for today is performance. Unless documented otherwise, Sequence must provide its elements no worse than in O(n), while Collection must make it to the top-notch O(1), which is another win of having indexed access. However, the great law of performance-memory trade-off predicts that Collection is more expensive in terms of memory. Of course, it all depends on concrete implementations, but as a concept — yes it is, and that comes from the requirement of repeated access. It’s kind of a loose analogy, but Collection can be considered more state-oriented (indeed, dealing with index implies saving the state), while Sequence is more in tune with a functional approach.

Whew, that’s all I wanted to tell you. If you followed this far, I should be really proud for keeping your attention for that long! Share your thoughts and questions in the comments. Thanks for reading, and see you next time