Evolution of Collection Types in Swifts
Understand the protocols that make Arrays, Dictionaries, and Sets work
The three primary collection types in Swift —
Set — need no introduction. But if you’ve ever explored protocol conformance hierarchy for any of them, you may have noticed that it may be quite long and seemingly overcomplicated. For example, for
Array, it may look like following (due to multiple protocol conformances, there can be multiple paths, and this is not the longest one):
Sequence <- Collection <- MutableCollection <- Array
Most of the methods and properties available for
Array are spread out over those protocols. But what is the point of having so many stages of ripening of
Array’s functionality? Let’s start from the beginning and figure out which new semantics and new API are brought in with each of those stages.
Being the root protocol for all collection types,
Sequence is, so to speak, a proto-collection, which contains only basic rudiments to grow into something more. Pretty much as its name suggests,
Sequence models a list of values that you can sequentially step through, one at a time. The only method any
Sequence must implement is
makeIterator(), which returns an instance of
IteratorProtocol — the associated type of
IteratorProtocol in turn has the only required method,
next(), returning the next element each time it gets called or nil if no next element exists. By the way, you can make your custom type conform to both
IteratorProtocol. Then you only have to implement
makeIterator() would be inferred automatically. This really simple mechanism is exactly what a
for-in loop exploits under the hood. So if you are
for-looping over something, this something is a
for-in syntax is not necessary, of course; you can well use iterators explicitly.
So, strictly speaking,
Sequence does not necessarily contain some list of elements; rather, it represents some abstract ability to generate them sequentially on the fly. For example, you can create the sequence
Int. The first element would be 1, and each call of
next() would increment the previous one by 2 and return it. Working with this basic sequence also reminds me of reading from a file in some old-fashioned environment. You read it bite by bite until you stumble upon the end-of-file symbol, which is an analogue of calling
iterator.next() until it returns
While these basics of
Sequence seem very simple, they are just enough to develop into a rich and flourishing API, available for any sequence for free. The standard library provides default implementations for a large number of useful operations, starting from searching (
first(where:), etc.) and ending with
reduce— those pillars of functional programming. You can also shuffle, reverse, join, split your sequence, drop its elements, extract prefixes and suffixes, and many many more. And of course now with Combine, you can get a
Publisher for your sequence for free.
There is an important disclaimer about the
Sequence protocol: There is no requirement for whether it will be destructively consumed by iteration. Meaning, if you’ve already performed a
for-in loop over an instance of the sequence, and you then start another one over the same instance, you can’t be sure whether it will resume iterating, or start it over, or do nothing more — the behaviour you get is just undefined. If you do need such non-destructive iteration,
Collection is here to serve you, and we’ll get back to this topic soon.
Collection protocol inherits from
Sequence, adding the requirement that any element can be accessed directly by its position, which secures more elaborate and controlled management of its elements. Semantically, collection models a pre-defined, finite list of elements, each stuck to a certain position. It doesn’t just have an ability to iterate, as
To give you another illustration,
Sequence can be viewed as a stack of paperwork on your desk that you can only go through from top to bottom, checking out each item until you reach what you’re looking for. In contrast,
Collection is something that reflects some expanding file folder, with labels on each pocket, bookmarks, and even a table of contents for faster reference.
For that purpose,
Collection introduces the concept of indices. It declares
endIndex properties to define the indices bounds, and the
index(after:) method for advancing the index. Note that those methods make the indexing work for and arbitrary type, not necessarily for integers.
Index is the associated type of the
Collection protocol, and the only restriction for type taking its place is to be
Comparable. Finally, there is at least a read-only subscript used for accessing the element at a given index. And it can be considered the cornerstone of
Collection declares and provides default implementations for the next tranche of cool out-of-the-box APIs that take advantage of indexing. It’s not only accessing by index but also prefixing, suffixing, slicing, determining the distance between indices, etc. Even
count first appears in
Sequence, we have only
underestimatedCount, which is by definition less than or equal to the estimated number of elements, and by default it is zero. ’Cause you never know when your iterator’s
next() is going to return its final
Having indexed access naturally implies another requirement that brings
Collection to the next level as compared to
Sequence. Unlike what we have for
Sequence, types conforming to
Collection must guarantee that each time you traverse it with a
for-in loop (or using an iterator), iteration will start over and you’ll be getting the same results as the previous time (of course, unless you didn’t modify it in between these two loops).
In this example, the first
for-in loop has not consumed the first two elements of the array. So when you relaunch the iterating, it will start from the beginning. Remember that for
Sequence, the behaviour of the second loop is just not defined. Also notice another way of iterating over
Collection with the use of index and subscript, in addition to the basic one inherited from
Let me stay a little more on this deep connection between the ability for direct access by index and the requirement of nondestructive, repeatable traversing. Technically, some basic indexed access is applicable not only to
Collection but to
Sequence as well. You can declare a separate counter that you increment and check on each iteration of
for-in and trigger logic you need once it reached your desired index value.
But not to mention how awkward it is, this is hardly usable for real cases at all, and that’s just because
Sequence by default can’t be repeatedly traversed. Normally you use an index either if it was saved at some previous point, resorting to your
Collection, or to save the index for future access to the same element. So this is at least a two-step process — save the index and use the index, in any order — which presumes that your list of objects is durable and stable enough to be iterated over repeatedly. Otherwise, there is no reason for indexing it, even if technically you can.
So far we mostly talked about protocol structure and meaning, but let’s finally turn to concrete collection types. Now with all that sacred knowledge about collection protocols, you should be able to create your custom type, but you probably won’t. The standard ones (
Set) are pretty exhaustive, well optimised, and just enough in 99.9 of cases. Let’s see how exactly they are inscribed into the complete collection protocols hierarchy.
Collection types family tree
As you can see now and as mentioned in the intro,
Array is connected with the
Collection protocol through the number of transitional forms. In this article, we’re not going to focus on all of them, but if you wish to go into it deeper, you may check out this great article by Ole Begemann. But here let’s highlight at least some interesting things.
As you can see from the diagram above, one of the conformances of
Array is the
MutableCollection protocol. Well, it doesn’t seem counterintuitive at all; of course
Array is mutable! But if your first association was methods like
remove(), this is not the case. The thing is, documentation of the
MutableCollection protocol postulates that changing values of its elements should not affect the length of
Collection. But obviously,
remove() do, so they don’t fall within this protocol. They’re declared and implemented in
Array itself. What about
MutableCollection? It is mostly about having write access for
subscript, in addition to the read-only access inherited from
That is also why
Set, being actually mutable, bypasses
MutableCollection and immediately conforms to
Collection. But what about
Dictionary? It does have a subscript that sets the value for the key, replacing the previous one and thus not affecting the size. What is wrong with it, then?
Well, here comes another subtlety, and it’s time for a more careful look at the
Index of the dictionary. This is a bit tangled at first sight, but
Dictionary has nothing to do with
Collection.Index, just like
Value is completely unrelated to
Sequence.Element. So if you declare, let’s say,
Dictionary<Int, String>, its
Key would be
String, and its
Element is the tuple
(key: Int, value: String).
And what about
Index? It is a separate struct, internally declared in the dictionary, in our case
Dictionary<Int, String>.Index. This struct basically holds all that hashing-related stuff that the internal implementation of
Dictionary relies on in order to be a hash table. By the way, something similar also applies to
Set.Index, which we also rarely use directly.
So, key and value are something specific to
Dictionary, and they belong to its implementation, not to that of the
Collection protocol. Hence, the widely used get-set
Dictionary is not in any way related to the read-only
subscript(Index) -> Element, required by the
Collection protocol. Of course, the latter is also still in play for
Dictionar, and you can well use it (as well as all other operations with
Index inherited from
Collection). But that’s not what we normally do, because
Value are here to make our lives easier. Here’s a short code snippet to summarise all this:
By the way, if you were ever wondering why the subscript of
Dictionary is optional while the apparently equivalent one of
Array is not, you’ve got your answer — they are not equivalent.
Sequence vs Collection — Some More Variations on a Theme
If you’re not yet overburdened with the above, let me share some more speculations on similarities and differences between the
Collection protocols. Hopefully you’ll have some practical takeaways even from the vague allegations in the section that follows.
Being different semantically, these protocols also provide operations of a somewhat different nature. This creates a certain division of roles between them among the APIs of each concrete collection type that conforms to both protocols. Feeling which protocol some method or property is more inherent to may be helpful when it comes to designing your own API that involves collections. Although in most real-life cases you are just okay with specifying a concrete type, sometimes you do have to be less restrictive and put
Collection, or even less restrictive and use
Sequence, depending on which behaviour or portion of their API is expected or necessary in this particular scope. After all, keeping your API as generic as possible is a good practice, and it would never harm if used wisely.
So the general rule of thumb, as you may have already guessed, would be that
Collection is primarily used when you have some knowledge of certain elements’ positions or need to gain this knowledge for future use, whereas
Sequence is just enough if you’re okay to flip through all of them from the beginning.
But here are some further reflections. Notice that the most typical operations over
Sequence cover the cases where you deal with the multitude of its elements as a whole, like converting it all with
map, digesting it all with
reduce, boiling it down with
filter, etc. Even when you search for an individual element of
Sequence (e.g., with
first(where:)), internally it still tests all of them one by one, so it generally still interacts with them as a list.
To put it mathematically (just in case you also like making simple things complicated), such operations are effectively the functions in which input is a list of elements and output is also a list of elements. It’s just a particular case that the result may happen to be an empty list, or that it is forcibly trimmed to be just one element. For example,
first(where:), the primary means of accessing a certain element in the sequence, is just a trimmed version of
filter(where:). It drops all but the first element of the results of filtering or returns
nil, if filtering resulted in an empty list. And, let’s say,
contains(where:) is just another way of saying whether the size of the list of elements retrieved is more than zero.
Collection is fundamentally aimed at giving an unambiguous and guaranteed result. The indexed subscript, the heart of the API of
Collection, provides one and only one element. If there is no such single element, it crashes. So it is not optional, unlike most of the semantically corresponding methods of
Sequence. This represents the mathematical idea of projection from a list of elements to a single and obligatory one. Again, this is not a strict rule or definition but more like trying to find some patterns and regularities.
Another distinction to be mentioned again is that only with
Collection do you first get the ability to know the number of its elements. It’s not possible with
Sequence because you never know beforehand when the iteration over it will reach the last element. For the same reason, be aware that
Sequence is conceptually infinite, so both traversing it and using most of its methods (that internally still involve traversing) are not quite safe. The
next() method of its iterator may keep returning a non-nil value till the end of time. This is not something you’re likely to run into when using sequences from a standard library, but when dealing with an unknown sequence, you can’t be 100% sure that you won’t get stuck inside of some method.
It looks like the only methods you can rely on are those dealing with a finite prefix (like
prefix(Int)). By the way, not knowing the size of a sequence also prevents you from accessing the last element and any logic that involves a suffix. (Well, frankly speaking, there are still
dropLast() methods in
Sequence, but they require deliberate restriction of the receiver to be finite and are somewhat dissonant here theoretically).
And the last thing to mention for today is performance. Unless documented otherwise,
Sequence must provide its elements no worse than in O(n), while
Collection must make it to the top-notch O(1), which is another win of having indexed access. However, the great law of performance-memory trade-off predicts that
Collection is more expensive in terms of memory. Of course, it all depends on concrete implementations, but as a concept — yes it is, and that comes from the requirement of repeated access. It’s kind of a loose analogy, but
Collection can be considered more state-oriented (indeed, dealing with index implies saving the state), while
Sequence is more in tune with a functional approach.
Whew, that’s all I wanted to tell you. If you followed this far, I should be really proud for keeping your attention for that long! Share your thoughts and questions in the comments. Thanks for reading, and see you next time