.
Because we could: an immutable representation of our release timeline
What has changed in RC?
The changes in this update can be summarized into the following areas:
- Removal of
ImmutableArray<T>
. With a heavy heart we’ve decided to removeImmutableArray<T>
for now because we aren’t happy with the current design. - Comparers. We’ve simplified the exposed concepts by not exposing the comparers through the interfaces.
- Consistent construction. In the last update, we split the factory methods into methods that deal with scalars and ones that deal with other collections. We’ve improved this by renaming some APIs in order to apply the split consistently.
- Consistency with
ToString
. Previously, we didn’t overrideToString
consistently. We’ve fixed this by not overridingToString
at all. - Minor improvements. We’ve added a few convenience APIs.
I’ll discuss these areas in more detail.
ImmutableArray<T>
removed for now
After careful consideration we decided to remove
ImmutableArray<T>
from our 1.0 release plans. We aren’t happy with the design as it is today. This doesn’t mean we have given up on having an ImmutableArray<T>
– it just means we need more time to do it properly, and we don’t want to block releasing a stable version of immutable collections because we aren’t happy with a single type. After we ship the stable 1.0, we’ll ship a 1.1 beta that will include ImmutableArray<T>
.
Here’s why we weren’t happy with
ImmutableArray<T>
:
The primary goal for
ImmutableArray<T>
is to be a zero-overhead wrapper around an ordinary array. The current implementation, although it had little overhead, didn’t satisfy the performance needs of one of our most important partners, Project Roslyn, which served as an excellent benchmark for building a large-scale immutable system.
Because of performance requirements,
ImmutableArray<T>
needs to be a value type, i.e., a struct. In the CLR, all value types have an implied default state. This state is defined as having an underlying memory of all zeros. ForImmutableArray<T>
that means that the underlying array it wraps is null. In order to provide meaningful semantics, we believe it’s best for an immutable array in its default state to behave exactly like an empty array. That’s consistent with the way the .NET Framework deals with value types in general.
Unfortunately, we discovered that this design causes a general performance issue. The JIT compiler is normally able toeliminate the bounds checks for standard
for
loops. For ImmutableArray<T>
, the indexer and Length
property are inlined so one would think the bounds check would be eliminated as well. Unfortunately, that’s not the case, because these properties don’t just forward to the underlying array but also contain an additional null check to provide the behavior I explained. This causes the JIT to no longer be able to eliminate the bounds checks. Our partner Roslyn usesImmutableArray<T>
throughout its system, so this caused an unacceptable performance loss.
Today, the only way to get the performance we need is to remove the additional null checks from the indexer and
Length
property. However, we believe this has significant usability issues.
For example, consider the following code:
int[] ints = new int[] { 1, 2, 3 };
ImmutableArray<int> empty = new ImmutableArray<int>();
ImmutableArray<int> immutableInts = empty.AddRange(ints);
ImmutableArray<int> immutableInts = empty.AddRange(ints);
Since all value types have an automatic default constructor that initializes the value type to its default state, empty will be an
ImmutableArray<T>
whose underlying array is null. Thus, the implementation of AddRange
will crash with aNullReferenceException
.
There are other places where this problem manifests itself. Since
ImmutableArray<T>
implements certain collection interfaces (such as IEnumerable
and IReadOnlyList
), you can pass it to methods that work against the interface. Since the interface reference will be non-null, consumers won’t expect any methods or property invocations to cause aNullReferenceException
.
Not exactly a happy place. However, we do have some ideas on how to address this issue and decided to give it more baking time by removing
ImmutableArray<T>
for now.Reduced complexity around comparers
After reviewing some of your feedback, we decided that certain APIs that exposed comparers were overkill within the current design. Thus, we removed
IHashKeyCollection<TKey>
and ISortKeyCollection<TKey>
as well as theValueComparer
property from IImmutableDictionary<TKey,TValue>
.Consistent construction
When we introduced factory methods for constructing immutable collections, we also removed all
Empty
properties. So instead of:return ImmutableList<int>.Empty;
you have to write:
return ImmutableList.Create<int>();
We received feedback that this looks quite bad in code reviews, because most reviewers assume that
Create
would return a new instance. For added expressiveness, we decided to bring back the Empty
properties, so now you have both options.
In the past update, we introduced the
From
factory method so you could differentiate between factory methods that work on scalars (Create
) and and factory methods that operate on collections (From
). We realized that the name From
is an unfortunate choice because it doesn’t show up together in IntelliSense with Create
. In the RC update, we renamed From
toCreateRange
.
Also, our naming wasn’t fully consistent. There are some
Create
overloads on ImmutableDictionary
that accepted anIEnumerable
. We renamed those to CreateRange
as well.Consistency with ToString
ImmutableDictionary<TKey, TValue>
now no longer overrides ToString
(it was the only collection that did that anyway). This means that calling ToString
on any immutable collection will now return the fully qualified type (which is the behavior of Object.ToString
).
For debugging purposes, all collections already make use of DebuggerDisplayAttribute and DebuggerTypeProxyAttribute.
Minor improvements
The current signature of the
ToImmutableDictionary
method requires two selectors: one that determines the key and one that determines the value. This is inconsistent with Enumerable.ToDictionary, which provides an overload that allows you to specify only the key. The RC update includes the equivalent overload for ToImmutableDictionary
. Thus, code like this will now compile:
ImmutableList<Customer> customers = GetCustomers();
ImmtuableDictionary<int, Customer> customerById = customers.ToImmutableDictionary(c => c.Id);
ImmtuableDictionary<int, Customer> customerById = customers.ToImmutableDictionary(c => c.Id);
Performing a binary search is a fairly common operation for collections. Since
ImmutableList<T>
is using a tree behind the scenes, indexing is O(log n). Thus, providing a BinarySearch
method on ImmutableList<T>
can do a better job because it can traverse the internal tree and avoid having to repeatedly re-locate the node in the loop. For symmetry, we also added the BinarySearch
method to the ImmutableList<T>.Builder
.What has not changed?
We’ve received a lot of feedback on immutable collections. There were two major design points many of you commented on: the lack of constructors and the implementation of mutable interfaces. I’d like to explain these design decisions.
The lack of constructors
Some of you asked us to bring back the constructors. The concern was that this code:
ImmutableList<int> list = new ImmutableList<int>(new int[] { 1, 2, 3});
is, by far, more intuitive than this code:
ImmutableList<int> list = ImmutableList.Create<int>(new[] { 1, 2, 3});
That’s absolutely true. When designing APIs, we generally avoid factory methods because they are often an indication of overengineering and generally hurt usability. For example, most developers would agree that
XDocument
is easier to use than XmlDocument
. The fact that XDocument
avoids factories is one of the reasons why it’s more usable.
The challenge in designing immutable APIs is that it’s very important to avoid unnecessary allocations. For example, most immutable collections start off with an empty instance. Offering a constructor will inevitably cause significant GC pressure due to the fact that this is the most intuitive way to construct them.
Using factory methods also provide other interesting opportunities for optimization. For example, consider this code:
class NumberProcessor
{
private readonly ImmutableList<int> _numbers;
{
private readonly ImmutableList<int> _numbers;
NumberProcessor(IEnumerable<int> numbers)
{
_numbers = ImmutableList.CreateRange<int>(numbers);
}
{
_numbers = ImmutableList.CreateRange<int>(numbers);
}
// other methods elided for brevity
}
}
The goal of this code is to keep the numbers around. Since the input is typed as
IEnumerable<T>
, you can pass virtually any collection, including collections that are already immutable. In case numbers
is already an instance ofImmutableList<int>
, the CreateRange
factory method can simply cast it and thus avoid the construction of a new object. A constructor would have forced us to create a brand new immutable list.
There is also the problem of collection initializers. Suppose we provide a default constructor for
ImmutableList<int>
. In that case, the following code would compile just fine:
ImmutableList<int> items = new ImmutableList<int> {1, 2, 3};
Console.WriteLine(items.Count);
Console.WriteLine(items.Count);
You would probably expect the output to be
3
but in fact it’s 0
. The reason is that collection initializers follow a pattern. As long as the type has a constructor, implements IEnumerable
, and provides a public Add
method, the compiler lets you use the syntax above. The code that the compiler generates looks like this:
ImmutableList<int> items = new ImmutableList<int>();
items.Add(1);
items.Add(2);
items.Add(3);
Console.WriteLine(items.Count);
items.Add(1);
items.Add(2);
items.Add(3);
Console.WriteLine(items.Count);
Now it’s clear why the output is
0
: It’s because collection initializers don’t expect the Add
method to return a new instance and instead rely on side effects to mutate the existing instance. Since the result is never used, the items variable still refers to the original, empty collection.
Thus, we decided to exclusively offer factory methods to ensure that you get the best behavior and cannot accidentally do the wrong thing.
Implementing the mutable interfaces
Several people raised the issue that our immutable collection types implement the mutable interfaces. For example, this is what the declaration of
ImmutableList<T>
looks like:
public sealed class ImmutableList<T> :
IImmutableList<T>,
IEnumerable,
IEnumerable<T>,
IReadOnlyCollection<T>,
IReadOnlyList<T>,
ICollection,
ICollection<T>,
IList,
IList<T>
{
...
}
IImmutableList<T>,
IEnumerable,
IEnumerable<T>,
IReadOnlyCollection<T>,
IReadOnlyList<T>,
ICollection,
ICollection<T>,
IList,
IList<T>
{
...
}
It’s worth pointing out that this declaration does not have a correctness issue – you cannot mutate an instance of
ImmutableList<T>
by casting it to a mutable interface. For example, calling IList<T>.Add
would throwNotSupportedException
.
The mutable interfaces always supported a read-only mode, which means that consumers can’t change the contents. This mode is exposed via ICollection.IsReadOnly, which all mutable interfaces inherit from. Thus, the concrete immutable collection types all implement the existing, mutable interfaces in a read-only fashion, which means that
IsReadOnly
returns true and all methods that would mutate the collection throw NotSupportedException
.
That’s not fundamentally incompatible with immutable collections; for all intents and purposes, immutable collections provide an even stronger guarantee that nobody, not just the consumer, can change their contents. Some of you suggested that we shouldn’t implement the mutable interfaces at all, so you shouldn’t be able to accidentally pass an immutable collection to a method that takes an
IList<T>
. This would compile just fine, but might cause aNotSupportedException
at runtime if the method in question tries to modify the collection through the mutable interface.
Based on the data we have, we found that in many cases where methods take a mutable interface (such as
IList<T>
), the consumer only needs to read the contents. We believe that enabling interoperability with mutable collections is very important, because otherwise it would be difficult to adopt immutable collections into existing code.
Also, this interoperability is consistent with how the rest of the .NET Framework is designed. For example, Streams can support reading, writing, and seeking. Instead of providing separate types, we’ve decided to expose the capabilities via theoptional features pattern because it is typically much simpler and doesn’t result in an explosion of combinatorial types. The
IsReadOnly
property on ICollection<T>
is another instance of this pattern.
Thus, we decided that it’s best to implement the mutable interfaces on the concrete types.
Summary
Today we shipped the release candidate of our Microsoft.Bcl.Immutable NuGet package. We plan to release a stable 1.0 package by the end of September. We’ve removed
ImmutableArray<T>
from 1.0 because we weren’t happy with its design. It will re-appear in 1.1, which will be available as a beta release shortly after we ship a stable 1.0 package.
Please play around with the RC and let us know what you think
!
!
No comments:
Post a Comment