Somewhat unusually, this blog post is future-looking: it mostly focuses on things that don't yet exist. Its purpose is to lay out the background for community discussion about possible changes to the core API for AbstractArray
s, and serves as background reading and reference material for a more focused "julep" (a julia enhancement proposal). Here, often I'll use the shorthand "array" to mean AbstractArray
, and use Array
if I explicitly mean julia's concrete Array
type.
As the reader is likely aware, in julia it's possible to write algorithms for which one or more inputs are only assumed to be AbstractArray
s. This is "generic" code, meaning it should work (i.e., produce a correct result) on any specific concrete array type. In an ideal world — which julia approaches rather well in many cases — generality of code should not have a negative impact on its performance: a generic implementation should be approximately as fast as one restricted to specific array type(s). This implies that generic algorithms should be written using lower-level operations that give good performance across a wide variety of array types.
Providing efficient low-level operations is a different kind of design challenge than one experiences with programming languages that "vectorize" everything. When successful, it promotes much greater reuse of code, because efficient, generic low-level parts allow you to write a wide variety of efficient, generic higher-level functions.
Naturally, as the diversity of array types grows, the more careful we have to be about our abstractions for these low-level operations.
In discussing general operations on arrays, it's useful to have a diverse collection of concrete arrays in mind.
In core julia, some types we support fairly well are:
Array
: the prototype for all arrays
Range
s: a good example of what I often consider a "computed" array, where essentially none of the values are stored in memory. Since there is no storage, these are immutable containers: you can't set values in individual slots.
BitArray
s: arrays that can only store 0 or 1 (false
or true
), and for which the internal storage is packed so that each entry requires only one bit.
SubArray
s: the problems this type introduced, and the resolution we adopted, probably serves as the best model for the generalizations considered here. Therefore, this case is discussed in greater detail below.
Another important class of array types in Base are sparse arrays: SparseMatrixCSC
and SparseVector
, as well as other sparse representations like Diagonal
, Bidiagonal
, and Tridiagonal
. These are good examples of array types where access patterns deserve careful thought. Notably, despite many commonalities in "strategy" among the 5 or so sparse parametrizations we have, implementations of core algorithms (e.g., matrix multiplication) are specialized for each sparse-like type — in other words, these mimic the "high level vectorized functions" strategy common to other languages. What we lack is a "sparse iteration API" that lets you write the main algorithms of sparse linear algebra efficiently in a generic way. Our current model is probably fine for SparseLike*Dense
operations, but gets to be harder to manage if you want to efficiently compute, e.g., Bidiagonal*SparseMatrixCSC
: the number of possible combinations you have to support grows rapidly with more sparse types, and thus represents a powerful incentive for developing efficient, generic low-level operations.
Outside of Base, there are some other mind-stretching examples of arrays, including:
DataFrames
: indexing arrays with symbols rather than integers. Other related types include NamedArrays
, AxisArrays
.
Interpolations
: indexing arrays with non-integer floating-point numbers
DistributedArrays
: another great example of a case in which you need to think through access patterns carefully
For arrays of fixed dimension, one can write algorithms that index arrays as A[i,j,k,...]
(good examples can be found in our linear algebra code, where everything is a vector or matrix). For algorithms that have to support arbitrary dimensionality, for a long time our fallback was linear indexing, A[i]
for integer i
. However, in general SubArrays cannot be efficiently accessed by a linear index because it results in call(s) to div
, and div
is slow. This is a CPU problem, not a Julia-specific problem. The slowness of div
is still true despite the recent addition of infrastructure to make it much faster — now one can make it merely "really bad" rather than "Terrible, Horrible, No Good, and Very Bad".
The way we (largely) resolved this problem was to make it possible to do cartesian indexing, A[i,j,k,...]
, for arrays of arbitrary dimensionality (the CartesianIndex
type). To leverage this in practical code, we also had to extend our iterators with the for I in eachindex(A)
construct. This allows one to select an iterator that optimizes the efficiency of access to elements of A
. In generic algorithms, the performance gains were not small, sometimes on the scale of ten- to fifty-fold. These types were described in a previous blog post.
To my knowledge, this approach has given Julia one of the most flexible yet efficient "array view" types in any programming language. Many languages base views on array strides, meaning situations in which the memory offset is regular along each dimension. Among other things, this requires that the underlying array is dense. In contrast, in Julia we can easily handle non-strided arrays (e.g., sampling at [1,3,17,428,...]
along one dimension, or creating a view of a SparseMatrixCSC
). We can also handle arrays for which there is no underlying storage (e.g., Range
s). Being able to do this with a common infrastructure is part of what makes different optimized array types useful in generic programming.
It's also worth pointing out some problems:
Most importantly, it requires that one adopt a slightly different programming style. Despite being well into another release cycle, this transition is still not complete, even in Base.
For algorithms that involve two or more arrays, there's a possibility that their "best" iterators will be of different types. In principle, this is a big problem. Consider matrix-vector multiplication, A[i,j]*v[j]
, where j
needs to be in-sync for both A
and v
, yet you'd also like all of these accesses to be maximally-efficient. In practice, right now this isn't a burning problem: even if our arrays don't all have efficient linear indexing, to my knowledge all of our (dense) array types have efficient cartesian indexing. Since indexing by N
integers (where N
is equal to the dimensionality of the array) is always performant, this serves as a reliable default for generic code. (It's worth noting that this isn't true for sparse arrays, and the lack of a corresponding generic solution is probably the main reason we lack a generic API for writing sparse algorithms.)
Unfortunately, I suspect that if we want to add support for certain new operations or types (specific examples below), it will force us to set the latter problem on fire.
Some possible new AbstractArray
types pose novel challenges.
These are the front-and-center motivation for this post. These are motivated by a desire to ensure that reshape(A, dims)
always returns a "view" of A
rather than allocating a copy of A
. (Much of the urgency of this julep goes away if we decide to abandon this goal, in which case for consistency we should always return a copy of A
.) It's worth noting that besides an explicit reshape
, we have some mechanisms for reshaping that currently cause a copy to be created, notably A[:]
or A[:, :]
applied to a 3D array.
Similar to SubArrays
, the main challenge for ReshapedArrays
is getting good performance. If A
is a 3D array, and you reshape it to a 2D array B
, then B[i,j]
must be expanded to A[k,l,m]
. The problem is that computing the correct k,l,m
might result in a call to div
. So ReshapedArrays violate a crutch of our current ecosystem, in that indexing with N
integers might not be the fastest way to access elements of B
. From a performance perspective, this problem is substantial (see #15449, about five- to ten-fold).
In simple cases, there's an easy way to circumvent this performance problem: define a new iterator type that (internally) iterates over the parent A
's indexes directly. In other words, create an iterator so that B[I]
immediately expands to A[I']
, and so that the latter has "ideal" performance.
Unfortunately, this strategy runs into a lot of trouble when you need to keep two arrays in sync: if you want to adopt this strategy, you simply can't write B[i,j]*v[j]
for matrix-vector multiplication anymore. A potential way around this problem is to define a new class of iterators that operate on specific dimensions of an array (#15459), writing B[ii,jj]*v[j]
. jj
(whatever that is) and j
need to be in-sync, but they don't necessarily need to both be integers. Using this kind of strategy, matrix-vector multiplication
for j = 1:size(B, 2)
vj = v[j]
for i = 1:size(B, 1)
dest[i] += B[i,j] * vj
end
end
might be written in a more performant manner like this:
for (jj, vj) in zip(eachindex(B, Dimension{2}), v)
for (i, ii) in zip(eachindex(dest), eachindex(B, (:, jj)))
dest[i] += B[ii,jj]*vj
end
end
It's not too hard to figure out what eachindex(B, Dimension{2})
and eachindex(B, (:, jj))
should do: ii
, for example, could be a CartesianInnerIndex
(a type that does not yet exist) that for a particular column of B
iterates from A[3,7,4]
to A[5,8,4]
, where the d
th index component wraps around at size(A, d)
. The big performance advantage of this strategy is that you only have to compute a div
to set the bounds of the iterator on each column; the inner loop doesn't require a div
on each element access. No doubt, given suitable definition of jj
one could be even more clever and avoid calculating div
altogether. To the author, this strategy seems promising as a way to resolve the majority of the performance concerns about ReshapedArrays — only if you needed "random access" would you require slow (integer-based) operations.
However, a big problem is that compared to the "naive" implementation, this is rather ugly.
Julia's Array
type stores its entries in column-major order, meaning that A[i,j]
and A[i+1,j]
are in adjacent memory locations. For certain applications — or for interfacing with certain external code bases — it might be convenient to support row-major arrays, where instead A[i,j]
and A[i,j+1]
are in adjacent memory locations. More fundamentally, this is partially related to one of the most commented-on issues in all of julia's development history, known as "taking transposes seriously" aka #4774. There have been at least two attempts at implementation, #6837 and the mb/transpose
branch, and for the latter a summary of benefits and challenges was posted.
One of the biggest challenges mentioned was the huge explosion of methods that one would need to support. Can generic code come to the rescue here? There are two related concerns. The first is linear indexing: oftentimes this is conflated with "storage order," i.e., given two linear indexes i
and j
for the same array, the offset in memory is proportional to i-j
. For row-major arrays, this notion is not viable, because otherwise a loop
function copy!(dest, src)
for i = 1:length(src)
dest[i] = src[i] # trouble if `i` means "memory offset"
end
dest
end
would end up taking a transpose if src
and dest
don't use the same storage order. Consequently, a linear index has to be defined in terms of the corresponding cartesian (full-dimensionality) index. This isn't much of a real problem, because it's one we know how to solve: use ind2sub
(which is slow) when you have to, but for efficiency make row major arrays belong to the category (LinearSlow
) of arrays that defaults to iteration with cartesian indexes. Doing so will ensure that if one uses generic constructs like eachindex(src)
rather than 1:length(src)
, then the loop above can be fast.
The far more challenging problem concerns cache-efficiency: it's much slower to access elements of an array in anything other than storage-order. Some reasonably fast ways to write matrix-vector multiplication are
for j = 1:size(B, 2)
vj = v[j]
for i = 1:size(B, 1)
dest[i] += B[i,j] * vj
end
end
for a column-major matrix B
, and
for i = 1:size(B, 1)
for j = 1:size(B, 2)
dest[i] += B[i,j] * v[j]
end
end
for a row-major matrix. (One can do even better than this by using a scalar temporary accumulator, but let's not worry about that here.) The key point to note is that the order of the loops has been switched.
One could generalize this by defining a RowMajorRange
iterator that's a lot like our CartesianRange
iterator, but traverses the array in row-major order. eachindex
claims to return an "efficient iterator," and without a doubt the RowMajorRange
is a (much) more efficient iterator than a CartesianRange
iterator for row-major arrays. So let's imagine that eachindex
does what it says, and returns a RowMajorRange
iterator. Using this strategy, the two algorithms above can be combined into a single generic implementation:
for I in eachindex(B)
dest[I[1]] += B[I]*v[I[2]]
end
Yay! Score one for efficient generic implementations.
But our triumph is short-lived. Let's return to the example of copy!
above, and realize that dest
and src
might be two different array types, and therefore might be most-efficiently indexed with different iterator types. We're tempted to write this as
function copy!(dest, src)
for (idest, isrc) in zip(eachindex(dest), eachindex(src))
dest[idest] = src[isrc]
end
dest
end
Up until we introduced our RowMajorRange
return-type for eachindex
, this implementation would have been fine. But we just broke it, because now this will incorrectly take a transpose in certain situations.
In other words, without careful design the goals of "maximally-efficient iteration" and "keeping accesses in-sync" are in conflict.
Julia's arrays are indexed starting at 1, whereas some other languages start numbering at 0. If you take comments on various blog posts at face value, there are vast armies of programmers out there eagerly poised to adopt julia, but who won't even try it because of this difference in indexing. Since recruiting those armies will lead to world domination, this is clearly a problem of the utmost urgency.
More seriously, there are algorithms which simplify if you can index outside of the range from 1:size(A,d)
. In my own lab's internal code, we've long been using a CenterIndexedArray
type, in which such arrays (all of which have odd sizes) are indexed over the range -n:n
and for which 0 refers to the "center" element. One package which generalizes this notion is OffsetArrays
. Unfortunately, in practice both of these array types produce segfaults (due to built-in assumptions about when @inbounds
is appropriate) for many of julia's core functions; over time my lab has had to write implementations specialized for CenterIndexedArrays
for quite a few julia functions.
OffsetArrays
illustrates another conceptual challenge, which can easily be demonstrated by copy!
. When dest
is a 1-dimensional OffsetArray
and src
is a standard Vector
, what should copy!
do? In particular, where does src[1]
go? Does it go in the first
element of dest
, or does it get stored in dest[1]
(which may not be the first element).
Such examples force us to think a little more deeply about what an array really is. There seem to be two potential conceptions. One is that arrays are lists, and multidimensional arrays are lists-of-lists-of-lists-of... In such a world view, the right thing to do is to put src[1]
into the first slot of dest
, because 1 is just a synonym for first
. However, this world view doesn't really endow any kind of "meaning" to the index-tuple of an array, and in that sense doesn't even include the distinction conveyed by an OffsetArray
. In other words, in this world an OffsetArray
is simply nonsensical, and shouldn't exist.
If instead one thinks OffsetArray
s should exist, this essentially forces one to adopt a different world view: arrays are effectively associative containers, where each index-tuple is the "key" by which one retrieves a value. With this mode of thinking, src[1]
should be stored in dest[1]
.
These examples suggest a formalization of AbstractArray
:
AbstractArray
s are specialized associative containers, in that the allowable "keys" may be restricted by more than just their julia type. Specifically, the allowable keys must be representable as a cartesian product of one-dimensional lists of values. The allowed keys may depend not just on the array type but also the specific array (e.g., its size). Attempted access by keys that cannot be converted to one of the allowed keys, for that specific array, result in BoundsError
s.
For any given array, one must be able to generate a finite-dimensional parametrization of the full domain of valid keys from the array itself. This might only require knowledge of the array size, or the keys might depend on some internal storage (think DataFrames
and OffsetArrays
). In some cases, just the array type might be sufficient (e.g., FixedSizeArrays
). By this definition, note that a Dict{ASCII5,Int}
, where ASCII5
is a type that means an ASCII string with 5 characters, would qualify as a 5-dimensional (sparse) array, but that a Dict{ASCIIString,Int}
would not (because there is no length limit to an ASCIIString
, and hence no finite dimensionality).
An array may be indexed by more than one key type (i.e., keys may have multiple parametrizations). Different key parametrizations are equivalent when they refer to the same element of a given array. Linear indexes and cartesian indexes are simple examples of interconvertable representations, but specialized iterators can produce other key types as well.
Arrays may support multiple iterators that produce non-equivalent key sequences. In other words, a row-major matrix may support both CartesianRange
and RowMajorRange
iterators that access elements in different orders.
Resolving these conflicting demands is not easy. One approach might be to decree that some of these array types simply can't be supported with generic code. It is possible that this is the right strategy. Alternatively, one can attept to devise an array API that handles all of these types (and hopefully more).
In GitHub issue #15648, we are discussing APIs that may resolve these challenges. Readers are encouraged to contribute to this discussion.