Performance
Data structures for sparsity pattern representations
The most efficient internal data structure for sparsity pattern representations depends on the number of inputs and the computational graph / sparsity of a given function.
Let's use a convolutional layer from Flux.jl as an example. By default, SCT uses BitSet
for Jacobian sparsity detection, which is well suited for small to medium sized functions.
using SparseConnectivityTracer, Flux, BenchmarkTools
x = rand(28, 28, 3, 1)
layer = Conv((3, 3), 3 => 2)
detector_bitset = TracerSparsityDetector()
jacobian_sparsity(layer, x, detector_bitset)
1352×2352 SparseArrays.SparseMatrixCSC{Bool, Int64} with 36504 stored entries:
⎡⠙⢷⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⢿⣦⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
⎢⠀⠀⠙⢿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠿⣦⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠙⠻⣦⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠙⢷⣦⡀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⢷⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣦⡀⠀⎥
⎢⢤⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠿⢦⣤⡀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠿⢦⣤⡀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠻⠦⎥
⎢⠀⠙⢿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠻⣦⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠙⢿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠻⣶⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⢷⣄⡀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠛⢷⣤⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣦⡀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠛⢷⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣦⡀⎥
⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠉⎦
@benchmark jacobian_sparsity(layer, x, detector_bitset);
BenchmarkTools.Trial: 750 samples with 1 evaluation per sample.
Range (min … max): 4.643 ms … 20.766 ms ┊ GC (min … max): 0.00% … 57.78%
Time (median): 5.703 ms ┊ GC (median): 0.00%
Time (mean ± σ): 6.648 ms ± 1.815 ms ┊ GC (mean ± σ): 17.05% ± 18.20%
▂▆█▄▂▁▂
▃▂▂▃████████▇▆▃▃▁▂▁▂▂▁▁▂▁▁▁▁▁▁▂▁▂▃▂▄▅▅▇▇▆▆▆▅▅▄▄▃▃▂▂▂▂▁▁▁▁▃ ▃
4.64 ms Histogram: frequency by time 10.4 ms <
Memory estimate: 20.35 MiB, allocs estimate: 255954.
Instead of BitSet
, we can use any concrete subtype of AbstractSet{<:Integer}
, for example Set{UInt}
. To set the sparsity pattern type for Jacobian sparsity detection, we use the keyword argument gradient_pattern_type
:
detector_set = TracerSparsityDetector(; gradient_pattern_type=Set{UInt})
@benchmark jacobian_sparsity(layer, x, detector_set);
BenchmarkTools.Trial: 258 samples with 1 evaluation per sample.
Range (min … max): 17.048 ms … 45.700 ms ┊ GC (min … max): 0.00% … 47.52%
Time (median): 19.474 ms ┊ GC (median): 11.25%
Time (mean ± σ): 19.420 ms ± 3.307 ms ┊ GC (mean ± σ): 8.45% ± 7.88%
█▅▃ ▂▅▅▄▄▃▂
███▆▄▄▄▁▄▇███████▇▇▇▁▁▁▁▄▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▄▄▆▇▆▄▁▁▁▁▁▁▁▁▁▁▁▁▄ ▆
17 ms Histogram: log(frequency) by time 29.6 ms <
Memory estimate: 27.45 MiB, allocs estimate: 256928.
While this is slower for the given input size, the performance is highly dependant on the problem. For larger inputs (e.g. of size $224 \times 224 \times 3 \times 1$), detector_set
will outperform detector_bitset
. Note that memory requirement will vary as well.
For Hessians sparsity detection, the internal sparsity pattern representation uses either concrete subtypes of AbstractDict{I, AbstractSet{I}}
or AbstractSet{Tuple{I, I}}
, where I <: Integer
. By default, Dict{Int, BitSet)
is used. To set the sparsity pattern type, use the keyword argument hessian_pattern_type
:
detector = TracerSparsityDetector(; hessian_pattern_type=Dict{UInt, Set{UInt}})
TracerSparsityDetector{SparseConnectivityTracer.GradientTracer{Int64, BitSet},SparseConnectivityTracer.HessianTracer{UInt64, Set{UInt64}, Dict{UInt64, Set{UInt64}}, SparseConnectivityTracer.NotShared}}()
Data structures can also be set analogously for TracerLocalSparsityDetector
. If both Jacobian and Hessian sparsity patterns are needed, gradient_pattern_type
and hessian_pattern_type
can be set separately.