Performance

Data structures for sparsity pattern representations

The most efficient internal data structure for representing sparsity patterns depends on the number of inputs and on the computational graph and sparsity of the given function.
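The following minimal, hypothetical sketch in plain Julia makes the idea of a sparsity pattern representation concrete. ToyTracer, trace_inputs and f are invented for illustration and are not SCT's API or implementation. Each traced value carries the set of input indices it depends on, and every binary operation takes the union of its operands' sets. Because the toy tracer is parametric in the set type S, the same function can be traced with BitSet or Set{UInt}. Which one is faster then depends on how cheaply each type stores and unions the index sets that actually occur.

```julia
# Hypothetical toy tracer for illustration only; this is NOT SCT's GradientTracer.
# Each value carries the set of input indices it may depend on.
struct ToyTracer{S<:AbstractSet{<:Integer}}
    pattern::S
end

# A binary operation depends on every input that either operand depends on,
# so the result's pattern is the union of both operand patterns.
Base.:+(a::ToyTracer{S}, b::ToyTracer{S}) where {S} = ToyTracer{S}(union(a.pattern, b.pattern))
Base.:*(a::ToyTracer{S}, b::ToyTracer{S}) where {S} = ToyTracer{S}(union(a.pattern, b.pattern))

# Input i starts out depending only on itself: pattern {i}.
trace_inputs(::Type{S}, n::Integer) where {S} = [ToyTracer{S}(S([i])) for i in 1:n]

# Example function with 3 inputs and 2 outputs.
f(x) = [x[1] * x[2], x[2] + x[3]]

# Trace the same function with two different set types.
for S in (BitSet, Set{UInt})
    ys = f(trace_inputs(S, 3))
    rows = [sort!(Int.(collect(y.pattern))) for y in ys]  # row k: inputs that output k depends on
    println(S, " => ", rows)
end
```

Both set types yield the same rows, [[1, 2], [2, 3]]; only the cost of building them differs.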

Let's use a convolutional layer from Flux.jl as an example. By default, SCT uses BitSet for Jacobian sparsity detection, which is well suited for small- to medium-sized functions.

using SparseConnectivityTracer, Flux, BenchmarkTools

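x = rand(28, 28, 3, 1)         # input in WHCN layout: 28×28 pixels, 3 channels, batch size 1
layer = Conv((3, 3), 3 => 2)   # 3×3 kernel, 3 input channels => 2 output channels

detector_bitset = TracerSparsityDetector()  # default: BitSet gradient patterns
jacobian_sparsity(layer, x, detector_bitset)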

1352×2352 SparseArrays.SparseMatrixCSC{Bool, Int64} with 36504 stored entries:
⎡⠙⢷⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⢿⣦⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
⎢⠀⠀⠙⢿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠿⣦⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠙⠻⣦⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠙⢷⣦⡀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⢷⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣦⡀⠀⎥
⎢⢤⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠿⢦⣤⡀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠿⢦⣤⡀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠻⠦⎥
⎢⠀⠙⢿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠻⣦⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠙⢿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠻⣶⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⢷⣄⡀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠈⠻⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠛⢷⣤⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣦⡀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠛⢷⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⢿⣦⡀⎥
⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠉⎦

@benchmark jacobian_sparsity(layer, x, detector_bitset);

BenchmarkTools.Trial: 584 samples with 1 evaluation per sample.
 Range (min … max):  5.473 ms … 34.348 ms  ┊ GC (min … max):  0.00% … 68.86%
 Time  (median):     6.512 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):   8.527 ms ±  3.803 ms  ┊ GC (mean ± σ):  25.00% ± 24.44%

  ▆█▇▇▅▄▁                ▄▄▄▂▃▃▁                              
  ███████▇▁▆▁▄▅▄▄▄▇▅█▅▄▁▆████████▇▆▄▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▄ ▇
  5.47 ms      Histogram: log(frequency) by time     22.2 ms <

 Memory estimate: 20.26 MiB, allocs estimate: 255963.

Instead of BitSet, we can use any concrete subtype of AbstractSet{<:Integer}, for example Set{UInt}. To set the sparsity pattern type for Jacobian sparsity detection, we use the keyword argument gradient_pattern_type:

detector_set = TracerSparsityDetector(; gradient_pattern_type=Set{UInt})
@benchmark jacobian_sparsity(layer, x, detector_set);

BenchmarkTools.Trial: 220 samples with 1 evaluation per sample.
 Range (min … max):  18.510 ms … 50.831 ms  ┊ GC (min … max):  0.00% … 52.75%
 Time  (median):     19.907 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):   22.818 ms ±  4.904 ms  ┊ GC (mean ± σ):  13.27% ± 14.02%

  ██▅▂          ▄▇▅▄▂                                          
  ████▅▁▁▁▁▁▁▁▅▆█████▁▆▁▅▁▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▆
  18.5 ms      Histogram: log(frequency) by time      46.1 ms <

 Memory estimate: 27.52 MiB, allocs estimate: 256937.

While Set{UInt} is slower than BitSet for this input size, performance depends strongly on the problem. For larger inputs (e.g. of size $224 \times 224 \times 3 \times 1$), detector_set will outperform detector_bitset. Memory requirements vary as well.
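One rough way to see why input size matters, using only Base Julia: a single output pixel of the 3×3, 3-channel convolution above depends on 27 inputs, but their linear indices spread over roughly two full input channels. Julia's BitSet stores bits covering the span between its smallest and largest element, whereas a Set{UInt} stores one entry per element. The helper conv_pattern is hypothetical, and the sketch only measures the storage of one pattern; it does not reproduce SCT's runtime or the benchmarks above.

```julia
# Hypothetical helper: linear indices (into a W×H×C input with batch size 1)
# that output pixel (i, j) of a 3×3 convolution reads, across all C channels.
function conv_pattern(W, H, C; i=1, j=1)
    L = LinearIndices((W, H, C))
    return sort!(vec([L[i + a, j + b, c] for a in 0:2, b in 0:2, c in 1:C]))
end

for (W, H) in ((28, 28), (224, 224))
    idx = conv_pattern(W, H, 3)
    # Base.summarysize reports the total number of bytes reachable from an object.
    bitset_bytes = Base.summarysize(BitSet(idx))
    set_bytes = Base.summarysize(Set{UInt}(idx))
    println("$(W)×$(H)×3: $(length(idx)) indices in $(first(idx)):$(last(idx)), ",
            "BitSet $(bitset_bytes) bytes, Set{UInt} $(set_bytes) bytes")
end
```

For 224×224, the BitSet for this single pattern spans roughly 100,000 bits, while the Set{UInt} still holds only 27 keys; for 28×28 the span is about 1,600 indices and the gap is much smaller. Exact byte counts depend on the Julia version.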

For Hessian sparsity detection, the internal sparsity pattern representation uses concrete subtypes of either AbstractDict{I, <:AbstractSet{I}} or AbstractSet{Tuple{I, I}}, where I <: Integer. By default, Dict{Int, BitSet} is used. To set the sparsity pattern type, use the keyword argument hessian_pattern_type:

detector = TracerSparsityDetector(; hessian_pattern_type=Dict{UInt, Set{UInt}})

TracerSparsityDetector{SparseConnectivityTracer.GradientTracer{Int64, BitSet},SparseConnectivityTracer.HessianTracer{UInt64, Set{UInt64}, Dict{UInt64, Set{UInt64}}, SparseConnectivityTracer.NotShared}}()
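The two families of Hessian pattern types carry the same information. As a hypothetical, Base-only illustration, the Hessian pattern of f(x) = x[1] * x[2] + x[3]^2 can be written either as a set of index pairs or as a dictionary from each index to the set of indices it interacts with. The helpers to_dict and to_pairs are invented here, and storing both (i, j) and (j, i) is an assumption that may differ from SCT's internal convention.

```julia
# Hessian pattern of f(x) = x[1] * x[2] + x[3]^2, written out by hand.
# Assumption: symmetric entries (i, j) and (j, i) are both stored.
hess_pairs = Set{Tuple{Int,Int}}([(1, 2), (2, 1), (3, 3)])

# Hypothetical helper: AbstractSet{Tuple{I,I}} -> Dict{I, BitSet}.
function to_dict(p::AbstractSet{Tuple{I,I}}) where {I<:Integer}
    d = Dict{I,BitSet}()
    for (i, j) in p
        push!(get!(BitSet, d, i), j)  # creates an empty BitSet for i on first use
    end
    return d
end

# Hypothetical helper: AbstractDict{I, <:AbstractSet} -> Set{Tuple{I,I}}.
function to_pairs(d::AbstractDict{I,<:AbstractSet}) where {I<:Integer}
    return Set{Tuple{I,I}}((i, I(j)) for (i, js) in d for j in js)
end

hess_dict = to_dict(hess_pairs)
println(hess_dict)
println(to_pairs(hess_dict) == hess_pairs)  # true: the round trip loses nothing
```

The dictionary form makes "all indices interacting with i" a single lookup, while the pair-set form keeps every nonzero as one flat entry.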

Data structures can also be set analogously for TracerLocalSparsityDetector. If both Jacobian and Hessian sparsity patterns are needed, gradient_pattern_type and hessian_pattern_type can both be passed to the same detector; each keyword is set independently.
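A sketch of setting both keywords on one TracerLocalSparsityDetector, assuming SparseConnectivityTracer is installed. The keyword names come from the text above; the functions f and g and the evaluation point x are made up for illustration. The Jacobian pattern should have nonzeros at (1, 1), (1, 2) and (2, 3), and the Hessian pattern at (1, 2), (2, 1) and (3, 3).

```julia
using SparseConnectivityTracer

# Both keywords on the same detector: Set{UInt} for first-order (Jacobian)
# patterns and Dict{UInt, Set{UInt}} for second-order (Hessian) patterns.
detector = TracerLocalSparsityDetector(;
    gradient_pattern_type=Set{UInt},
    hessian_pattern_type=Dict{UInt,Set{UInt}},
)

x = [1.0, 2.0, 3.0]  # local detectors evaluate the pattern at this point

f(x) = [x[1] * x[2], x[3]^2]  # vector-valued: Jacobian sparsity
g(x) = x[1] * x[2] + x[3]^2   # scalar-valued: Hessian sparsity

display(jacobian_sparsity(f, x, detector))
display(hessian_sparsity(g, x, detector))
```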