Linear maps are additive: \[
J(e_i+\ldots+e_j) = J(e_i) +\ldots+ J(e_j)
\]
If the summands on the right-hand side are orthogonal and their structure is known, the sum can be decomposed into its individual terms.
We can compute several columns/rows of the Jacobian in a single JVP/VJP!
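The idea above can be sketched in a few lines (in Python for portability; the function, seed, and finite-difference JVP stand-in are all illustrative, not any backend's API). If the Jacobian columns touch disjoint rows, a single JVP with a summed seed vector recovers all of them at once:

```python
# Toy sketch: for a function with a diagonal Jacobian, column i only
# touches row i, so the columns are structurally orthogonal and one
# JVP with seed e_1 + e_2 + e_3 recovers all three columns.

def f(x):
    # Elementwise square: the Jacobian is diag(2*x_1, ..., 2*x_n).
    return [xi * xi for xi in x]

def jvp(f, x, v, h=1e-6):
    # Forward-difference approximation of J(x) @ v (stand-in for AD).
    fx = f(x)
    fxv = f([xi + h * vi for xi, vi in zip(x, v)])
    return [(a - b) / h for a, b in zip(fxv, fx)]

x = [1.0, 2.0, 3.0]
seed = [1.0, 1.0, 1.0]        # e_1 + e_2 + e_3
compressed = jvp(f, x, seed)  # one JVP instead of three

# Because the sparsity pattern is diagonal, entry i of the result is
# the (i, i) Jacobian entry, here approximately 2 * x_i.
print(compressed)
```

With a dense Jacobian the three columns would overlap in every row and the sum could not be decomposed; this is exactly what the sparsity pattern rules out.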
Sparsity Pattern Detection
How it works
For \(f: \mathbb{R}^n \to \mathbb{R}\), the gradient at \(x \in \mathbb{R}^n\) is defined as \[
\left(\nabla f(x)\right)_{i} = \frac{\partial f}{\partial x_i}
\]
Sparsity patterns correspond to a mask of non-zero values.
They can be represented efficiently by the set of indices of non-zero values: \[
\left\{i \;\big|\; \frac{\partial f}{\partial x_i} \neq 0\right\}
\]
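One way such index sets can be propagated (a minimal sketch under simple assumptions, not any particular library's implementation) is via operator overloading: each intermediate value carries the set of inputs it depends on, and every operation unions the sets of its operands.

```python
# Minimal index-set tracer (illustrative sketch): each value carries
# the set {i | ∂/∂x_i may be non-zero}.

class Tracer:
    def __init__(self, indices):
        self.indices = frozenset(indices)

    def __add__(self, other):
        # Dependencies of a sum: union of both operands' index sets.
        return Tracer(self.indices | self._deps(other))

    __radd__ = __add__

    def __mul__(self, other):
        # Same propagation rule for products.
        return Tracer(self.indices | self._deps(other))

    __rmul__ = __mul__

    @staticmethod
    def _deps(other):
        # Constants depend on no inputs.
        return other.indices if isinstance(other, Tracer) else frozenset()

# Seed one tracer per input, then evaluate the function on tracers.
x = [Tracer({i}) for i in range(4)]
y = x[0] * x[1] + 3.0 * x[3]  # f(x) = x_1 * x_2 + 3 * x_4

print(sorted(y.indices))  # inputs 1, 2, and 4 (0-based: 0, 1, 3)
```

Note that this computes a conservative over-approximation: it records which partials *may* be non-zero, without evaluating their values.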
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 23.313 μs … 583.592 μs ┊ GC (min … max): 0.00% … 87.72%
Time (median): 26.459 μs ┊ GC (median): 0.00%
Time (mean ± σ): 28.760 μs ± 27.892 μs ┊ GC (mean ± σ): 6.24% ± 6.04%
[histogram: frequency by time, 23.3 μs … 44.9 μs]
Memory estimate: 305.25 KiB, allocs estimate: 17.
Conclusion
Applications
(This is a meme, it’s a bit more nuanced.)
Outlook
“Chunked” mode with statically sized index sets
GPU support
The Julia AD ecosystem
Three types of AD users
Package users want to differentiate through functions
Package developers want to write differentiable functions
Backend developers want to create new AD systems
Python vs. Julia: user experience
Python vs. Julia: developers
Why so many backends?
Conflicting paradigms:
numeric vs. symbolic vs. algorithmic
operator overloading vs. source-to-source (which source?)
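The operator-overloading paradigm named above can be sketched with dual numbers (an illustrative forward-mode toy in Python, not any specific backend): overloading `+` and `*` makes derivative propagation happen as a side effect of evaluating the user's function.

```python
# Illustrative operator-overloading forward mode via dual numbers.

class Dual:
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        o = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __mul__(self, other):
        o = other if isinstance(other, Dual) else Dual(other)
        # Product rule, applied by the overloaded operator.
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

def derivative(f, x):
    # Seed the input with derivative 1 and read off the output tangent.
    return f(Dual(x, 1.0)).der

# d/dx (x² + 3x) at x = 2 is 2·2 + 3 = 7.
print(derivative(lambda x: x * x + 3 * x, 2.0))
```

Source-to-source systems reach the same result by transforming the program text instead of its runtime values, which is one reason the two paradigms lead to different backends.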