Explaining complex machine learning models is a fundamental challenge when developing safe and trustworthy deep learning applications. To date, a broad selection of explainable AI (XAI) algorithms exists. One popular choice is SmoothGrad, which was conceived to alleviate the well-known shattered gradient problem by smoothing gradients through convolution. SmoothGrad approximates this high-dimensional convolution integral by sampling, typically with limited precision. A higher number of samples yields a more precise approximation of the convolution but also demands more computation, so in practice SmoothGrad uses only a few samples. In this work we propose SmoothDiff, a well-founded novel method that resolves this tradeoff, yielding a speedup of over two orders of magnitude. Specifically, SmoothDiff leverages automatic differentiation to decompose the expected values of Jacobians across a network architecture, directly targeting only the non-linearities responsible for shattered gradients, which makes it easy to implement. We demonstrate SmoothDiff's excellent speed and performance in a number of experiments and benchmarks. Thus, SmoothDiff greatly enhances the usability (quality and speed) of SmoothGrad, a popular workhorse of XAI.
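As a minimal sketch of the sampling baseline that SmoothDiff replaces (not of SmoothDiff itself), the snippet below averages gradients over noisy copies of the input; it assumes Zygote as the AD backend, and the function name smoothgrad and the parameters n and σ are illustrative:

```julia
using Zygote  # any Julia AD backend with a gradient function would do

# Baseline SmoothGrad: estimate the Gaussian-smoothed gradient by averaging
# gradients at n noisy copies of the input (a Monte Carlo approximation).
function smoothgrad(f, x; n=25, σ=0.1)
    g = zero(x)
    for _ in 1:n
        g .+= only(Zygote.gradient(f, x .+ σ .* randn(size(x)...)))
    end
    return g ./ n
end

f(x) = sum(abs2, x)           # stand-in for a network's scalar output
smoothgrad(f, randn(4); n=50)
```

Increasing n reduces the variance of the estimate but multiplies the number of gradient evaluations, which is precisely the tradeoff SmoothDiff removes.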
For scientific machine learning tasks with a lot of custom code, picking the right Automatic Differentiation (AD) system matters. Our Julia package DifferentiationInterface.jl provides a common frontend to a dozen AD backends, unlocking easy comparison and modular development. In particular, its built-in preparation mechanism leverages the strengths of each backend by amortizing one-time computations. This is key to enabling sophisticated features like sparsity handling without putting additional burdens on the user.
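A minimal sketch of the preparation mechanism, assuming the current DifferentiationInterface.jl API with ForwardDiff as the example backend:

```julia
using DifferentiationInterface  # re-exports the ADTypes backend constructors
import ForwardDiff              # backends are loaded separately

f(x) = sum(abs2, x)
x = rand(100)
backend = AutoForwardDiff()

# One-time preparation amortizes backend-specific setup (caches, tapes, configs)
prep = prepare_gradient(f, backend, x)

# Repeated calls reuse the preparation and skip the setup cost
gradient(f, prep, backend, x)
```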
TMLR · 2025
Sparser, Better, Faster, Stronger: Sparsity Detection for Efficient Automatic Differentiation
From implicit differentiation to probabilistic modeling, Jacobian and Hessian matrices have many potential use cases in Machine Learning (ML), but they are viewed as computationally prohibitive. Fortunately, these matrices often exhibit sparsity, which can be leveraged to speed up the process of Automatic Differentiation (AD). This paper presents advances in sparsity detection, previously the performance bottleneck of Automatic Sparse Differentiation (ASD). Our implementation of sparsity detection is based on operator overloading, detects both local and global sparsity patterns, and supports flexible index set representations. It is fully automatic and requires no modification of user code, making it compatible with existing ML codebases. Most importantly, it is highly performant, unlocking Jacobians and Hessians at scales where they were considered too expensive to compute. On real-world problems from scientific ML, graph neural networks and optimization, we show significant speed-ups of up to three orders of magnitude. Notably, using our sparsity detection system, ASD outperforms standard AD for one-off computations, without amortization of either sparsity detection or matrix coloring.
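A minimal sketch of this detection system using SparseConnectivityTracer.jl, the open-source implementation presented in the JuliaCon talk below; the toy functions are illustrative and the snippet assumes the package's current API:

```julia
using SparseConnectivityTracer

detector = TracerSparsityDetector()

# Jacobian sparsity of a vector-valued function, detected by operator
# overloading, with no modification of the user code
f(x) = [x[1]^2 + x[2], x[2] * x[3], x[4]]
jacobian_sparsity(f, rand(4), detector)

# Hessian sparsity of a scalar-valued function
g(x) = sum(x) + x[1] * x[2]
hessian_sparsity(g, rand(4), detector)
```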
ICLR Blog Post · 2025
An Illustrated Guide to Automatic Sparse Differentiation
In numerous applications of machine learning, Hessians and Jacobians exhibit sparsity, a property that can be leveraged to vastly accelerate their computation. While automatic differentiation is ubiquitous in machine learning, automatic sparse differentiation (ASD) remains largely unknown. This post introduces ASD, explaining its key components and their roles in the computation of both sparse Jacobians and Hessians. We conclude with a practical demonstration showcasing the performance benefits of ASD.
Prix science ouverte du logiciel libre de recherche:
Open source software award for DifferentiationInterface.jl (G. Dalle, A. Hill),
awarded by the French Ministry of Higher Education and Research
Jacobians and Hessians play vital roles in scientific computing and machine learning, from optimization to probabilistic modeling. While these matrices are often considered too computationally expensive to calculate, their inherent sparsity can be leveraged to dramatically accelerate Automatic Differentiation (AD). By building on top of DifferentiationInterface.jl, we are able to bring Automatic Sparse Differentiation to all major Julia AD backends, including ForwardDiff and Enzyme.
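As a hypothetical end-to-end example, any supported dense backend can be wrapped into a sparsity-aware one; this sketch assumes the current ADTypes/DifferentiationInterface API, with ForwardDiff underneath:

```julia
using DifferentiationInterface
using SparseConnectivityTracer, SparseMatrixColorings
import ForwardDiff

# Wrap a dense backend with sparsity detection and matrix coloring
backend = AutoSparse(
    AutoForwardDiff();
    sparsity_detector=TracerSparsityDetector(),
    coloring_algorithm=GreedyColoringAlgorithm(),
)

f(x) = diff(x) .^ 2      # banded (hence very sparse) Jacobian
x = rand(1000)
jacobian(f, backend, x)  # computed and returned as a sparse matrix
```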
Despite AD's widespread adoption in the Julia ecosystem, Automatic Sparse Differentiation (ASD) remains an underutilized technique. We present a novel pipeline consisting of three open-source packages that brings ASD capabilities to all major Julia AD backends. The first half of the talk focuses on the unique challenges AD users face in Julia, introducing DifferentiationInterface.jl, a unified interface to over a dozen AD backends. The second half focuses on sparsity pattern detection, a key component of ASD. We present SparseConnectivityTracer.jl (SCT), a performant implementation of Jacobian and Hessian sparsity detection based on operator overloading. SCT computes both local and global sparsity patterns, naturally avoids dead ends in compute graphs, and requires no code modifications. Notably, our ASD pipeline often outperforms standard AD even for one-off computations, a use case previously thought impractical in Julia due to slower sparsity detection methods.
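The local/global distinction can be sketched on a toy function (assuming SCT's current API):

```julia
using SparseConnectivityTracer

f(x) = [x[1] * x[2], max(x[2], x[3])]
x = [1.0, 2.0, 3.0]

# Global pattern: every input that could influence each output
jacobian_sparsity(f, x, TracerSparsityDetector())       # [1 1 0; 0 1 1]

# Local pattern at this x: max(x[2], x[3]) reduces to x[3] here
jacobian_sparsity(f, x, TracerLocalSparsityDetector())  # [1 1 0; 0 0 1]
```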
Automatic Differentiation (AD) is the art of letting your computer work out derivatives for you. The Julia ecosystem provides many different AD tools, which can be overwhelming. This talk will give everyone the necessary information to answer two simple questions: As a developer, how do I make my functions differentiable? As a user, how do I differentiate through other people's functions?
ExplainableAI.jl, a comprehensive set of XAI tools for Julia, has undergone significant development since our initial presentation at JuliaCon 2022, and has since been expanded into the Julia-XAI ecosystem. This lightning talk will highlight the latest developments, including new methods, the new XAIBase.jl core interface, and new utilities for visualizing explanations of vision and language models.
JuliaCon, Remote · 2022
ExplainableAI.jl: Interpreting neural networks in Julia
In pursuit of interpreting black-box models such as deep image classifiers, a number of techniques have been developed that attribute and visualize the importance of input features with respect to the output of a model. ExplainableAI.jl brings several of these methods to Julia, building on top of primitives from the Flux ecosystem. In this talk, we will give an overview of current features and show how the package can easily be extended, allowing users to implement their own methods and rules.
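A minimal usage sketch, assuming the current Julia-XAI API (which post-dates this talk) and a toy Flux model:

```julia
using ExplainableAI
using Flux

# Toy classifier standing in for a deep image model
model = Chain(Dense(784, 32, relu), Dense(32, 10))
input = rand(Float32, 784, 1)    # one sample in Flux's feature-by-batch layout

analyzer = InputTimesGradient(model)
expl = analyze(input, analyzer)  # explanation for the top predicted class
expl.val                         # attribution scores, same shape as the input
```

Swapping the analyzer (e.g. for Gradient or SmoothGrad) leaves the rest of the call unchanged, which is what makes the interface easy to extend with custom methods.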
Dithering algorithms are a family of color quantization techniques that create the illusion of continuous color in images with small color palettes by adding high-frequency noise or patterns. Traditionally used in printing, they now mostly serve stylistic purposes. DitherPunk.jl implements a wide variety of fast and extensible dithering algorithms. Using it as an example, I will demonstrate how packages for creative coding can be built on top of the JuliaImages ecosystem.
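A minimal usage sketch, assuming DitherPunk's current API; TestImages.jl is assumed here only to supply a sample image:

```julia
using DitherPunk
using TestImages

img = testimage("cameraman")   # grayscale test image

dither(img, Bayer())           # ordered dithering with a Bayer threshold matrix
dither(img, FloydSteinberg())  # classic error diffusion
```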