Exploring Neural Granger Causality with xLSTMs: Unveiling Temporal Dependencies in Complex Data

NeurIPS 2025

1Carnegie Mellon University, 2TU Darmstadt (AI & ML Group), 3hessian.AI, 4DFKI, 5Centre for Cognitive Science, 6TU Eindhoven
*Equal Contribution

Abstract

Causality in time series can be challenging to determine, especially in the presence of non-linear dependencies. Granger causality helps analyze potential relationships between variables, thereby offering a method to determine whether one time series can predict—Granger cause—future values of another. Although successful, Granger causal methods still struggle with capturing long-range relations between variables.

To this end, we leverage the recently successful Extended Long Short-Term Memory (xLSTM) architecture and propose Granger causal xLSTMs (GC-xLSTM). It first enforces sparsity between the time series components by using a novel dynamic loss penalty on the initial projection. Specifically, we adaptively improve the model while identifying candidate weights for sparsification. Our joint optimization procedure then ensures that the Granger causal relations are recovered robustly. Our experimental evaluation on six diverse datasets demonstrates the overall efficacy of GC-xLSTM.

Motivation & Challenges

The core problem is finding which time series U are good predictors of future values of time series V. When using neural networks, we typically cast this as a forecasting problem; the Granger causality criterion behind this view is spelled out right after the list below. However, this approach introduces specific challenges:

  • Interpretability: There is a strong desire for sparsity in selected variates to make the models interpretable.
  • Weight Compression: We need to retain some high-value weights while compressing others to zero. Standard techniques like group lasso often fail to achieve this effectively!
  • Sensitivity: It remains difficult to accurately measure model sensitivity to specific inputs in a way that reflects true Granger Causality.
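For reference, the criterion underlying this forecasting view can be stated in the standard nonlinear form (the notation here is ours, following the usual neural Granger causality setup): with a separate forecaster f_i per target series, series j is Granger non-causal for series i if f_i does not depend on the past of series j,

\[
f_i\big(x_{1,<t},\dots,x_{j,<t},\dots,x_{p,<t}\big) \;=\; f_i\big(x_{1,<t},\dots,x'_{j,<t},\dots,x_{p,<t}\big)
\quad \text{for all } x_{j,<t},\, x'_{j,<t},
\]

and a link j → i is reported exactly when the learned predictor for series i remains sensitive to the history of series j.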

The GC-xLSTM Framework

Our method consists of three key steps to determine Granger causal links (a minimal code sketch follows the list):

  1. Sparse Feature Selector: For each time series component, all variates are embedded with a selector W regularized through a novel sparsity loss.
  2. xLSTM Modeling: xLSTM models learn to autoregressively predict future steps from that embedding, leveraging exponential gating for long-range dependencies.
  3. Joint Optimization: A specialized procedure alternates between Gradient Descent (GD) and Proximal GD to compress weights and learn reduction coefficients, ensuring strict sparsity without auxiliary metrics.
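As a rough illustration of steps 1 and 2, the sketch below builds one sparse selector plus one recurrent forecaster per target series. All names are hypothetical, and a plain nn.LSTM stands in for the xLSTM block; the paper's actual architecture and sparsity penalty differ in detail.

```python
import torch
import torch.nn as nn


class SparseSelector(nn.Module):
    """Learned input projection W; the column norms of W indicate which
    input variates the forecaster for one target series relies on."""

    def __init__(self, num_series: int, hidden_dim: int):
        super().__init__()
        self.W = nn.Parameter(0.1 * torch.randn(hidden_dim, num_series))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, num_series) -> (batch, time, hidden_dim)
        return x @ self.W.T


class ComponentPredictor(nn.Module):
    """One-step-ahead forecaster for a single target series, reading all
    input series through its own sparse selector."""

    def __init__(self, num_series: int, hidden_dim: int = 32):
        super().__init__()
        self.selector = SparseSelector(num_series, hidden_dim)
        self.rnn = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)  # xLSTM stand-in
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(self.selector(x))
        return self.head(h[:, -1])  # forecast of the next value


# Toy usage: 5 series, one predictor per target series.
x = torch.randn(8, 50, 5)                                  # (batch, time, series)
predictors = [ComponentPredictor(num_series=5) for _ in range(5)]
next_step = torch.cat([m(x) for m in predictors], dim=-1)  # (batch, 5)
```

A Granger causal link from series j to series i is then read off from whether column j of predictor i's selector remains nonzero after training.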

Method & Results

GC-xLSTM Architecture

Architecture: The pipeline of sparse feature selectors and xLSTM models.

Optimization Procedure

Joint Optimization: Alternating between Gradient Descent and Proximal Gradient Descent to enforce sparsity.
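A simplified version of this alternation is sketched below, assuming a plain group-lasso proximal step on the selector columns; the paper's dynamic penalty and learned reduction coefficients are omitted, and the names continue the hypothetical sketch above.

```python
import torch


def group_soft_threshold(W: torch.Tensor, lam: float, lr: float) -> torch.Tensor:
    """Proximal operator of a group-lasso penalty: shrink every column of W
    (one column per input series); columns with small norm become exactly zero."""
    norms = W.norm(dim=0, keepdim=True)                           # (1, num_series)
    scale = torch.clamp(1.0 - lr * lam / (norms + 1e-12), min=0.0)
    return W * scale


def train_step(model, x, y, optimizer, lam: float = 1e-2, lr: float = 1e-3) -> float:
    # (1) ordinary gradient step on the forecasting loss
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    # (2) proximal step: drive unneeded selector columns exactly to zero
    with torch.no_grad():
        model.selector.W.copy_(group_soft_threshold(model.selector.W, lam, lr))
    return loss.item()
```

Because the proximal step produces exact zeros, the resulting sparsity pattern itself encodes the discovered Granger causal graph, with no separate thresholding or validation metric needed.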

Loss Curves

No Auxiliary Metrics Needed: Loss curves showing robust convergence without requiring additional validation metrics.

Molène Weather Results

Molène Weather Dataset: Uncovering dynamic Granger causal weather patterns across stations in France.

Human Motion Capture

Human Motion Capture: Capturing complex dependencies in human motion (e.g., Salsa dancing).

BibTeX

@inproceedings{poonia2025exploring,
  title={Exploring Neural Granger Causality with xLSTMs: Unveiling Temporal Dependencies in Complex Data},
  author={Poonia, Harsh and Divo, Felix and Kersting, Kristian and Dhami, Devendra Singh},
  booktitle={Advances in Neural Information Processing Systems},
  year={2025}
}