





This paper introduces RegBN, a novel approach to the normalization of multimodal data that incorporates regularization.
RegBN uses the Frobenius norm as a regularization term to mitigate the side effects of confounders and underlying
dependencies among different data sources. It enables effective normalization of both low- and high-level features in multimodal neural networks.

Consider a trainable multimodal neural network (e.g., an MLP, CNN, or ViT) with modality-specific
backbones \(A\) and \(B\).
Let \(f^{(l)}\) represent the \(l\)th layer of network \(A\)
with batch size \(b\) and \(n_1\times\ldots\times n_N\)
features that are flattened into a vector of size \(n\).
In a similar vein, we define \(g^{(k)}\) as the \(k\)th layer of
network \(B\) with \(m_1\times\ldots\times m_M\) features that are flattened into a vector of size \(m\).
RegBN makes \(f^{(l)}\) and \(g^{(k)}\) mutually independent by minimizing
$$F(W^{(l,k)},\lambda_{+}) = \left\|f^{(l)} - W^{(l,k)} g^{(k)}\right\|_2^2 + \lambda_{+}\left(\left\|W^{(l,k)}\right\|_F - 1\right),$$
where \(W^{(l,k)}\) is a projection matrix of size \(n\times m\) and
\(\lambda_{+}\) is a Lagrange multiplier.
\(W^{(l,k)}\) and \(\lambda_{+}\) are estimated via a novel recursion-based algorithm.
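The projection step above can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's recursion-based solver: the unit Frobenius-norm constraint is replaced by a plain ridge penalty, and the function name `regbn_sketch` is our own for illustration.

```python
import numpy as np

def regbn_sketch(f, g, lam=1e-3):
    """Remove the component of f that is linearly predictable from g.

    f   : (b, n) flattened features of layer l from backbone A
    g   : (b, m) flattened features of layer k from backbone B
    lam : ridge strength standing in for the Lagrangian Frobenius term

    Returns the residual f - g @ W^T, i.e. f with its g-dependent
    part projected out (a stand-in for RegBN's normalized output).
    """
    b, m = g.shape
    # Ridge least squares for W^T: (g^T g + lam I) W^T = g^T f
    Wt = np.linalg.solve(g.T @ g + lam * np.eye(m), g.T @ f)  # (m, n)
    return f - g @ Wt

# Toy check: if f is (almost) a linear function of g,
# the residual collapses to the noise level.
rng = np.random.default_rng(0)
g = rng.standard_normal((64, 5))
f = g @ rng.standard_normal((5, 8)) + 0.01 * rng.standard_normal((64, 8))
f_ind = regbn_sketch(f, g)
```

In the actual method, this decorrelation is applied between the layer pairs \((f^{(l)}, g^{(k)})\) during training, with \(W^{(l,k)}\) and \(\lambda_{+}\) updated by the recursion-based algorithm rather than a closed-form solve.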
Where is it advisable to apply RegBN?
In the paper, we report the performance of RegBN on eight multimodal databases covering a variety of modalities, spanning multimedia, affective computing, robotics, and healthcare diagnosis.
- MNIST
- Multimedia (MM-IMDb)

@article{ghahremani2023regbn,
  title={RegBN: Batch Normalization of Multimodal Data with Regularization},
  author={Ghahremani, Morteza and Wachinger, Christian},
  year={2023},
  eprint={2310.00641},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
Acknowledgments 