Blind Separation of Instantaneous Mixtures of Nonstationary Sources

2013-10-27 23:45:02


Most source separation algorithms are based on a model of stationary sources. However, it is a simple matter to take advantage of possible nonstationarities of the sources to achieve separation. This paper develops novel approaches in this direction based on the principles of maximum likelihood and minimum mutual information. These principles are exploited by efficient algorithms in both the off-line case (via a new joint diagonalization procedure) and in the on-line case (via a Newton-like procedure). Some experiments showing the good performance of our algorithms and evidencing an interesting feature of our methods are presented: their ability to achieve a kind of super-efficiency. The paper concludes with a discussion contrasting separating methods for non-Gaussian and nonstationary models and emphasizing that, as a matter of fact, “what makes the algorithms work” is strictly speaking—not the nonstationarity itself but rather the property that each realization of the source signals has a time-varying envelope.


Theoretical Basis

In this report, we will mainly investigate two approaches, namely Maximum Likelihood and Block Gaussian Likelihood, to build objective functions for blind separation problems, and later we will discuss their connections with Guassian Mutual Information. If not specified, we assume the following sources are non-stationary Gaussian sources.


On-Line Algorithms

Simple Stochastic Gradient

For on-line separation, we could update separating matrix $B(t)$ by using the stochastic relative gradient for each iteration.
More specificly:
$$B(t+1) = B(t) - \lambda G(t)B(t)
where $\lambda$ is a small positive constant and $G(t)$ is the relative gradient defined by:
$$G(t) = \Sigma^{-2}S(t)S^{T}(t)-I
where $S(t) = B(t)X(t)$ and $\Sigma^2(t)=diag[\sigma_i^2(t),…,\sigma_K^2(t)]$, $\sigma_k^2(t)$ is an estimate of the variance.
For instance, we take $\sigma_k^2(t)$ as:
$$\sigma_k^2(t) = \sigma_k^2(t-1) + \rho[S_k^2(t)-\sigma_k^2(t-1)]
where $\rho$ is a small positive learning step.
So the simple on-line algorithm is the following:

Initial $X(t), B(t), \Sigma^2(t)$

Set $S(t)=B(t)X(t)$

Update $\Sigma^2(t)$

Calculate $G(t)$

Update $B(t)$


Newton-Like On-Line Technique

To accelerate the converging speed, we consider an exponentially weighted relative gradient matrix:
$$\bar{G}_t(B) = \sum_{\tau \leq t}\lambda(1-\lambda)^{t-\tau}\Sigma^{-2}(\tau)BX(\tau)X^T(\tau)B^T-I

The goal is to find $B(t)$ such that $\bar{G}_t(B)$ = 0. Thus we obtain the similar relative variation:
$$B(t+1) = B(t) - \lambda H(t)B(t)
where matrix $H(t)$ is the first-order expansion of the gradient.


\bar{G_t}(B-\lambda HB) &= \\
& \lambda[\Sigma^{-2}(t)BX(t)X(t)^TB^T-I] - \lambda H^T\\
&- \lambda \sum_{\tau \leq t}\lambda(1-\lambda)^{t-\tau}\Sigma^{-2}(\tau)BX(\tau)X^T(\tau)B^T\\
& + o(\lambda^2)

Neglecting the $o(\lambda^2)$ term, the condition $\bar{G}_t(B-\lambda HB) =0$ becomes:

H^T+\sum_{\tau \leq t}\lambda(1-\lambda)^{t-\tau}\Sigma^{-2}(\tau)H\Sigma^2(\tau)

In practice, we use an supplementary matrix $W$ for caculate matrix $H$:
w_{ij} & 1\\
1 & w_{ji}
$$w_{ij}(t) = w_{ij}(t-1) + \lambda [\sigma_j^2(t)/\sigma_i^2(t)-w_{ij}(t-1)]

Here comes the Newton-like on-line algorithm:

Initial $X(t), B(t), \Sigma^2(t)$

Set $S(t)=B(t)X(t)$

Update $\Sigma^2(t)$

Update matrix $W(t)$

Calculate $H(t)$

Update $B(t)$


Numerical Experiments

Experimental Data: Music and heart-beats signals

For our experiments, we use two kinds of data: 1) audio data with different sources (such as vocal, guitar, piano, etc.) and
2) a mixture data of gravida and baby heart-beats, as well as their breath sounds.

Music signals

We have 6 observations of different sensors of the audio signals, which is a mixture of 6 different single sources including
vocal, guitar, drums, etc. Each music segmentation is a 5-second wave file with a 44100 Hertz sample rate and 16 encoding bit.


In this project, we have implemented both on-line and off-line algorithms derived from maximum likelihood and block Gaussian likelihood
as well as mutual information. It is of interesting that Guassian mutual information could also
lead to the same criterion for joint diagnonalization, and further we found it is closed connected with Gaussian likelihood.
We conducted our experiments on two different kinds of data, i.e., music signals and heart-beats signals,
to demonstrate the effectiveness of the algorithms. The on-line algorithms are very sensitive to parameters. When the step ratio is too large, the matrix $B$ may not converge to a stable form, and when it is too small, the converging speed can be very slow. Further, we use a regularization method to avoid amplifying effect.
Overall the algorithms work well for the experimental data, though two sources in the 5-second music haven’t been completely separated. One explanation is that the two sources are very similar to each other.

Future we consider to extend the algorithms on images separation, speech recognition, etc. In those areas the principals of the algorithms remain similar and we could add extra information related to the inner structure of different data sources.