2015-10-14

#The Craftsman's Toys

### How to make machines understand our language (1): Language and computational linguistics

2015-09-04

#When you talk, what are you talking about?

2015-01-28

## Introduction

word2vec is an open-source toolkit released by Google for learning word representations.

### LDA and PLSA

2014-07-03

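LDA is a widely used topic model. It has a direct physical interpretation, an elegant mathematical form, and is formulated within the Bayesian framework.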

## PLSA

$$p(w|d) = \prod_{n=1}^N \sum_{k=1}^K p(w_n|z_k) p(z_k|d)$$
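As a rough illustration of the formula above, the following sketch evaluates the log of $p(w|d)$ for one document, given already-estimated distributions. The function name `plsa_log_likelihood`, the argument layout, and the toy numbers are assumptions made for this example; they are not part of any particular PLSA library.

```python
import math

def plsa_log_likelihood(word_ids, p_w_given_z, p_z_given_d):
    """Return log p(w|d) = sum_n log sum_k p(w_n|z_k) p(z_k|d).

    word_ids     -- list of vocabulary indices w_1..w_N for one document
    p_w_given_z  -- K lists of length V; p_w_given_z[k][w] = p(w|z_k)
    p_z_given_d  -- list of length K; p_z_given_d[k] = p(z_k|d)
    (The names and layout are hypothetical, chosen for illustration.)
    """
    num_topics = len(p_z_given_d)
    log_lik = 0.0
    for w in word_ids:
        # Inner sum over topics: marginal probability of word w in this document.
        p_w = sum(p_w_given_z[k][w] * p_z_given_d[k] for k in range(num_topics))
        # The product over words becomes a sum of logs.
        log_lik += math.log(p_w)
    return log_lik

# Toy example: K = 2 topics, V = 3 vocabulary words (made-up numbers).
toy_p_w_given_z = [[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]]
toy_p_z_given_d = [0.5, 0.5]
toy_doc = [0, 2, 2]
toy_log_lik = plsa_log_likelihood(toy_doc, toy_p_w_given_z, toy_p_z_given_d)
```

Working in log space avoids numerical underflow when the product runs over many words.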

2014-06-29

2014-03-27

# Abstract

The ARMA process is a widely used time series model with many useful properties. In this article we show that, by testing the residuals of a sequence, we can test whether an ARMA model is applicable.
The key step is to build a test statistic for white noise using the Ljung-Box method.

# Background

Consider a real-valued time series $(Z_{k})_{k\in \mathbb{Z}}$. We want to build a statistical test of the hypothesis
$$H_{0}=\{(Z_{k})_{k\in \mathbb{Z}} \mbox{ is a white noise}\}$$
against
$$H_{1}=\{(Z_{k})_{k\in \mathbb{Z}} \mbox{ is not a white noise}\}$$

Let $\hat{\mu}_n$ be the empirical mean, $\hat{\gamma}_n$ the empirical autocovariance function, and $\hat{\rho}_n$ the empirical autocorrelation function:

$$\hat{\gamma}_n(t)=n^{-1}\sum_{1\leq s,s+t\leq n}(Z_s-\hat{\mu}_n)(Z_{s+t}-\hat{\mu}_n)$$
$$\hat{\rho}_n(t)=\frac{\hat{\gamma}_n(t)}{\hat{\gamma}_n(0)}$$

The Ljung-Box test statistic at lag $h>1$ is then defined as

$$T_n(h)=n(n+2)\sum_{t=1}^{h}\frac{(\hat{\rho}_n(t))^2}{n-t}$$

We now do this step by step.
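The definitions above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `autocovariance` and `ljung_box` are chosen for this example, only lags $t \geq 0$ are handled, and the autocorrelation is assumed to be normalised by the lag-0 empirical autocovariance.

```python
def autocovariance(z, t):
    """Empirical autocovariance at lag t >= 0, with the 1/n normalisation."""
    n = len(z)
    mean = sum(z) / n
    # Sum over all s such that both s and s + t index into the sample.
    return sum((z[s] - mean) * (z[s + t] - mean) for s in range(n - t)) / n

def ljung_box(z, h):
    """Ljung-Box statistic T_n(h) = n(n+2) * sum_{t=1..h} rho_n(t)^2 / (n - t)."""
    n = len(z)
    gamma0 = autocovariance(z, 0)  # assumed normaliser for the autocorrelation
    total = 0.0
    for t in range(1, h + 1):
        rho = autocovariance(z, t) / gamma0
        total += rho ** 2 / (n - t)
    return n * (n + 2) * total

# A strongly alternating toy sequence: clearly not white noise.
toy_series = [1.0, -1.0, 1.0, -1.0]
toy_stat = ljung_box(toy_series, 1)
```

Under $H_0$ the statistic is approximately chi-squared distributed with $h$ degrees of freedom, so a large value relative to the corresponding chi-squared quantile is evidence against white noise. When the test is applied to the residuals of a fitted ARMA model, the degrees of freedom are usually reduced by the number of estimated ARMA coefficients.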

2013-10-27

# Abstract

Most source separation algorithms are based on a model of stationary sources. However, it is a simple matter to take advantage of possible nonstationarities of the sources to achieve separation. This paper develops novel approaches in this direction based on the principles of maximum likelihood and minimum mutual information. These principles are exploited by efficient algorithms in both the off-line case (via a new joint diagonalization procedure) and in the on-line case (via a Newton-like procedure). Some experiments showing the good performance of our algorithms and evidencing an interesting feature of our methods are presented: their ability to achieve a kind of super-efficiency. The paper concludes with a discussion contrasting separating methods for non-Gaussian and nonstationary models and emphasizing that, as a matter of fact, “what makes the algorithms work” is, strictly speaking, not the nonstationarity itself but rather the property that each realization of the source signals has a time-varying envelope.

# Introduction

## Theoretical Basis

In this report, we mainly investigate two approaches to building objective functions for blind separation problems, namely Maximum Likelihood and Block Gaussian Likelihood. We then discuss their connections with Gaussian Mutual Information. Unless otherwise specified, we assume the sources are non-stationary Gaussian sources.

### Bayesian methods in ranking

2013-06-25

Many websites use a ranking system to determine the display order of content. This is especially common on Q&A and news sites such as Quora, Reddit, and zhihu, where a post or question may have multiple comments or answers, and users can use the upvote or downvote button to gradually change the display order.

### How to choose a good chart

2013-04-11

Visualization is a very important part of data science. Andrew Abela published a guide on how to choose a good chart to help visualize data.

### Grow a search result

2012-11-27

“Grow A Search Result” is an organic kind of search that presents results that grow over time, drawing attention to the things about which you are most passionate.

When we think of the experiences that search engines are designed to support, criteria such as speed and efficiency instantly come to mind. However, one of our main interests is in how web use is intertwined with daily life, and in understanding the activities in which search engines play a role. In addition to fast, relevant search, our search engine focuses on: