Machine Learning for Dummies
Arnaud Bailly
2016-09-05
All humans have equal intelligence;
Every human has received from God the faculty of being able to instruct himself;
We can teach what we don’t know;
Everything is in everything.
Joseph Jacotot (1770-1840)
A silly coding challenge to apply for a job:

Extract the top 400 articles from Arxiv corresponding to the query "big data", analyze their content using Google's word2vec algorithm, then run a principal component analysis over the resulting word matrix and display the positions of the 100 most frequent words on a 2D figure. In Haskell…
word2vec
Algorithm

Maximises the average log probability of identifying the context words for each word of the vocabulary \(W\):
\[ \frac{1}{T} \sum_{t=1}^{T} \sum_{-c\leq j \leq c, j\neq 0} \log p(w_{t+j}|w_t) \]
Define conditional probability \(p(w'|w)\) using softmax function:
\[ p(w_O|w_I) = \frac{\exp(v'_{w_O}^{\top} v_{w_I})}{\sum_{i=1}^{W} \exp(v'_{w_i}^{\top} v_{w_I})} \]
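The softmax above can be sketched directly in Haskell. A minimal, illustrative version (not the talk's actual code): vectors are plain lists of `Double`, and the hypothetical `inputVecs` / `outputVecs` maps hold each word's input and output embeddings.

```haskell
import qualified Data.Map.Strict as M

type Vec = [Double]

dot :: Vec -> Vec -> Double
dot u v = sum (zipWith (*) u v)

-- p(w_O | w_I): exponentiated score of w_O's output vector against
-- w_I's input vector, normalised over the whole vocabulary
softmaxProb :: M.Map String Vec -> M.Map String Vec -> String -> String -> Double
softmaxProb inputVecs outputVecs wO wI =
  exp (dot (outputVecs M.! wO) vI)
    / sum [ exp (dot v' vI) | v' <- M.elems outputVecs ]
  where
    vI = inputVecs M.! wI
```

Note that the denominator iterates over the entire vocabulary, which is exactly why the approximations below are needed in practice.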
\[ W_{new}' = W' - \alpha G_O \]
\[ w_{I,new} = w_I - \alpha h' \]
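Both updates are plain gradient-descent steps. A hedged Haskell sketch, assuming the gradients (\(G_O\) for the output vectors, \(h'\) for the input vector) have already been computed by backpropagation:

```haskell
type Vec = [Double]

-- one gradient-descent step: w - alpha * grad, elementwise
sgdStep :: Double -> Vec -> Vec -> Vec
sgdStep alpha grad w = zipWith (\wi gi -> wi - alpha * gi) w grad
```

The same `sgdStep` serves for both rules: applied row by row with \(G_O\) for the output matrix \(W'\), and once with \(h'\) for the input vector \(w_I\).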
Idea (hierarchical softmax): approximate the probability over \(V\) with probabilities over a binary encoding of \(V\)