Hacking into VowpalWabbit

There are many articles and tutorials describing how to use VowpalWabbit as a command line tool, and for most use cases the algorithms implemented by default are just what you need. In this post, however, I’d like to share my experience diving into VW’s code. One reason you may want to do that is to reuse some of VW’s tricks – like feature hashing and the separation of parsing and training into two concurrent threads – in your own algorithm.


Proximal gradient methods

I’ve been playing a bit with stochastic gradient descent algorithms recently (inspired by Stéphane Gaïffas’s course) and finally took the time to read about proximal methods. (Check this IPython notebook if you want to skip the maths details; it is still a draft at the time of writing.)

Proximal operators are particularly useful when a regularisation term is added to the objective function: in addition to easing gradient descent on objectives that may not be smooth everywhere, they let you try different types of regularisation almost without touching any code (which is why they are extensively used in modern libraries like MLlib). More generally, they have several interpretations in the context of convex optimisation [1], but I will not discuss them here.

The proximal operator of a function \(f\) is defined as:
$$
\text{prox}_{\lambda f}(v) = \arg\min_x \left\{ f(x) + \frac{1}{2\lambda}\|x-v\|_2^2 \right\}
$$
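
To make the definition a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions, not the code from the notebook: the function names (`prox_l1`, `prox_l2_squared`, `proximal_gradient`) and the toy objective are hypothetical. It uses two well-known closed forms: the minimiser of \(\lambda\|x\|_1 + \frac{1}{2}\|x-v\|_2^2\) is soft-thresholding, and the minimiser of \(\frac{\lambda}{2}\|x\|_2^2 + \frac{1}{2}\|x-v\|_2^2\) is a simple shrinkage \(v/(1+\lambda)\).

```python
def prox_l1(v, lam):
    """Soft-thresholding: minimiser of lam*||x||_1 + 0.5*||x - v||_2^2."""
    return [max(abs(vi) - lam, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def prox_l2_squared(v, lam):
    """Shrinkage: minimiser of 0.5*lam*||x||_2^2 + 0.5*||x - v||_2^2."""
    return [vi / (1.0 + lam) for vi in v]


def proximal_gradient(grad, prox, x0, step, lam, n_iter=200):
    """Alternate a gradient step on the smooth part with a prox step
    on the regulariser (scaled by the step size)."""
    x = list(x0)
    for _ in range(n_iter):
        g = grad(x)
        x = prox([xi - step * gi for xi, gi in zip(x, g)], step * lam)
    return x


# Toy smooth part (an assumption for the demo): 0.5 * ||x - target||_2^2,
# whose gradient is simply x - target.
target = [3.0, 0.2, -1.5]


def grad(x):
    return [xi - ti for xi, ti in zip(x, target)]


# Same solver, two regularisers: only the prox argument changes.
x_l1 = proximal_gradient(grad, prox_l1, [0.0] * 3, step=0.5, lam=1.0)
x_l2 = proximal_gradient(grad, prox_l2_squared, [0.0] * 3, step=0.5, lam=1.0)
```

Switching from L1 to squared L2 regularisation only changes the `prox` argument, which is the "almost without touching any code" property in miniature. On this toy problem the L1 run sets the small middle coordinate exactly to zero, while the L2 run merely halves every coordinate.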


Get rich flipping coins

or go broke trying…

A few days ago, I was talking with a friend about a simple coin flipping game that, he claimed, can make the organiser immensely rich. After he described it to me I was a bit sceptical.

The rules of the game are quite simple:

To enter the game you must pay 1€. You then choose to bet on ‘heads’ or ‘tails’ and a coin is flipped. If you correctly predicted the outcome you get paid 1€. You can decide to keep it or to bet it again in the hope of getting an additional 2€. If you win you now have a total of 3€. Again you can choose to leave the game and keep the money, or you can try your luck and bet those 3€ to win an additional 4€. And so on… You can continue betting as long as you win, and every time the additional gain doubles. If you choose to continue and lose, however, all your betting money is lost.
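
The rules above are easy to play with in code. The following is a rough sketch under assumptions of my own: the coin is fair, the 1€ entry fee is never refunded, and the player follows a fixed strategy of cashing out after a chosen number of consecutive wins. All function names are hypothetical.

```python
import random
from fractions import Fraction


def payout(n_wins):
    """Money held after n_wins consecutive wins: 1 + 2 + ... + 2**(n_wins - 1)."""
    return 2 ** n_wins - 1


def expected_net_gain(target_wins):
    """Exact expected net gain (in euros) of a player who cashes out after
    target_wins wins, assuming a fair coin and a non-refunded 1 euro fee."""
    p_success = Fraction(1, 2 ** target_wins)
    return p_success * payout(target_wins) - 1


def play_once(target_wins, rng):
    """Net gain of one game: -1 on any loss, payout minus the fee otherwise."""
    for _ in range(target_wins):
        if rng.random() < 0.5:  # the player guessed wrong
            return -1
    return payout(target_wins) - 1


def simulate(target_wins, n_games=100_000, seed=0):
    """Average net gain per game over n_games simulated games."""
    rng = random.Random(seed)
    return sum(play_once(target_wins, rng) for _ in range(n_games)) / n_games
```

For instance, `expected_net_gain(3)` is exactly `Fraction(-1, 8)`, and `simulate(3)` should land close to that value. Under these assumptions a player who stops after \(n\) wins loses \(2^{-n}\) euros on average, so the organiser's edge per game shrinks as players become bolder.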


Notebooks

A small collection of IPython notebooks hosted on GitHub. The repo is quite messy but hopefully will improve with time!

- Bayesian Logistic Regression.ipynb
- First order optimization.ipynb
- Latent Dirichlet Allocation.ipynb
- Mixture of Gaussians.ipynb
- Model sampling.ipynb
