AI: How do I learn machine learning? (for software engineer with CS background)

I didn't do a PhD in machine learning (I was mostly focused on Signal Processing and Software Engineering), so I get this question a lot. The typical person who asks me is a software engineer with a computer science background, so I will address it from that perspective. If you are a Math major, for example, my answer might be less useful.

Take an online course

The first thing I tell someone who wants to get into machine learning is to take Andrew Ng's online course. I think Ng's course is very much to the point and very well organized, so it is a great introduction for someone wanting to get into ML. I am surprised when people tell me the course is "too basic" or "too superficial". When they do, I ask them to explain the difference between Logistic Regression and Linear Kernel SVMs, or between PCA and Matrix Factorization, or to explain regularization or gradient descent. I have interviewed candidates who claimed years of ML experience but did not know the answers to these questions. They are all clearly explained in Ng's course. There are many other online courses you can take after this one (see my answer to What is the best MOOC to get started in Machine Learning?), but at this point you are mostly ready to go to the next step.
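
To make the first of those questions concrete, here is a minimal sketch, assuming the standard textbook formulations rather than anything specific to Ng's course. Logistic regression and a linear-kernel SVM both learn a linear decision function, and one common way to tell them apart is the loss each one minimizes. The function names below are illustrative, not taken from any library.

```python
import math


def logistic_loss(margin: float) -> float:
    """Loss minimized by logistic regression: log(1 + exp(-margin))."""
    # Split on the sign of the margin so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def hinge_loss(margin: float) -> float:
    """Loss minimized by a soft-margin linear SVM: max(0, 1 - margin)."""
    return max(0.0, 1.0 - margin)


# Here margin = y * (w . x + b), with labels y in {-1, +1}.
# A large positive margin means "correct and confident".
for m in (-2.0, 0.0, 1.0, 3.0):
    print(f"margin={m:+.1f}  logistic={logistic_loss(m):.4f}  hinge={hinge_loss(m):.4f}")
```

The hinge loss is exactly zero once the margin reaches 1, so only points near or across the margin influence the SVM solution. The logistic loss never reaches zero, so every point contributes a little, and the model's output can be read as a probability.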

Implement an algorithm

My recommended next step is the following. Get a good ML book (my list is below), read the first intro chapters, and then jump to whichever chapter includes an algorithm you are interested in. Once you have found that algorithm, dive into it, understand all the details, and, especially, implement it. In the online course you will already have implemented some algorithms in Octave, but here I am talking about implementing an algorithm from scratch in a "real" programming language. You can still start with an easy one such as L2-regularized Logistic Regression or k-means, but you should also push yourself to implement more interesting ones such as LDA (Latent Dirichlet Allocation) or SVMs. You can use a reference implementation in one of the many existing libraries to check that you are getting comparable results, but ideally you should not look at its code; force yourself to implement the algorithm directly from the mathematical formulation in the book.
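
As a rough idea of what "from scratch" can look like for the easiest case mentioned above, here is a minimal sketch of L2-regularized Logistic Regression trained by batch gradient descent. It is an illustration under stated assumptions, not a reference implementation: the function names, the hyperparameter values (`lam`, `lr`, `n_iters`), and the synthetic two-cluster data are all made up for the example, and plain Python is used only to keep it self-contained.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_l2(X, y, lam=0.1, lr=0.1, n_iters=1000):
    """Minimize mean log loss + (lam / 2) * ||w||^2 by batch gradient descent.

    X is a list of feature lists, y a list of 0/1 labels.
    The bias term b is left unregularized (a common convention, assumed here).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iters):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # Averaged data gradient plus the gradient of the L2 penalty.
            w[j] -= lr * (grad_w[j] / n + lam * w[j])
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Predict class 1 when the estimated probability is at least 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


# Synthetic, well-separated two-class data (made up for illustration).
random.seed(0)
X = ([[random.gauss(-2, 1), random.gauss(-2, 1)] for _ in range(50)]
     + [[random.gauss(2, 1), random.gauss(2, 1)] for _ in range(50)])
y = [0] * 50 + [1] * 50

w, b = fit_logistic_l2(X, y)
accuracy = sum(p == t for p, t in zip(predict(X, w, b), y)) / len(y)
print("weights:", w, "bias:", b, "training accuracy:", accuracy)
```

To check a sketch like this against a library, fit the same data with an existing implementation and compare the learned weights and predictions. Keep in mind that libraries may parameterize regularization differently (scikit-learn's `LogisticRegression`, for example, takes an inverse regularization strength `C`), so matching results requires converting between the two conventions.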

Some book recommendations

So, what are some good books to do this? Many have been mentioned before. Some of my favorites (see my answer to What are the best books about machine learning? for more details):

Kevin Murphy's Machine Learning: A Probabilistic Perspective
Hastie, Tibshirani, and Friedman's The Elements of Statistical Learning
Bishop's Pattern Recognition and Machine Learning
David Barber's Bayesian Reasoning and Machine Learning
Larry Wasserman's All of Statistics: A Concise Course in Statistical Inference (more details on this book in my edit below)

You can also go directly to a research paper that introduces an algorithm or approach you are interested in and dive into it.

My main point is that machine learning is about both breadth and depth. You are expected to know the basics of the most important algorithms (see my answer to What are the top 10 data mining or machine learning algorithms?). On the other hand, you are also expected to understand the low-level, complicated details of algorithms and how they are implemented. I think the approach I am describing addresses both of these dimensions, and I have seen it work.

Ready for a career in Machine Learning?

The next logical step some people ask about is whether they should now be ready to start a career in machine learning. That is, of course, a different question. Please refer to my answer to How should one start a career in machine learning? for that.

Edit 08/26/2015

In response to some comments and questions, I feel that I should add another book recommendation. If you feel that you lack some background in Statistics, I would totally recommend:

Larry Wasserman's All of Statistics: A Concise Course in Statistical Inference

Tuesday, August 8, 2017