by Satish Shankar for
Summary:
Have you ever wondered how recommendation engines work? Or how you can predict house prices based on historical real-estate data? Learn how to use practical machine learning algorithms with Python tools and libraries. By the end of the talk, you will be equipped to analyze large data sets, build customized predictive models, and, most importantly, know your way around the world of modern machine learning algorithms. Expect code snippets, demos and some pretty pictures generated from real-world data sets! Prior programming experience up to an intermediate level is assumed. Basic vector math/statistics will be useful, but not necessary.
Talk outline:
1) Introduction to machine learning: supervised, unsupervised and reinforcement learning.
2) Feature extraction: examples from a few real-world data sets, including text and image data.
3) Supervised learning methods: regression, naive Bayes, support vector machines.
4) Unsupervised learning methods: clustering and dimensionality reduction.
5) Designing a recommendation engine: collaborative filtering.
6) Debugging your machine learning algorithm: learning curves and the bias-variance tradeoff.
All concepts will be illustrated with real-world data sets drawn from the web. We will point to a rich array of Python tools in the machine learning landscape, such as scikit-learn, NLTK, Gensim, matplotlib/NumPy, and so on.
Goals of the talk:
1) Have a systematic way to think about machine learning.
2) Understand the major classes of machine learning algorithms so that you know when to use what.
3) Know the most common pitfalls, limitations and bottlenecks while implementing a real-world machine learning system.
Intended audience:
If you want to know what machine learning algorithms are, what they can do, and how to use them, this talk is for you. If you are interested in building data-driven applications such as a recommendation engine, an automatic news/spam classifier, a handwritten digit recognizer and so on, this talk is for you.
Prerequisites:
An intermediate-level programming background is assumed. Prior exposure to vector math and basic statistics is useful, but not necessary.