Office: EECS546 Gates
Email: jswang @ cs.nthu.edu.tw
Office Hours: Monday 10:00AM-Noon, Friday 12:30AM-Noon
Tuesday & Friday 9AM - 10:00AM in EECS 546
The course will discuss data mining and machine learning algorithms for analyzing very large amounts of data. The emphasis will be on MapReduce as a tool for creating parallel algorithms that can process data at this scale.
Topics include: Frequent Itemsets and Association Rules, Near Neighbor Search in High Dimensional Data, Locality Sensitive Hashing (LSH), Dimensionality Reduction, Recommendation Systems, Clustering, Link Analysis, Large-Scale Supervised Machine Learning, Data Streams, Mining the Web for Structured Data, Web Advertising.
This list of topics is tentative and may change as the quarter progresses.
Prerequisite: Knowledge of Java
Books: Leskovec, Rajaraman, and Ullman, Mining of Massive Datasets. The book can be downloaded for free. It can also be purchased from Cambridge University Press, but you are not required to do so.
MOOC: There is a Coursera MOOC that is similar to this course. You may find it useful to view some of the videos there.
The coursework for the course will consist of: