Università degli Studi di Napoli "Parthenope"

Teaching schedule

Academic year: 
2018/2019
Degree programme: 
Master's Degree Programme in APPLIED COMPUTER SCIENCE (MACHINE LEARNING AND BIG DATA)
Disciplinary sector: 
INFORMATICS (INF/01)
Language: 
Italian
Credits: 
12
Year of study: 
1
Teachers: 
PETROSINO Alfredo
NARDUCCI Fabio
Cycle: 
First Semester
Hours of frontal teaching: 
96

Language

Lectures are given in Italian (and are available in English on the e-learning platform); the textbooks are in English.

Course description

The course aims to provide the basics of probability theory needed to solve classification and clustering problems in pattern recognition.
Theoretical problems will be discussed, and students will learn how to implement classification and machine learning algorithms on big data.

Knowledge and understanding: the student must demonstrate knowledge and understanding of the theoretical and practical fundamentals of pattern recognition problems for classification and clustering, with a focus on automatic learning and feature extraction from data.

Ability to apply knowledge and understanding: the student must demonstrate the ability to use the acquired knowledge to analyse large and complex datasets, implementing and customising the most appropriate and recent techniques and algorithms for knowledge extraction.

Autonomy of judgement: the student should be able to assess independently the challenges posed by large, heterogeneous datasets and to identify and analyse the strengths and limitations of known machine learning algorithms for solving classification/clustering problems.

Communication skills: the student should be able to discuss the complex theoretical topics of machine learning with scientific rigour and appropriate terminology, and to evaluate critically the experimental results on real case studies.

Learning skills: students must be able to keep up to date with, and independently study in depth, the recent and emerging proposals on machine learning and deep learning algorithms in the scientific literature. They will also be able to evaluate the lectures of the course and form a personal perspective on the state of the art.

Prerequisites

A good knowledge of Image Processing is required to understand the problems of classification and clustering on images.
A good knowledge of probability theory and mathematical analysis is strongly required to understand the techniques and algorithms discussed during the course.

Syllabus

The full programme of the course is organised into the following lessons:
• Bayesian decision theory: probability theory, prior/posterior probability, maximum a posteriori (MAP) probability. (12 hours)
• Discriminant functions: univariate/multivariate Gaussian density, Normal density, linear transformations. (8 hours)
• Non-parametric methods: non-parametric density estimation, histogram method, K-nearest neighbours (KNN) and Parzen window, examples and exercises. (8 hours)
• Unsupervised learning and clustering: definition of identifiability and mixture of densities, maximum likelihood estimation (MLE), mixture of Gaussians (MOG), definition of cluster, squared error partitioning, clustering, graph-theoretic clustering, examples and exercises. Hierarchical clustering: dendrogram, single-linkage and full-linkage (complete-linkage) clustering. (12 hours)
• Principal Component Analysis (PCA): dimensionality reduction techniques, geometrical representation of PCA, algebraic definition of PCA, examples of use of PCA, practical examples and exercises. (8 hours)
• Support Vector Machine (SVM): non-linear classification, mathematical definition of SVM, kernel trick, examples and exercises on SVM. (8 hours)
• Expectation-Maximization (EM) algorithm: definition and discussion of the EM algorithm, practical use of the EM algorithm for MOG, examples. Probabilistic Graphical Models. (12 hours)
• Supervised and unsupervised learning: Perceptron, Multilayer Perceptron (MLP) network, Self-Organising Maps (SOM), examples and exercises. (12 hours)
• Deep Learning: Restricted Boltzmann Machine and Deep Belief Network (DBN), Convolutional Neural Network (CNN), examples and exercises. (16 hours)

The course covers the fundamentals of machine learning and pattern recognition, starting from Bayesian decision theory and moving through parametric and non-parametric methods, supervised/unsupervised learning and classification/clustering, up to neural networks and deep learning.

Teaching Methods

Teaching consists of 96 hours of frontal lessons (12 CFU), divided into a first module on the theoretical presentation of algorithms and solutions for Bayesian Models and Machine Learning (6 CFU) and a second module on Deep Learning (6 CFU). Teaching takes place in the classroom or laboratory and is organised in 3-hour lessons according to the academic calendar. The frontal lessons consist of theoretical and practical lectures as well as exercise sessions given by the teacher on the topics of the course.

Theoretical lessons are meant to give students knowledge of the fundamentals of decision theory and of parametric/non-parametric methods for estimating the probability distributions of big datasets. They also consider the challenges and open issues of classification/clustering problems on data, as well as feature extraction and data mining techniques for knowledge extraction. Advanced and emerging techniques for automatic learning from data conclude the course, with a detailed discussion of the Hebbian theory of learning and MLP neural networks, up to deep learning with feed-forward neural networks and backpropagation, convolutional neural networks (CNN), Boltzmann Machines and Deep Belief Networks (DBN).

Laboratory lessons retrace the theoretical topics to provide feasible implementations of the algorithms and to analyse their strengths and limitations on different datasets available online. During the laboratory part, exercise sessions are provided to the students. They are collective in nature, take place in the laboratory and are given by the teacher, who proposes solutions to practical exercises meant to verify the adoption and implementation of the theoretical topics presented in the previous lessons. Solving these exercises allows students to check their understanding of the theoretical concepts and their ability to propose alternative implementations.

Attendance is strongly encouraged, although it is optional. The exam is the same for all students, regardless of how many lessons they have attended.

Textbooks

Pattern Classification, 2nd Edition, Richard O. Duda, Peter E. Hart, David G. Stork, ISBN: 978-0-471-05669-0, Wiley-Interscience, 2000

Learning assessment

Project and Final oral exam.
The exam consists of a project assigned to each student individually. The student must request the project from the teacher by email and will receive it within one week, barring any problems. The student must demonstrate the ability to read and understand a scientific article from the literature on one of the topics of the course. By preparing a report, the student must describe and discuss the assigned article critically and with scientific rigour. Moreover, he/she must provide a feasible and reliable implementation of the algorithm, replicate the experiments discussed in the paper, and try to extend the analysis of the results with his/her own experiments. At this stage, the student can freely arrange his/her work and schedule; there is no deadline for delivering the project, but the project must be discussed at least once within 3 months of the assignment.
The student proceeds to the second and final part of the exam, the oral test, only when he/she believes that the project is ready to be discussed. The student then books a place in the upcoming exam session. The oral test is organised in two separate parts. In the first part the student presents and discusses the project to the teacher (possibly with the support of a few slides). Once this first stage is complete, the oral test continues with some questions on the theoretical topics of the course. The teacher will evaluate the maturity of the student and his/her ability to answer with appropriate terminology and technical/scientific rigour.
The presentation of the project accounts for 70% of the final score, while the remaining 30% comes from the oral part, which consists of 3 questions on theoretical topics of the course. The final score takes into account both the accuracy of the project presentation and the quality of the oral test. The maximum score is 30 and the minimum to pass the exam is 18.
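As an illustration only, the 70%/30% weighting can be sketched as a weighted average. The sketch rests on assumptions that the text does not state: that the project presentation and the oral part are each marked on the same 0 to 30 scale, and that no rounding is applied. The names `final_score` and `is_pass` and the sample marks are hypothetical.

```python
# Hypothetical sketch of the 70/30 weighting described in the text.
# ASSUMPTION: both parts are marked on a 0-30 scale; rounding is not modelled.
PROJECT_WEIGHT_PCT = 70   # share of the project presentation (from the text)
ORAL_WEIGHT_PCT = 30      # share of the oral questions (from the text)
MAX_SCORE = 30            # maximum score (from the text)
PASS_MARK = 18            # minimum score to pass (from the text)


def final_score(project_mark: float, oral_mark: float) -> float:
    """Return the weighted average of the two marks (assumed 0-30 each)."""
    for mark in (project_mark, oral_mark):
        if not 0 <= mark <= MAX_SCORE:
            raise ValueError(f"mark {mark} is outside the 0-{MAX_SCORE} range")
    weighted = PROJECT_WEIGHT_PCT * project_mark + ORAL_WEIGHT_PCT * oral_mark
    return weighted / 100


def is_pass(score: float) -> bool:
    """A score passes when it reaches the minimum of 18."""
    return score >= PASS_MARK


if __name__ == "__main__":
    score = final_score(project_mark=28, oral_mark=24)  # sample marks
    print(score, is_pass(score))
```

Under these assumptions, a strong oral test cannot compensate for a weak project: a project mark of 10 with a full oral mark of 30 still falls below the pass mark.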

More information

The learning material is provided, in English, through the e-learning platform at http://escienzeetecnologie.uniparthenope.it. To access the published material, the student must be registered on the platform and enrolled in the course.