Università degli Studi di Napoli "Parthenope"

Teaching schedule

Academic year: 
Degree programme: 
Disciplinary sector: 
Year of study: 
First Semester
Contact hours: 

Language of instruction
Italian. If any international students attend, the course will be taught in English.

Course description

The course aims to provide the theoretical and practical foundations for processing multimodal data, such as audio and video, with Machine Learning methods.

Knowledge and understanding
The student must demonstrate knowledge of, and the ability to use, the fundamentals of modern programming paradigms, such as object-oriented (OO) programming. The student must be sufficiently familiar with the techniques of linear algebra, mathematical analysis, probability and statistics taught in the Bachelor's degree course, in order to understand the theoretical foundations of the machine learning techniques presented in the course. Finally, the student must have a good command of written English, so as to be able to use bibliographical resources (textbooks and scientific articles), almost all of which are in English.

Applying knowledge and understanding
The student must demonstrate the ability to use the acquired knowledge to solve classification or prediction problems on multimodal data with Machine Learning techniques. The student must also demonstrate the ability to develop algorithms and analyze their complexity. Finally, the student must make informed use of the currently available software libraries that implement the main machine learning methods presented in the course.

Making judgements
The student must be able to independently evaluate the effectiveness and efficiency of a machine learning application to a real-world case. The student must also have acquired sufficient independence and critical sense to use bibliographical resources. This ability is crucial for developing both self-learning skills during the course of study and the lifelong learning that the student is expected to pursue after completing the Master's degree.

Communication skills
The student must be able to write a report and organize a presentation, both preferably in English, on a multimodal machine learning application.

Ability to learn
By the end of the course, the student must have developed self-learning skills, that is, the ability to independently use the bibliographical resources made available by search engines and online repositories such as Google Scholar and ResearchGate.

Prerequisites
Basic notions of object-oriented programming, linear algebra, mathematical analysis, probability and statistics, and scientific computing.

Course contents
1. Introduction to Image and Video Processing (12 h)
1.1 Human Color Perception
1.2 Color Models: Hardware-Oriented Models
1.3 Color Models: Perceptually Uniform Color Models
1.4 Image Standards: JPEG and JPEG 2000
1.5 Wavelet Transform
1.6 MPEG Standards: MPEG-2 and MPEG-4

2. Audio Processing (12 h)
2.1 Human Sound Perception
2.2 Speech Production
2.3 Speech Feature Extraction
2.4 Speech Classifiers

3. Complements of Machine Learning (12 h)
3.1 Feature Selection Methods
3.2 Manifold Learning Methods
3.3 Kernel and Spectral Methods
3.4 Sequence Classification Methods: HMM and Viterbi Algorithm

4. Multimodal Recognition by Machine Learning (12 h)
4.1 Automatic Face Recognition
4.2 Video Segmentation and Keyframe Extraction
4.3 Gesture Recognition
4.4 Speech Recognition
4.5 Automatic Personality Perception

The course aims to provide the theoretical and practical foundations for processing multimodal data with Machine Learning techniques.

Teaching Methods

Teaching consists of lectures, complemented by presentations given by the students themselves on topics of particular interest to them. Video lessons are also available on the course's e-learning platform.

Textbooks
• T. Hastie, R. Tibshirani, J. Friedman, “The Elements of Statistical Learning: Data Mining, Inference, and Prediction”, 2nd Edition, 2008, Springer, ISBN: 978-0387848570
• M. Gori, “Machine Learning: A Constraint-Based Approach”, 2017, Morgan Kaufmann, ISBN: 978-0081006597
• F. Camastra, A. Vinciarelli, “Machine Learning for Audio, Image and Video Analysis: Theory and Applications”, 2nd Edition, 2016, Springer Verlag, ISBN: 978-1447168409
• J. Shawe-Taylor, N. Cristianini, “Kernel Methods for Pattern Analysis”, 2004, Cambridge University Press, ISBN: 978-0521813976

Learning assessment

The student must demonstrate an understanding of machine learning techniques for analyzing multimedia data. To pass the exam, the student must develop a project and take an oral exam. The project, agreed upon with the instructor, covers the application of machine learning techniques to a specific multimodal domain and must be accompanied by a written report, preferably in English. In the oral exam, the student presents and discusses the project (50% of the marks) and must show mastery of the machine learning methods covered in the course (the remaining 50% of the marks).

More information