DATA MINING FOR BUSINESS
The Data Mining For Decision Making course introduces students to statistics as the science of understanding and analyzing data. Students learn how to make effective use of data in the face of uncertainty: how to collect data, how to analyze it, and how to use it to draw inferences and conclusions about real-world phenomena. The course focuses in particular on multidimensional statistical techniques. Many hands-on examples are developed so that students acquire the skills needed to interpret the results of the analyses carried out. Particular emphasis is given to the application of theories and methods, with the aim of developing students' critical sense in choosing the most appropriate tools for building models.
- Knowledge and understanding skills
The Data Mining For Decision Making course aims to provide the conceptual and methodological foundations of the most important data mining techniques used to extract knowledge from data. This knowledge can be applied within organizations and companies to plan and carry out data organization, processing and analysis activities in support of company decisions. Knowledge and understanding are mainly acquired through the student's active participation in lectures and through individual study.
- Ability to apply knowledge and understanding
The knowledge acquired is intended to give students the skills needed to put the theoretical and methodological topics into practice, working concretely in the different fields of application of statistics. The student will be able to:
- use statistical knowledge (models and techniques) to support the decision-making processes of companies in their various functional areas (production, marketing, management control, quality control, data processing and information systems, etc.);
- perform analysis of data, market research, context analysis;
- manage the modelling, analysis and interpretation of statistical information in observational studies.
- Independent judgment
The course provides adequate knowledge of the techniques and methodologies, together with the practical and operational skills, needed to exercise independent judgment when conducting analyses that involve the measurement and management of uncertainty and the treatment and interpretation of data relating to company problems. Students develop their own independent judgment by taking part in the discussions and contributions requested by the teacher during the lectures. As a result, even when working in groups, they will be able to justify their choices during the analysis phase and to interpret the results obtained in light of the research question at hand.
- Communication skills
The course aims to provide students with the skills and tools needed to present their analyses and conclusions clearly and rigorously, using modern communication tools, to both specialists and non-specialists, both in writing and orally, including through the use of the main reporting software applications.
- Learning ability
Students acquire a scientific method of study and an approach to problems that allows them to deal autonomously and effectively with the problems that will arise in professional life. In particular, they must be able to identify autonomously the tools and methodologies suitable for developing and strengthening their professional skills.
Students need a basic knowledge of statistics. A genuine interest in data analysis is a plus!
The main topics covered in the course are:
- overview of data mining;
- primary and secondary information sources;
- collecting, exploring and preparing the data;
- the data matrix;
- simple and multiple linear regression;
- distance indices;
- cluster analysis;
- classification and regression trees;
- principal component analysis;
- correspondence analysis (simple and multiple);
- multidimensional scaling.
The teaching program can be divided into 3 blocks of lessons:
Block I (about 18 hours of lessons + 2 hours of exercises): overview of data mining; primary and secondary information sources; collecting, exploring and preparing the data; the data matrix; preliminary data processing: missing data and outliers; simple and multiple linear regression.
Block II (about 22 hours of lessons + 4 hours of exercises): distances; distance indices and dissimilarity indices; similarity indices; cluster analysis: agglomerative hierarchical clustering and partitioning methods; classification by decision tree: AID, CHAID and CART.
Block III (about 22 hours of lessons + 4 hours of exercises): principal component analysis; correspondence analysis (simple and multiple); multidimensional scaling.
The teaching activity consists of 72 hours of lectures, during which exercises on the topics covered are also proposed. Students are also assigned additional exercises to complete at home, individually or in groups, which are then corrected and discussed during lesson hours.
Tufféry, Stéphane (2011). Data Mining and Statistics for Decision Making. Wiley. (Chapters 1, 2, 3, 7, 9, 11.1-5, 11.7)
The assessment is based on an oral examination. The examination grade is expressed on a scale from 0 to 30. To pass the exam (a grade of at least 18/30), the student must demonstrate at least a basic knowledge of the techniques illustrated during the course. To achieve the highest score (i.e. 30/30 or 30/30 cum laude), the student must demonstrate an excellent knowledge of all the course contents as well as the ability to apply them to problem solving. Clarity of exposition and mastery of the vocabulary of the discipline also contribute to the final grade.
Lectures are in Italian. The professor is fluent in English and is available to interact with students in English, including during the examination.