DATA MINING FOR BUSINESS
The Data Mining for Business Decision course aims to provide the essential skills for choosing and applying the main multidimensional data analysis techniques: dependency analysis, classification of units and synthesis of information. With the rise of "Big Data" problems, data mining has become extremely important in many scientific fields, as well as in marketing, finance and other business disciplines. The course starts from introductory concepts of data organization and description, moves on to data processing with basic statistical models, and then addresses statistical synthesis through multivariate analysis methods in data mining contexts. Many practical case studies are worked through so that students learn to interpret the results of the analyses carried out.
- Knowledge and understanding skills
The course provides the conceptual and methodological foundations of the most important data mining techniques used to extract knowledge from data. These techniques can be applied within organizations and companies to plan and carry out data organization, processing and analysis activities in support of business decisions. Knowledge and understanding are acquired mainly through active participation in lectures and through individual study.
- Ability to apply knowledge and understanding
The knowledge acquired is intended to give students the skills needed to put theoretical and methodological concepts into practice, working concretely in the different fields of application of statistics. The student will be able to:
- use statistical knowledge (models and techniques) to support the decision-making processes of companies in their various functional areas (production, marketing, management control, quality control, data processing and information systems, etc.);
- carry out data analysis, market research and context analysis;
- manage the modelling, analysis and interpretation of statistical information in observational studies.
- Independent judgment
The course provides adequate knowledge of the techniques and methodologies, together with the practical and operational skills, needed to exercise independent judgment when conducting analyses involving the measurement and management of uncertainty and the treatment and interpretation of data relating to business problems. Students develop independent judgment by taking part in the discussions and contributions requested by the teacher during lectures. As a result, even when working in groups, they will be able to justify their choices during the analysis phase and to interpret the results obtained in light of the problem being investigated.
- Communication skills
The course aims to give students the skills and tools needed to present their analyses and conclusions clearly and rigorously, to both specialists and non-specialists, both in writing and orally, using modern communication tools including the main reporting software applications.
- Learning ability
The student acquires a scientific method of study and an approach to problems that makes it possible to deal autonomously and effectively with the problems that arise in professional life. In particular, the student must be able to independently identify the tools and methodologies suitable for developing and strengthening their professional skills.
Students need a basic knowledge of statistics. A genuine interest in data analysis is a plus!
Block I (about 18 hours of lessons + 2 hours of practice): data analysis and data mining; statistical information for businesses; primary and secondary information sources; data organization: the data matrix; Data Warehouse and Data Mart; graphical tools for the visualization of multivariate data: scatterplot matrix and heatmap; quantitative and qualitative data; temporal and spatial data; preliminary data processing: missing data and outliers.
Block II (about 22 hours of lessons + 6 hours of practice): relationships between variables: the simple and multiple linear regression model. Compact matrix representation. Estimation of the coefficients of the multiple linear regression model: ordinary least squares (OLS). Assumptions underlying the model. Measures of goodness of fit. Hypothesis testing on model parameters. Diagnostics and analysis of model residuals. Further developments of regression analysis: interactions in multiple regression; polynomial regression; regression with instrumental variables; step-function regression and spline regression.
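As an illustration of the compact matrix representation and OLS estimation listed in Block II, the sketch below solves the normal equations (X'X)b = X'y and reports R^2 as a goodness-of-fit measure. It is a minimal example and not course material: the function names (`ols`, `solve`, `matmul`, `transpose`) and the toy data are assumptions made here, and only the Python standard library is used.

```python
# Minimal sketch: OLS via the normal equations (X'X) b = X'y, standard library only.
# All names and data below are hypothetical and chosen for illustration.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b]; rows are copied so the inputs are not modified.
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: the predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def ols(predictors, y):
    """Return (coefficients, R^2); coefficients[0] is the intercept."""
    X = [[1.0] + list(row) for row in predictors]  # prepend a column of ones
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    beta = solve(XtX, Xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_mean = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    tss = sum((yi - y_mean) ** 2 for yi in y)               # total sum of squares
    return beta, 1.0 - rss / tss

# Hypothetical toy data generated exactly by y = 1 + 2*x1 + 3*x2 (no noise).
predictors = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]
beta, r2 = ols(predictors, y)
print([round(b, 6) for b in beta], round(r2, 6))
```

Because the toy data contain no noise, the estimated coefficients should recover the generating values up to floating-point error. With real data, a numerical library would normally be preferred over hand-written elimination.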
Block III (about 20 hours of lessons + 4 hours of practice): distance measures and similarity indices: cluster analysis. Synthesis techniques: principal component analysis and correspondence analysis (simple and multiple).
The main topics covered in the course are:
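To make the distance measures and similarity indices of Block III concrete, the sketch below computes Euclidean and Manhattan distances, a Jaccard similarity index for binary data, and the pairwise distance matrix that cluster analysis typically starts from. The specific measures, function names and toy units are assumptions chosen for illustration; the syllabus does not state which measures are covered.

```python
import math

# Minimal sketch of common distance and similarity measures (illustrative choices).

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def jaccard_similarity(a, b):
    """Jaccard index for binary (0/1) vectors: shared 1s over positions where either is 1."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Convention assumed here: two all-zero vectors are treated as identical.
    return both / either if either else 1.0

def distance_matrix(units, dist):
    """Symmetric matrix of pairwise distances between statistical units."""
    return [[dist(u, v) for v in units] for u in units]

def closest_pair(D):
    """Indices (i, j), i < j, of the two closest units: the first merge
    an agglomerative clustering procedure would make."""
    pairs = [(i, j) for i in range(len(D)) for j in range(i + 1, len(D))]
    return min(pairs, key=lambda p: D[p[0]][p[1]])

# Hypothetical units described by two quantitative variables.
units = [[0.0, 0.0], [3.0, 4.0], [4.0, 4.0]]
D = distance_matrix(units, euclidean)
print(D[0][1], D[1][2], closest_pair(D))
```

In practice the variables are often standardized before distances are computed, so that no single variable dominates because of its scale.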
- Introduction to Data Mining
- Primary and secondary data
- Data matrix; Data Warehouse and Data Mart; scatterplot matrix and heatmap
- Quantitative and qualitative data; temporal and spatial data
- Data cleaning: outliers and missing data
- Multiple linear regression model. Compact matrix representation. Model assumptions
- Estimation of the coefficients (OLS)
- Measures of goodness of fit
- Further developments of regression analysis
- Distance measures and similarity indices: cluster analysis
- Synthesis techniques: principal component analysis and correspondence analysis (simple and multiple)
The teaching activity consists of 72 hours of lectures, during which exercises on the topics covered are also proposed. Students are also assigned additional exercises to complete at home, individually or in groups, which are then corrected and discussed in class.
James G., Witten D., Hastie T., Tibshirani R. (2017), An Introduction to Statistical Learning (Ch. 3, secs. 3.1-3.3; Ch. 7, secs. 3.1-3.4)
Zani S. and Cerioli A. (2007), Analisi dei dati e Data mining per le decisioni aziendali, Giuffré Editore (Chs. 1, 3, 6-9)
Lecture notes by the teacher.
The assessment is based on an oral examination. The exam grade is expressed on a scale from 0 to 30. To pass the exam (a grade of at least 18/30), the student must demonstrate at least a basic knowledge of the techniques illustrated during the course. To achieve the highest grade (i.e. 30/30, possibly cum laude), the student must demonstrate excellent knowledge of all the course contents as well as the ability to apply them to problem solving. Clarity of exposition and mastery of the vocabulary of the discipline also contribute to the final grade.