PARALLEL AND DISTRIBUTED COMPUTING II AND LAB. CPD
The course extends the methodologies, techniques and tools for developing algorithms and software in high-performance computing environments, addressing current issues in advanced computing tools for multicore and GPU environments.
For the laboratory activity, the course uses the C/C++ programming language and NVIDIA's CUDA environment to develop parallel software that leverages the processing power of modern graphics processors.
Knowledge and understanding: the student must demonstrate knowledge of the fundamentals of parallel computing, of the organization of the CUDA memory hierarchy (both hardware and software), and of the parallelization strategies for some basic computational kernels, with and without the use of shared memory.
Ability to apply knowledge and understanding: the student must demonstrate the ability to use the strategies studied and the CUDA APIs to develop algorithms in a multicore/GPU environment, exploiting knowledge of the parallelization issues that arise in high-performance and hybrid environments.
Autonomy of judgement: the student must be able to independently evaluate the results of a parallel algorithm through performance analysis in terms of speedup (gain), Gflops, and memory management.
Communication skills: the student must be able to present a parallel algorithm and document its implementation in multicore/GPU environments.
Learning skills: the student must be able to update and deepen specific topics and applications of numerical computing, also by accessing databases, on-line scientific software repositories and other tools available on the web.
Students attending the course must have acquired the knowledge and skills transmitted by the 1st-level degree, including Parallel and Distributed Computing and Image Processing. Moreover, the student must have acquired the knowledge and skills transmitted by the following 2nd-level degree courses: Algorithms and Data Structures II and lab (ASD II), Scientific Computing Applications and lab (ACS).
High Performance Computing: definition and motivation
- Measuring a computer's performance and software execution time
- History of supercomputer evolution
- Distributed and multicore architectures
- Moore's law and its subsequent refinements
- Graphics processors (GPUs) and the CUDA environment: GPU parallelism (motivations, advantages and disadvantages); the general-purpose GPU
- The multiprocessor array; GPU computing units: host and device
- The kernel concept; grids and blocks
- The pitch and the concept of coalescence
- The CUDA architecture; memory organization: registers, local memory and shared memory; constant, texture and global memory
- Parallel programming with CUDA and the C language
- The main CUDA routines: allocation, deallocation and host/device data exchange
- Invocation and declaration of a kernel; memory-management API; the mapping process
- 2D data management: pitch routines
- Error handling; timing
- Allocation and management of data structures in shared memory; thread-synchronization routines; device management
- The cudaprof profiling toolkit and its graphical version, the CUDA Visual Profiler
- The CUBLAS library for basic matrix and vector operations; brief introduction to the BLAS, PBLAS and PLASMA libraries
- CUBLAS levels; default routines for data transfer between host and device; creation of the CUBLAS environment; error handling; routines for basic operations
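To give a flavour of the basic CUDA workflow covered above (allocation, host/device data exchange, kernel declaration and invocation, error handling, and event-based timing), a minimal sketch follows; the kernel name `vecAdd` and the problem size are illustrative choices, not taken from the course slides.

```cuda
// Minimal host/device workflow: allocate, copy, launch, time, check errors.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every runtime call so errors are reported with file and line.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard against overrun
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *h_a = (float *)malloc(bytes), *h_b = (float *)malloc(bytes),
          *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;                          // device allocation
    CHECK(cudaMalloc(&d_a, bytes));
    CHECK(cudaMalloc(&d_b, bytes));
    CHECK(cudaMalloc(&d_c, bytes));
    CHECK(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice));

    cudaEvent_t start, stop;                         // event-based timing
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    dim3 block(256);                                 // threads per block
    dim3 grid((n + block.x - 1) / block.x);          // blocks in the grid

    CHECK(cudaEventRecord(start));
    vecAdd<<<grid, block>>>(d_a, d_b, d_c, n);       // kernel invocation
    CHECK(cudaGetLastError());                       // check the launch
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    CHECK(cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost));
    printf("c[0] = %f, kernel time = %f ms\n", h_c[0], ms);

    CHECK(cudaFree(d_a)); CHECK(cudaFree(d_b)); CHECK(cudaFree(d_c));
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

The grid size is rounded up so that every element is covered even when n is not a multiple of the block size.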
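The pitch routines for 2D data management mentioned above can be sketched as follows; `cudaMallocPitch` pads each row so rows start at aligned addresses, which favours coalesced accesses. The kernel name `scaleRows` and the image size are hypothetical.

```cuda
// 2D data with pitched allocation: the pitch (in bytes) may exceed the
// logical row width, so rows are addressed through a byte pointer.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scaleRows(float *data, size_t pitch, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // pitch is in bytes: step through char* before casting to float*
        float *row = (float *)((char *)data + y * pitch);
        row[x] *= 2.0f;
    }
}

int main(void) {
    const int width = 64, height = 32;
    size_t pitch;
    float *d_img;
    cudaMallocPitch(&d_img, &pitch, width * sizeof(float), height);
    cudaMemset2D(d_img, pitch, 0, width * sizeof(float), height);

    dim3 block(16, 16);                       // 2D block of threads
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    scaleRows<<<grid, block>>>(d_img, pitch, width, height);
    cudaDeviceSynchronize();

    printf("pitch = %zu bytes for a %d-float row\n", pitch, width);
    cudaFree(d_img);
    return 0;
}
```

`cudaMemcpy2D` takes the same pitch argument when exchanging 2D data between host and device.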
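Shared memory and thread synchronization, also listed among the topics, are typically illustrated with a per-block reduction; the following is one such sketch (the kernel name `blockSum` is an illustrative choice).

```cuda
// Per-block sum reduction in shared memory with explicit synchronization.
#include <cuda_runtime.h>

__global__ void blockSum(const float *in, float *out, int n) {
    extern __shared__ float sdata[];        // shared memory, sized at launch
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    sdata[tid] = (i < n) ? in[i] : 0.0f;    // each thread loads one element
    __syncthreads();                        // wait until the block has loaded

    // Tree reduction: halve the number of active threads at each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();                    // all partial sums must be ready
    }
    if (tid == 0) out[blockIdx.x] = sdata[0];  // one partial sum per block
}
```

The shared-memory size is passed as the third launch parameter, e.g. `blockSum<<<grid, block, block.x * sizeof(float)>>>(d_in, d_out, n);`; the per-block results can then be reduced on the host or by a second kernel launch.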
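Finally, the CUBLAS workflow listed above (environment creation, host/device transfer routines, error handling, basic operations) can be sketched with a level-1 SAXPY call; the vector contents are illustrative.

```cuda
// CUBLAS workflow: create the handle, move data with cublasSetVector,
// run a level-1 routine (SAXPY: y = alpha*x + y), check the status codes.
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void) {
    const int n = 4;
    float x[] = {1, 2, 3, 4}, y[] = {10, 20, 30, 40};
    const float alpha = 2.0f;

    float *d_x, *d_y;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));

    cublasHandle_t handle;                       // CUBLAS environment
    cublasStatus_t st = cublasCreate(&handle);
    if (st != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "cublasCreate failed\n");
        return 1;
    }

    cublasSetVector(n, sizeof(float), x, 1, d_x, 1);  // host -> device
    cublasSetVector(n, sizeof(float), y, 1, d_y, 1);

    cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);   // y = alpha*x + y

    cublasGetVector(n, sizeof(float), d_y, 1, y, 1);  // device -> host
    for (int i = 0; i < n; ++i) printf("%g ", y[i]);  // 12 24 36 48
    printf("\n");

    cublasDestroy(handle);
    cudaFree(d_x); cudaFree(d_y);
    return 0;
}
```

Every CUBLAS routine returns a `cublasStatus_t`, so error handling follows the same check-the-return-code pattern as the runtime API.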
A. Grama, G. Karypis, V. Kumar, A. Gupta: "Introduction to Parallel Computing", 2nd edition. Addison-Wesley.
J. Sanders, E. Kandrot (foreword by J. Dongarra): "CUDA by Example: An Introduction to General-Purpose GPU Programming". Addison-Wesley / NVIDIA.
All lessons are available as slides (in PDF format) on the e-learning platform of the Department of Science and Technology, together with self-assessment exercises, library manuals, past exams, and recent papers on the most innovative topics in parallel computing.
The goal of the verification procedure is to quantify, for each student, the degree of achievement of the learning objectives listed above. Specifically, the exam consists of a project assigned by the teacher, involving the implementation of software that solves a real problem in the GPU environment (70% of the grade), and an oral test assessing the ability to analyze parallel software in the GPU environment (30% of the grade).