Community and Industry Toolkit

CRISP-DM

The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a widely used process model that data mining practitioners follow when tackling a problem. It has six major phases:

  • Business Understanding - Before anything else, data mining experts must understand the project objectives and requirements from a business perspective. That understanding is then translated into a data mining problem definition, which is used to design a preliminary plan for achieving the objectives.
  • Data Understanding - Once a preliminary plan is in place, initial data is collected so that the team can become familiar with it, identify data quality problems and discover first insights, which are used to form hypotheses.
  • Data Preparation - This stage is often the longest and involves tedious work: taking the initial raw data and cleaning it into a form that the modelling tools can use.
  • Modelling - Once the data is ready, the appropriate modelling techniques are applied and their parameters are calibrated to optimal values. This phase is often revisited, as several different techniques can be applied to the same type of data mining problem.
  • Evaluation - At this point, the model has been built and appears to be working well. Before proceeding to final deployment, the model must be reviewed to make sure it properly achieves the business objectives.
  • Deployment - For the knowledge gained from the data to be useful, it must be organized and presented in a way that the end user can understand. Depending on the requirements, this phase can be as simple as generating a report or as complex as implementing a repeatable data mining process.
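
The ordering of the six phases, and the fact that modelling may be revisited, can be sketched in a few lines of Python. This is only an illustration: the names Phase, next_phase and walk are hypothetical, and the rule that a failed evaluation sends the project back to modelling is a simplifying assumption rather than part of the CRISP-DM definition.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELLING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(current, meets_objectives=True):
    """Return the phase after `current`, or None once deployment is done.

    Assumption (illustrative only): if evaluation finds that the model
    does not meet the business objectives, the project loops back to
    the modelling phase.
    """
    if current is Phase.EVALUATION and not meets_objectives:
        return Phase.MODELLING
    if current is Phase.DEPLOYMENT:
        return None
    return Phase(current.value + 1)


def walk(evaluation_results):
    """Trace the phases visited for a sequence of evaluation outcomes.

    One outcome is consumed per visit to EVALUATION; when the sequence
    runs out, the evaluation is treated as passing.
    """
    outcomes = iter(evaluation_results)
    phase = Phase.BUSINESS_UNDERSTANDING
    visited = []
    while phase is not None:
        visited.append(phase.name)
        ok = next(outcomes, True) if phase is Phase.EVALUATION else True
        phase = next_phase(phase, ok)
    return visited


if __name__ == "__main__":
    # First evaluation fails, second passes: modelling is visited twice.
    for name in walk([False, True]):
        print(name)
```

Calling walk([True]) visits each phase once, while walk([False, True]) passes through modelling and evaluation twice before reaching deployment.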

[Figure: CRISP-DM process diagram (tl_files/sites/aida/CRISP-DM.png)]