In a data science project, data cleaning accounts for roughly 57% to 60% of the work. Because cleaning is done in code, basic programming knowledge is essential, even if not at an expert level. R and Python are the two main programming languages for data science projects, and both come with packages suited to this work. In addition, mathematical, statistical and problem-solving skills are vital for predictive modelling and data analysis. Big Data platforms such as Hadoop and Spark also demand strong coding skills.
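
To give a sense of what "cleaning done by coding" can look like in practice, here is a minimal sketch in Python using only the standard library. The sample data, the column names (`name`, `age`, `city`) and the helper `clean_rows` are all hypothetical and chosen purely for illustration; a real project would more likely rely on a dedicated data-analysis package and on cleaning rules specific to its data.

```python
import csv
import io
import statistics

# Hypothetical raw data with typical problems: stray whitespace,
# inconsistent casing, a missing value, and a duplicated row.
RAW_CSV = """name,age,city
 Alice ,34,london
Bob,,Paris
 Alice ,34,london
Carol,29, BERLIN
"""


def clean_rows(raw_text):
    """Return a list of cleaned row dicts (illustrative helper, not a library API)."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))

    # Step 1: normalize whitespace and casing, and parse ages into integers.
    for row in rows:
        row["name"] = row["name"].strip()
        row["city"] = row["city"].strip().title()
        age_text = row["age"].strip()
        row["age"] = int(age_text) if age_text else None

    # Step 2: drop exact duplicates while preserving the original order.
    seen = set()
    unique_rows = []
    for row in rows:
        key = (row["name"], row["age"], row["city"])
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)

    # Step 3: fill missing ages with the median of the known ages
    # (one simple imputation choice among many).
    known_ages = [row["age"] for row in unique_rows if row["age"] is not None]
    median_age = statistics.median(known_ages)
    for row in unique_rows:
        if row["age"] is None:
            row["age"] = median_age

    return unique_rows


cleaned = clean_rows(RAW_CSV)
for row in cleaned:
    print(row)
```

Even this small example involves three separate decisions (how to normalize text, what counts as a duplicate, and how to fill a missing value), which helps explain why cleaning takes up so much of a project.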

