According to Gartner analyst Nick Heudecker, over 85% of data science projects fail. A report from Dimensional Research indicated that only 4% of companies have succeeded in deploying ML models to a production environment. For many companies, bringing data science into various aspects of the business proves difficult, if not daunting. Evidence suggests that the gap is widening between organizations that gain value from data science and those that struggle to do so.
It is tempting to think the problem comes down to data and processing, and that is not wrong: these are certainly challenges. But there are bigger problems. So let's explore some big data failure examples and dive into what drives these failures. Working as a technical manager at the interface between R&D and commercial operations has given me insight into the traps that lie in our path, and a number of factors drive failure.
Not having the Right Talent
Finding, hiring, and retaining top tech talent is never easy, and the competition for qualified data talent is especially fierce. Data science and analytics skills are the second most difficult skill set to find. For nearly two years, there has been a widespread talent shortage in the data science space. Popular research reported a shortage of more than 150,000 individuals with data science skills. While the complex interdisciplinary approach of data science projects involves various subject matter experts such as mathematicians, data engineers, and many others, data scientists are often the most critical and the most difficult to recruit. As a result, companies have a hard time implementing and scaling their projects, which, in turn, slows time to production. Additionally, many companies cannot afford the large teams required to run multiple projects simultaneously. For ETL, hire data engineers; for reporting, hire BI analysts. Don't mix the roles.
Your Data is as good as your Data Governance
Without data governance, you don't have a data science project. End of story. Many companies lack data infrastructure or do not have enough data, whether in volume or in quality. Data quality and data management issues are critical because AI and ML projects rely so heavily on good-quality data. Yet this data can be challenging to collect, create, or purchase. Having multiple (or zero) versions of the truth is one of the most common problems that organizations face. It is not that organizations don't have data, but rather that they don't properly marshal it into an environment where it can be analyzed and modeled. Where data governance is lacking, poor data quality and integrity often inhibit analytics project success.
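To make "multiple (or zero) versions of the truth" concrete, here is a minimal Python sketch of the kind of check a governance process might run before any modeling starts. Everything in it is an assumption for illustration: the two source systems (`crm` and `billing`), the `customer_id` key, the `country` field, and the helper names `find_conflicts` and `find_missing` are hypothetical and do not come from any particular tool.

```python
from collections import defaultdict


def find_conflicts(sources, key, field):
    """Return records whose `field` value disagrees across sources.

    `sources` maps a source name to a list of row dicts.
    The result maps each conflicting key to {source_name: value}.
    """
    seen = defaultdict(dict)
    for source_name, rows in sources.items():
        for row in rows:
            seen[row[key]][source_name] = row.get(field)
    # More than one distinct value means more than one "version of the truth".
    return {k: v for k, v in seen.items() if len(set(v.values())) > 1}


def find_missing(sources, key):
    """Return, per source, the keys that exist elsewhere but not in it."""
    keys_by_source = {
        name: {row[key] for row in rows} for name, rows in sources.items()
    }
    all_keys = set().union(*keys_by_source.values())
    return {
        name: all_keys - keys
        for name, keys in keys_by_source.items()
        if all_keys - keys
    }


# Hypothetical sample data: two systems that should agree on customers.
sources = {
    "crm": [
        {"customer_id": 1, "country": "DE"},
        {"customer_id": 2, "country": "FR"},
    ],
    "billing": [
        {"customer_id": 1, "country": "DE"},
        {"customer_id": 2, "country": "ES"},
        {"customer_id": 3, "country": "IT"},
    ],
}

print("Conflicting values:", find_conflicts(sources, "customer_id", "country"))
print("Missing records:", find_missing(sources, "customer_id"))
```

In this sample, customer 2 has two competing values (multiple versions of the truth) and customer 3 is absent from the CRM (zero versions in that system). A real governance setup would go well beyond this, but even a check this small surfaces the problem before it reaches a model.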