"The next wave of big data investment will target more of the enterprise mainstream that will have more modest IT and data science skills compared with the early adopters," says Tony Baer, principal analyst and author of the report, 2016 Trends to Watch: Big Data.
• A rising tide of IT spending will continue to lift investment in big data analytics.
• Appliance and cloud will drive the next wave of Hadoop adoption to mainstream enterprises lacking the same depth of IT skills as early adopters.
• SQL still reigns supreme for big data analytics, but Spark will comprise the fastest-growing set of workloads.
• Machine learning becomes a checklist item for data wrangling and predictive analytics.
• Early enterprise adopters of Hadoop begin planning for data lake implementation.
The research points to the continuing strength of SQL as a key on-ramp for organizations trying to make sense of big data, as reflected in Hadoop providers' growing efforts to differentiate their own SQL-on-Hadoop technologies. “Don’t count SQL out,” said Baer. “SQL-on-Hadoop remains a potent draw for Hadoop vendors who are aiming to reach the large base of enterprise SQL developers out there.”
But even with the continued draw of SQL, Ovum views Spark – and especially machine learning and streaming – as the fastest-growing set of big data analytic workloads in 2016. “Spark will be complementary to SQL by providing additional paths to insights, such as through the streaming of graph analysis, which can then be queried using language that enterprise database developers are very familiar with,” said Baer.
While Spark is making fast headway with Java, Python, and R programmers, Ovum expects most of the benefit to come from embedding Spark into analytics tools. Ovum predicts that the ecosystem of third-party analytic tools that embed Spark will grow significantly in 2016 and that machine learning (a capability enabled by Spark) will become a checklist item for data preparation and predictive analytics.
Ovum’s other big prediction for 2016 is that data lake adoption will become a “front-burner issue” for mature Hadoop adopters that have already put analytics into production, serving multiple lines of business and stakeholder groups across the organization. The result will be new demand for tools to govern the data lake and make it more transparent. Ovum expects significant growth in tooling that builds on emerging data lineage capabilities to catalog and protect data, govern access, tier storage, and manage the lifecycle of data stored in data lakes.
“Governance of data lakes will not be built in a day. While some of the tooling exists today, capabilities such as managing the lifecycle of multi-tiered storage will have to be extended to cover the growing heterogeneity of Hadoop clusters,” concludes Baer.