In 2020, the first year of the “20s”, every technical field is reviewing its key events and looking ahead to its key milestones. Looking back, many significant and landmark events have occurred in the field of data technology. According to statistics and forecasts from Statista, global data volume reached an estimated 41 ZB in 2019 and is projected to exceed 50 ZB in 2020. This volume is astonishingly large, and it places higher demands on data technology.
The West led the world for almost two centuries, which can be credited to its success in harnessing the leading technology of each industrial revolution: first Britain with the steam engine, then the United States with the combustion engine and later the internet. Now, it doesn’t take a genius to see that China intends to end the “century of humiliation” by taking the next high ground, which is very likely to be artificial intelligence, a technology that could boost productivity while addressing the increasingly severe population aging the country is facing. Big data, at the center of all this, cannot be overlooked.
Standing at the start of a new decade, it is important for each field to review the past and look to the future. Focusing on the data life cycle, the following is a trend outlook for future data technology, from data collection, construction, management, and calculation to application.
Data collection
Mini program and IoT collection, collection-side computing, and collection laws and regulations will see breakthrough changes.
Collection-side computing: In the 5G and IoT era, traffic data will continue to explode. A core issue will be how to keep data collection running normally with limited server and computing resources. One direction to explore is on-device computing: deploying algorithm models, data compression, data filtering, anti-cheating, and similar logic on the terminal significantly reduces the pressure on the network, the servers, and the computing cluster.
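To make the idea of collection-side computing concrete, here is a minimal Python sketch of what a terminal might do before uploading a batch of events: filter out low-value events, drop exact repeats as a naive stand-in for anti-cheating, and compress what is left. The event fields (`user`, `page`, `ts`, `dwell_ms`), the 200 ms threshold, and the function names are assumptions made up for illustration, not part of any real collection SDK.

```python
import json
import zlib


def prepare_batch(events, min_dwell_ms=200):
    """Filter, deduplicate, and compress raw events before upload."""
    seen = set()
    kept = []
    for event in events:
        # Data filtering: drop events too short to be meaningful (assumed rule).
        if event["dwell_ms"] < min_dwell_ms:
            continue
        # Naive anti-cheating stand-in: drop exact repeats of the same event.
        key = (event["user"], event["page"], event["ts"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(event)
    # Data compression: compact JSON, then zlib, to cut network traffic.
    raw = json.dumps(kept, separators=(",", ":")).encode("utf-8")
    return kept, zlib.compress(raw)


def decode_batch(payload):
    """Server-side inverse of the compression step."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


# Hypothetical events recorded on one device.
events = [
    {"user": "u1", "page": "home", "ts": 1, "dwell_ms": 50},
    {"user": "u1", "page": "cart", "ts": 2, "dwell_ms": 900},
    {"user": "u1", "page": "cart", "ts": 2, "dwell_ms": 900},
    {"user": "u2", "page": "home", "ts": 3, "dwell_ms": 400},
]
kept, payload = prepare_batch(events)
```

Here only two of the four events leave the device, and the server recovers them with `decode_batch(payload)`. A real deployment would replace the duplicate check with an actual anti-cheating model, but the division of labor is the same: the terminal does cheap work so the network and cluster do less.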
Laws and regulations for big data collection: Legislation may need to start from three aspects: 1. clearly stipulate which data can and cannot be collected; 2. clearly specify which collection technologies cannot be used, along with quantitative indicators for the legal use of each permitted technology; 3. define the scope and standards for investigating illegal acts.
Data construction and management
Model-based development will become the mainstream, stream-batch integration will rise from the engine layer to the platform layer, and data processing will become more fine-grained.
Model-based development will become the mainstream: The threshold for big data development will be lowered further. Users will no longer need to write complex SQL and will only need to focus on developing data models.
Stream-batch integration will rise from the engine layer to the platform layer: Stream-batch integration will no longer be limited to the engine layer and will support actual business scenarios at the platform layer.
Data processing will become more fine-grained: Data processing is moving from table granularity to field granularity, which greatly reduces computing and storage costs.
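One way to read field granularity is field-level lineage: if the platform knows which source fields each derived field reads, a change to one field only triggers recomputation of the fields that depend on it, not the whole table. The sketch below shows that lookup in Python. The field names and the `DERIVED_FIELDS` mapping are hypothetical, and a real platform would track lineage in its metadata system rather than in a dictionary.

```python
# Hypothetical derived fields and the source fields each one reads.
DERIVED_FIELDS = {
    "order_total": {"price", "quantity"},
    "full_name": {"first_name", "last_name"},
    "is_bulk": {"quantity"},
}


def fields_to_recompute(changed_fields, derived=DERIVED_FIELDS):
    """Return the derived fields whose inputs overlap the changed fields."""
    changed = set(changed_fields)
    return sorted(name for name, inputs in derived.items() if inputs & changed)


# Only the fields that read "quantity" need to be rebuilt.
stale = fields_to_recompute(["quantity"])
```

With table granularity, a change to `quantity` would rebuild all three derived fields. With field granularity, `full_name` is left alone, which is where the savings in computing and storage come from.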
Data mining
The democratization of AI, multi-modal data, interpretable AI and enhanced analysis, and 5G, IoT, and edge computing will see breakthrough changes.
AI democratization: With the development of AutoML technology, every step of data mining is moving toward automation. More and more roles will be able to use AI capabilities, and the shortage of AI talent should ease within five years.
Multi-modal data: In the next 10 years, the unified mining and modeling of multi-modal data, as well as translation and conversion between modalities, will be a hot spot where greater breakthroughs are possible. In the field of dialogue, semantic understanding and entity recognition performed directly on speech signals are also underway. Alternative data such as satellite data will be further integrated and mined for value.
Interpretable AI and enhanced analysis: The emergence of interpretable AI and enhanced analysis will narrow the gap between data science and business and give businesses more transparent and reliable AI capabilities.
5G, IoT, and edge computing: China’s 5G began to land in 2019 and will scale up in 2020. Although there are currently few core 5G applications, the parallel development of IoT means that over the next 10 years more terminal data can be collected and mined, and edge computing will be performed on the terminal.
Data calculation
Whether it is stream-batch integration or TA integration, data computing will face more business scenario requirements and challenges, and more areas will see real business deployments of automated and inclusive AI computing, cloud-device computing integration, and similar capabilities.
BI
Cloud BI will become the mainstream model of the market; demand for self-service data analysis will remain strong; new manufacturing, new finance, new retail, and small and medium-sized enterprises will become bright spots in the BI market; data analysis, data governance, and data asset management will increasingly intersect; second-level response on massive data will become standard; with the integration of AI and BI, BI will truly enter the era of intelligence; and mobile access, sharing, and embedded integration will become more and more common.
Cloud BI will become the mainstream model of the market: Cloud vendors plus BI products will replace traditional private cloud solutions and become the mainstream model. Cloud BI needs to offer both platform as a service and analytical application as a service, and must be able to deploy, use, and manage data analysis reports and applications both in the cloud and locally. Judging from the current international IT market, cloudification is indeed the general trend, and a large-scale market has gradually formed. In the domestic market, however, the data environment is relatively closed and data security poses many challenges. As a result, most key enterprise data still resides in privately deployed systems, and cloudification is progressing more slowly than in the international market. The bright spot for cloud BI in the Chinese market may lie in small and medium-sized users whose business is concentrated on SaaS cloud platforms, though this still needs to be verified by the market.
Demand for self-service data analysis will remain strong: The application scenarios of data analysis keep growing richer and broader. More and more business personnel need data analysis to support their business decisions, and enterprises need self-service data analysis to free up IT staff and reduce business costs.
New manufacturing, new finance, new retail, and small and medium-sized enterprises will become bright spots in the BI market: From the perspective of social development trends, the concepts of new manufacturing, new finance, and new retail will become even more popular. In these industries, the concept of “data energy” has gradually become an industry consensus, and using BI to analyze data and fully tap its value has become standard practice. Small and medium-sized enterprises will also become a new bright spot in the BI market. Their application scenarios are mainly concentrated in digital marketing, where they urgently need data analysis to uncover potential business value and support their business decisions.
Data analysis, data governance, and data asset management will increasingly intersect: In the next few years, more and more large enterprises will implement unified data governance and data asset management projects. Data analysis is an important part of data asset management, and the two will become more tightly integrated. Metadata management, master data management, data labeling, multidimensional data analysis, and similar functions need to be deeply integrated with BI, with corresponding analysis models built on that basis.
Massive data processing with second-level response will become standard: Traditional relational databases can no longer meet the data needs of enterprises, and big data has gradually become standard for them. BI products need to provide a powerful data computing and processing engine that reduces the time enterprises spend waiting on queries, improves the efficiency of business data analysis, and integrates seamlessly with the enterprise’s own big data platform.
With the integration of AI and BI, BI will truly enter the era of intelligence: To meet the needs of business personnel for self-service data analysis and automatic mining, BI products need to strengthen automatic data mining on top of their existing data visualization and data analysis functions, so that users can easily use the advanced analysis functions built into the platform.
Mobile access, sharing, and embedded integration will become more and more common: As common business systems such as ERP, OA, MES, and HIS mature, a company may run anywhere from dozens to thousands of IT systems. New self-service BI needs to integrate with multiple systems at the same time in order to analyze the enterprise’s business data comprehensively. Analysis pages created by different users of the big data BI platform should be easy to share with other members. When analysts design a dashboard, they should be able to reuse its charts, dimensions, indicators, and other elements, and to share specified pages with members of other departments, which facilitates interactive communication. To meet the needs of enterprise personnel for real-time work and information exchange, the big data BI platform also needs to support sharing and viewing analysis results on mobile terminals, including drill-down, drill-through, and linkage on those results.
Data service
The data service field will have significant changes in four areas: federated learning to promote circulation, AutoML to improve performance, high-performance online data access, and data cloud service.
Federated learning promotes circulation: Access to data has always been a key factor restricting the development of smart services. With the rise of federated learning, this problem will improve markedly: on the premise that data security is guaranteed, data can become a widely shared resource. Whether in horizontal or vertical mode, federated learning helps data circulate among different enterprises and different media, and it strengthens the value of differentiated data.
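The core mechanism can be sketched in a few lines of Python. In this toy version of federated averaging, each participant trains a one-parameter model on its own private data and sends only the updated weight to a coordinator, which averages the weights by sample count. The two clients, their data, the learning rate, and the function names are all assumptions for illustration. The clients share the same feature and hold different samples, which corresponds to the horizontal setting. Production systems add secure aggregation, encryption, and much more.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """Run a few gradient-descent steps on one client's private data."""
    for _ in range(epochs):
        # Gradient of mean squared error for the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(client_weights, client_sizes):
    """Combine client models, weighting each by its number of samples."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total


# Two hypothetical enterprises; both datasets follow y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)],
]

w_global = 0.0
for _ in range(50):
    # Raw (x, y) pairs never leave a client; only the weights do.
    updates = [local_update(w_global, data) for data in clients]
    w_global = federated_average(updates, [len(d) for d in clients])
```

After the training rounds, `w_global` is close to 3, the slope shared by both datasets, even though neither enterprise ever saw the other's records.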
AutoML improves efficiency: Data intelligence will gradually become widespread, and AutoML will advance to the point where, for common supervised learning tasks, algorithms can be selected and hyperparameters optimized with confidence, using available methods or methods that are not yet fully mature. AutoML will no longer be seen as an alternative to the machine learning toolbox, but as another tool included in it.
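Stripped to its essentials, the task described here is a search over (algorithm, hyperparameter) pairs scored on held-out data. The following Python sketch does exactly that with two toy regressors, a mean baseline and k-nearest neighbors with a few values of k. The candidate set, the data, and the function names are assumptions chosen for illustration. Real AutoML tools search far larger spaces with smarter strategies than exhaustive enumeration.

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mean_predict(train, x):
    """Baseline: always predict the mean training target."""
    return sum(y for _, y in train) / len(train)


def validation_error(predict, train, valid):
    """Mean squared error of a predictor on held-out data."""
    return sum((predict(train, x) - y) ** 2 for x, y in valid) / len(valid)


def auto_select(train, valid):
    """Try every candidate (algorithm, hyperparameter) pair; keep the best."""
    candidates = {"mean": mean_predict}
    for k in (1, 2, 4):
        # Bind k as a default argument so each lambda keeps its own value.
        candidates[f"knn(k={k})"] = lambda tr, x, k=k: knn_predict(tr, x, k)
    scores = {name: validation_error(fn, train, valid)
              for name, fn in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical data following y = 2x, split into training and validation sets.
train = [(float(x), 2.0 * x) for x in range(0, 20, 2)]
valid = [(float(x), 2.0 * x) for x in range(1, 19, 2)]
best, scores = auto_select(train, valid)
```

On this data the search picks `knn(k=2)`, because averaging the two neighbors on either side of a validation point reproduces the linear trend exactly. The user supplies data and a candidate list, and the selection itself needs no manual tuning.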
High-performance online data access: Demand for high-performance online analysis is very strong, and the development of query approximation and data approximation techniques will be crucial.
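A simple instance of this trade-off is answering an aggregate query from a uniform random sample instead of scanning the full column, and reporting an error bound alongside the estimate. The Python sketch below estimates a mean and an approximate 95% confidence half-width using the normal approximation. The column of order amounts, the sample size, and the function name are assumptions for illustration. Production engines use more elaborate sampling and sketching schemes.

```python
import math
import random


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean from a uniform random sample, with a ~95% interval."""
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    mean = sum(sample) / sample_size
    # Unbiased sample variance, then the standard error of the mean.
    variance = sum((v - mean) ** 2 for v in sample) / (sample_size - 1)
    half_width = 1.96 * math.sqrt(variance / sample_size)
    return mean, half_width


# Hypothetical table column: 100,000 order amounts.
rng = random.Random(42)
amounts = [rng.uniform(0, 100) for _ in range(100_000)]

exact = sum(amounts) / len(amounts)
estimate, half_width = approximate_mean(amounts, sample_size=1_000)
```

The approximate query reads 1% of the rows. Whether that is acceptable depends on how much error the application tolerates, which is why the estimate should always travel together with its interval.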
Data cloud service: Kubernetes is becoming the unifying platform. Whether for machine learning or data application development, cloud native is the future. Once data services are cloud native, data engineers can focus on data analysis and implement key data logic without worrying about service logic. DevOps and machine learning training, deployment, and prediction can all be cloud native, which promotes efficient resource use and platform independence. Both AutoML and traditional data services will move fully to the cloud.
Data security
There will be significant changes in the field of data security in four areas: regulatory compliance remains the biggest driving force for the development of enterprise data security and personal privacy protection; data-centric data security systems will gradually gain recognition; in the short term, no single technical system can solve all data security issues; and new data security technologies and models keep emerging, with the boundaries of the data security industry continuing to expand and merge.
Regulatory compliance remains the biggest driving force for enterprise data security and personal privacy protection: Specific legislation and industry standards will be released one after another. However, data openness and data security have become “two sides of the same coin”, and balancing them is both the focus and the difficulty of policy and law in every country.
Data-centric data security systems will gradually gain recognition: In the future, data security will be a core competitive strength for enterprises rather than a cost. In other words, companies that put in the effort and do data security well will win more business opportunities.
In the short term, no single technical system will solve all data security issues; instead, different technologies will address different security issues in different scenarios: For example, SGX and secure multi-party computation can solve the problem of fusing data among parties that do not trust one another, on-device edge computing can reduce collection compliance risk, differential privacy can address some personal privacy leakage problems, and intelligent algorithms can handle risk identification and control during data circulation.
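As one small illustration of how mutually distrusting parties can combine data, the Python sketch below computes a sum with additive secret sharing, a basic building block of secure multi-party computation. Each party splits its private value into random shares, hands one share to every participant, and publishes only the total of the shares it received. The three companies, their values, the modulus, and the function names are assumptions for illustration, and all parties are simulated in one process. A real protocol also needs authenticated channels and protection against dishonest participants.

```python
import secrets

MODULUS = 2**61 - 1  # Must be larger than the true sum of all private values.


def share(value, n_parties):
    """Split value into n random shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Each party shares its value; only per-party share totals are revealed."""
    n = len(private_values)
    # all_shares[i][j] is the share that party i sends to party j.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up the shares it received and publishes that partial sum.
    partials = [sum(all_shares[i][j] for i in range(n)) % MODULUS
                for j in range(n)]
    return sum(partials) % MODULUS


# Hypothetical private counts held by three companies.
total = secure_sum([120, 45, 310])
```

Every individual share is a uniformly random number, so no single message reveals a company's count, yet the published partial sums combine to the correct total of 475.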
The data security industry will usher in major opportunities: Development in the digital economy era relies heavily on mining and applying big data as a means of production. In this process, it is necessary to solve the problem of data islands and increase the commercial and social value of data resources.
Data ownership relationships will become more complicated: Demand for data protection is exploding across the board, new data security technologies and models keep emerging, and the boundaries of the data security industry continue to expand and merge.