Data Sharing Through Homomorphic Encryption & Federated Learning

CyberVein
7 min read · Feb 21, 2020


In the past ten years, our society has become accustomed to “free” Internet services. Free search engines, free email, free online videos: such transactions seemed cost-effective while the value that data can bring was still unclear. Users did not hesitate to trust the companies that stored their information, and enjoyed the online experiences those companies provided. However, with the rise of artificial intelligence and massive improvements in data analysis capabilities, targeted marketing, location-based search, and personalized promotions have gradually become new battlefields for data applications. User data is continuously aggregated, user behavior is continuously profiled, and personal health risks and even election choices have become more predictable. Yet while data is hailed as the “new oil” that drives growth and innovation, its exploitation inevitably violates users’ privacy to some extent.

For example, in April 2019, Amazon’s smart speaker Echo was embroiled in a privacy controversy; in July of the same year, the Belgian broadcaster VRT revealed that Google hired contractors to listen to recordings of conversations between users and Google smart speakers.

These privacy concerns eroded the public’s enthusiasm for free digital services and turned it into aversion toward large tech companies and their data sharing practices. This led to the rise of the data protection movement: people’s trust in institutions fell to a historic low, and the public’s calls for data privacy legislation grew louder and louder.

Data Privacy Legislation

On April 27, 2016, the European Union passed the General Data Protection Regulation (GDPR), which came into effect on May 25, 2018. It gives EU consumers the right to know which of their data is held by data-driven companies and the right to request that the data be deleted. Companies that violate the rules face fines of up to 4% of their annual revenue. The regulation grants EU citizens a new set of privacy rights, and the GDPR is regarded as the Bible of data privacy protection. Since its implementation in May 2018, fines issued for violating the GDPR have totaled $126 million, the largest being a 50 million euro fine imposed on Google by the French data protection authority.

A legal system that protects data privacy is certainly important, but it has also produced some unintended consequences. Data privacy regulations limit how organizations handle data, restrict collaboration across fields, and can harm the economy; after all, collaboration and division of labor are the roots of human progress. The regulations directly affected companies in the data industry. Due to the GDPR, Tencent’s QQ International Edition stopped serving European users after May 20, 2018, and withdrew from the European market. Around the same time, Google CEO Sundar Pichai warned the public that, because of regulation, Android might no longer be free and its distribution model might come to resemble that of its rival Apple. In May 2018, a National Institutes of Health study on type 2 diabetes was suspended because, under the GDPR, European patient data could not be made available to US researchers. Collaborators could not access shared data, large-scale data sets went unused, and medical research stalled. It is patients who ultimately pay the price, as some may not live long enough for new treatments to emerge. The introduction of data protection measures has made some services unavailable to us.

Data privacy regulations thus hindered progress: companies lost the ability to exchange knowledge, and cooperation and communication were restricted. Legislation will inevitably limit the development of society and industry to some extent, but organizations have found ways to utilize user data while still complying with data privacy protection rules.

Homomorphic Encryption

Homomorphic encryption was first proposed in 1978, but it was not until 2009 that IBM researcher Craig Gentry designed the first fully homomorphic encryption scheme. With homomorphic encryption, computations can be performed directly on encrypted data, just as on plaintext, without ever decrypting it; the encrypted information can be analyzed in depth without compromising its confidentiality, revealing only the final result of the processing. With this breakthrough, people can entrust a third party to process their data without exposing any privacy-sensitive information.
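As a minimal illustration of the homomorphic property, textbook RSA (a partially homomorphic scheme, much weaker than Gentry’s fully homomorphic construction) lets a third party multiply two ciphertexts and obtain an encryption of the product of the plaintexts, without ever decrypting. The toy key sizes below are for illustration only and are completely insecure:

```python
# Textbook RSA is multiplicatively homomorphic:
#   Enc(m1) * Enc(m2) mod n  ==  Enc(m1 * m2 mod n)
p, q = 61, 53            # toy primes -- far too small for real use
n = p * q                # public modulus
phi = (p - 1) * (q - 1)
e = 17                   # public exponent
d = pow(e, -1, phi)      # private exponent (modular inverse, Python 3.8+)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

m1, m2 = 7, 12
c1, c2 = encrypt(m1), encrypt(m2)

# A third party multiplies the ciphertexts without ever seeing m1 or m2.
c_prod = (c1 * c2) % n

assert decrypt(c_prod) == (m1 * m2) % n   # 7 * 12 = 84
```

A fully homomorphic scheme extends this idea to support both addition and multiplication on ciphertexts, which is what makes arbitrary computation on encrypted data possible.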

The use of homomorphic encryption can alleviate the security risks of sharing highly sensitive data and enable valuable advances, such as in medical research.

Federated Learning

The concept of federated learning was first proposed by Google in 2016. It allows participants to jointly build models while keeping data local, using the data resources of the entire federation to improve each member’s model without disclosing the underlying data. This helps solve both data privacy problems and the problem of data silos.

Compared with the traditional centralized learning model, the advantages of federated learning are clear:

1. Within the federated learning framework, all participants have equal status, enabling fair cooperation;
2. Data stays local, avoiding leakage and meeting user privacy protection and data security requirements;
3. Participants can exchange information and model parameters while remaining independent, and all grow together;
4. The resulting models perform close to those trained with traditional centralized deep learning algorithms;
5. Federated learning is a closed-loop learning mechanism: the effectiveness of the model depends on the contributions of the data providers.
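A sketch of how one training round might work, based on the federated averaging (FedAvg) idea that Google described: each client runs a few gradient steps on its own private data, and only the resulting model weights are sent to a coordinator for averaging. The toy linear-regression task, the client data, and all hyperparameters below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])       # ground truth, unknown to anyone

# Three clients, each holding private data that never leaves them.
clients = []
for _ in range(3):
    X = rng.normal(size=(20, 3))
    y = X @ true_w + 0.01 * rng.normal(size=20)
    clients.append((X, y))

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local gradient steps for linear least squares."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

global_w = np.zeros(3)
for _ in range(10):                        # ten communication rounds
    # Only model weights travel; the raw (X, y) stays on each client.
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(local_ws, axis=0)   # federated averaging

print(np.round(global_w, 2))               # close to true_w
```

Note that the coordinator only ever sees weight vectors, never the clients’ raw data, yet the averaged model converges toward the solution that pooling all the data would have produced.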

In the traditional approach, users are mere bystanders to artificial intelligence; in the federated learning scenario, everyone is a participant in its development. When information can be shared and analyzed without being exposed, we no longer need to give up collaboration for the sake of trust. When data-sharing collaboration becomes more frequent and our imagination is freed again, finding solutions to some of the world’s major problems is only a matter of time.

Federated Learning Application

CyberVein has seized the opportunity presented by federated learning and begun researching applications of the technology that can benefit the world. Research and development take place at the Zhejiang University-CyberVein R&D Center, where researchers have been exploring the various fields in which federated learning can be applied.

The CyberVein R&D Center has been researching the application of federated learning in the healthcare sector, specifically in the diagnosis of keratitis. The different types of keratitis, caused by bacteria, fungi, and viruses, show only subtle visual differences, making it hard to diagnose them correctly with the naked eye and determine the right treatment plan; a serious misdiagnosis can leave a patient blind. Using data stored on the blockchain, the researchers applied federated learning to data from different hospitals to construct an algorithm that can diagnose the different types of keratitis without compromising the privacy of the patients’ or hospitals’ data. The tested model achieved a diagnostic accuracy of 80%, outperforming 96% of the doctors who volunteered to take part in the experiment. This result can still be improved, but it also shows that the same method can be applied elsewhere in healthcare, supporting doctors’ work and greatly increasing patients’ chances of recovery.

One success is not enough, and the CyberVein R&D Center researchers are working on federated learning applications in other sectors. Potential fields include aerospace, where the technology is being explored for use with the sensitive data stored on satellites, and surveillance and security, where experiments are under way on data collected from video surveillance cameras.

Data sharing technologies that comply with user data privacy regulations such as the GDPR will certainly advance collaboration and development across many sectors. They address the question of whether users can trust data-driven organizations to protect their personal data from misuse. Not all organizations have adopted such technology yet, and some user data will still be compromised, but in the future this will almost certainly become a standard for organizations handling sensitive data.
