The World Economic Forum (WEF) has published a very interesting white paper describing various privacy-enhancing techniques. You can find the document at https://www.weforum.org/whitepapers/the-next-generation-of-data-sharing-in-financial-services-using-privacy-enhancing-techniques-to-unlock-new-value. The document presents several related techniques and their application to financial services.
These techniques address the confidentiality and privacy issues that arise when sensitive personal information needs to be shared with third parties.
We will focus on federated learning in this short article.
One model is of particular interest: the federated analysis model. With this technique, separately trained models are aggregated to form a common model. The concept is described on page 12 of the white paper, where spam detection is used to illustrate it. This approach is not without issues: models tend to become outdated very quickly if they are not refreshed with new live data, so they would have to be retrained and combined again on a regular basis. This means high maintenance and creates a risk of inaccurate results.
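To make the aggregation step concrete, here is a minimal sketch of one common way to combine separately trained models: a weighted average of their parameters, in the spirit of federated averaging. The white paper does not prescribe this exact rule; the function name federated_average, the three institutions and their weight values are all hypothetical and only serve as an illustration.

```python
from typing import List, Sequence


def federated_average(
    weights: Sequence[Sequence[float]], n_samples: Sequence[int]
) -> List[float]:
    """Combine per-participant weight vectors into one common vector.

    Each participant's weights count in proportion to the number of
    training examples it used (an assumption of this sketch).
    """
    if not weights or len(weights) != len(n_samples):
        raise ValueError("need one sample count per weight vector")
    total = sum(n_samples)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(weights[0])
    if any(len(w) != dim for w in weights):
        raise ValueError("all weight vectors must have the same length")
    return [
        sum(w[i] * n for w, n in zip(weights, n_samples)) / total
        for i in range(dim)
    ]


# Hypothetical spam-filter weights learned separately by three institutions.
bank_a = [0.2, -1.0, 0.5]
bank_b = [0.4, -0.8, 0.1]
bank_c = [0.0, -1.2, 0.9]

# Only the weights and the sample counts are exchanged, never the raw data.
common = federated_average([bank_a, bank_b, bank_c], [100, 300, 100])
print(common)
```

Every time one participant retrains, the averaging has to be run again, which is the maintenance burden mentioned above.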
With this approach, no contributor ever shares personal data, only the resulting machine learning model. A common misunderstanding about machine learning models is that they contain their training data. They do not, hence sharing the model is not an issue in terms of data privacy. A neural network, for example, only contains the weights associated with each neuron, not the data itself.
We want to go beyond this relatively static approach: instead of sharing learned models, we want to implement a single model shared across different participants. Such a collaborative approach, based on blockchain, has been proposed by Microsoft researchers at https://www.microsoft.com/en-us/research/blog/leveraging-blockchain-to-make-machine-learning-models-more-accessible/. We do not think blockchain is needed to implement the concept. But if we want a fully open platform, including a way of incentivizing contributors and making the model easily accessible, blockchain can offer a good solution.
Each participant would contribute to the model, train it and improve it. This has a major advantage over other techniques: participants can use the model, add new data and improve the model in real time. It also gives interested participants access to models trained with far more data than they would be able to collect themselves. This is very attractive to small institutions, for example, or to larger companies that would not otherwise have access to specific data.
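A single shared model that improves as participants add data can be sketched with an online learner, which is updated one example at a time instead of being retrained from scratch. The following is a minimal illustration, assuming a logistic regression trained with plain stochastic gradient descent; the class SharedOnlineModel, the participant names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


class SharedOnlineModel:
    """Logistic regression updated one example at a time (plain SGD)."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate
        # Number of examples each participant has contributed so far.
        self.updates_by_participant: Dict[str, int] = {}

    def predict_proba(self, x: Sequence[float]) -> float:
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def contribute(self, participant: str, x: Sequence[float], y: int) -> None:
        """Apply one gradient step using a single labelled example."""
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error
        count = self.updates_by_participant.get(participant, 0)
        self.updates_by_participant[participant] = count + 1


# Two hypothetical participants improve the same model in turn.
model = SharedOnlineModel(n_features=2)
for _ in range(50):
    model.contribute("bank", [1.0, 0.0], 1)     # positive example
    model.contribute("fintech", [0.0, 1.0], 0)  # negative example

print(model.predict_proba([1.0, 0.0]), model.predict_proba([0.0, 1.0]))
```

Both participants benefit from every update, whoever supplied the example, which is what distinguishes this setup from periodically merging separate models.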
The technology underlying the model is already well known. We can deploy the model in the cloud using a serverless architecture. It is then a matter of publishing simple APIs to train the model with new data and to use it by sending new data to be predicted or classified. Azure Machine Learning Studio provides an easy way to deploy models through REST APIs.
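The two APIs could look roughly like the sketch below, which is deliberately cloud-agnostic and does not use any Azure-specific call. It assumes two hypothetical routes, /train and /predict, that exchange JSON, and it uses a very simple nearest-centroid classifier kept in memory as a stand-in for the real model. In a serverless deployment, each cloud function would wrap a handler of this kind and the state would live in shared storage.

```python
import json
from typing import Dict, List, Tuple

# In-memory model state: per-label feature sums and example counts.
# A real deployment would keep this in shared, persistent storage.
_sums: Dict[str, List[float]] = {}
_counts: Dict[str, int] = {}


def handle_request(route: str, body: str) -> Tuple[int, str]:
    """Dispatch a JSON request and return (HTTP status, JSON response)."""
    if route not in ("/train", "/predict"):
        return 404, json.dumps({"error": "unknown route"})
    payload = json.loads(body)
    features = [float(v) for v in payload["features"]]

    if route == "/train":
        label = str(payload["label"])
        sums = _sums.setdefault(label, [0.0] * len(features))
        if len(sums) != len(features):
            return 400, json.dumps({"error": "feature length mismatch"})
        for i, value in enumerate(features):
            sums[i] += value
        _counts[label] = _counts.get(label, 0) + 1
        return 200, json.dumps({"examples": sum(_counts.values())})

    if not _counts:
        return 409, json.dumps({"error": "model has no training data yet"})

    def squared_distance(label: str) -> float:
        centroid = [s / _counts[label] for s in _sums[label]]
        return sum((c - f) ** 2 for c, f in zip(centroid, features))

    # Predict the label whose centroid is closest to the new data point.
    return 200, json.dumps({"label": min(_counts, key=squared_distance)})


# Two training calls followed by one prediction call.
handle_request("/train", json.dumps({"features": [1.0, 0.0], "label": "spam"}))
handle_request("/train", json.dumps({"features": [0.0, 1.0], "label": "ham"}))
status, response = handle_request("/predict", json.dumps({"features": [0.9, 0.2]}))
print(status, response)
```

Keeping training and prediction behind the same small interface means participants never need direct access to the model internals or to each other's data.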
For such a model to work, we must find incentives for each participant in the ecosystem. As Microsoft proposed, contributors could be rewarded when specific metrics of the model improve, for example accuracy, precision or AUC. Each user of the model would pay a fee for every call to the model, as is usual for cloud-based services.
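The incentive scheme can be illustrated with a small ledger: a contributor is credited when a metric measured on a held-out data set improves, and a user is debited for each call. This is only a sketch under stated assumptions. The class IncentiveLedger, the fee and reward amounts, and the two toy classifiers are hypothetical, and a plain in-memory dictionary stands in for whatever accounting system (blockchain or not) would be used in practice.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]


def accuracy(predict: Callable[[Sequence[float]], int], holdout: List[Example]) -> float:
    """Fraction of held-out examples the classifier labels correctly."""
    correct = sum(1 for x, y in holdout if predict(x) == y)
    return correct / len(holdout)


class IncentiveLedger:
    """Tracks fees paid by users and rewards earned by contributors."""

    def __init__(self, fee_per_call: float, reward_per_point: float) -> None:
        self.fee_per_call = fee_per_call
        # Reward paid per percentage point of metric improvement.
        self.reward_per_point = reward_per_point
        self.balances: Dict[str, float] = {}

    def charge_call(self, user: str) -> None:
        self.balances[user] = self.balances.get(user, 0.0) - self.fee_per_call

    def reward_if_improved(self, contributor: str, before: float, after: float) -> float:
        """Credit the contributor only when the metric went up."""
        gain_in_points = max(0.0, after - before) * 100
        reward = gain_in_points * self.reward_per_point
        self.balances[contributor] = self.balances.get(contributor, 0.0) + reward
        return reward


# Held-out examples used to measure the model before and after a contribution.
holdout: List[Example] = [([0.9], 1), ([0.8], 1), ([0.2], 0), ([0.1], 0)]
before = accuracy(lambda x: 0, holdout)                   # always predicts 0
after = accuracy(lambda x: int(x[0] > 0.5), holdout)      # improved model

ledger = IncentiveLedger(fee_per_call=0.01, reward_per_point=0.5)
reward = ledger.reward_if_improved("bank", before, after)
ledger.charge_call("fintech")
ledger.charge_call("fintech")
print(reward, ledger.balances)
```

The same pattern works with precision or AUC in place of accuracy, as long as the metric is computed on data that contributors cannot tune against.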