Here, we formalize the decentralized data collaboration problem and discuss example workloads and required abstractions.
To revisit the background of the problem, check Background.
Definition of the problem
✏️ Decentralized data collaborations involve two or more participants who cooperatively perform tasks over their combined data to gain greater insights.
Here, we discuss examples of decentralized data collaboration workloads and identify a set of useful, shared and required abstractions.
Example workloads
Many collaborations focus on private data access.
- Private set operations (e.g. intersection, union) allow two parties to compare their sets while revealing only the result. Private set intersection is one of the most widely adopted collaboration types because of its use in contact discovery, and many existing real-world systems target it.
- Private information retrieval allows one user to retrieve list entries held by another user without revealing which index was requested. Real-world solutions like XPIR~\cite{XPIR} provide a C++ API.
- Federated analytics and secure aggregation compute aggregate statistics across participants without exposing individual contributions. Both are used in the real world, with Google's Gboard as an example.
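To make the secure aggregation workload above concrete, here is a minimal sketch of aggregation with pairwise cancelling masks. It is an illustration under simplifying assumptions, not the protocol of any particular deployed system: the names `MODULUS`, `pairwise_masks`, `mask_input`, and `aggregate` are hypothetical, and the masks are generated in one place for brevity, whereas a real protocol would have each pair of participants derive its shared mask through key agreement and would also handle dropouts.

```python
import secrets

MODULUS = 2**32  # hypothetical modulus for all masked arithmetic


def pairwise_masks(num_parties: int) -> list[list[int]]:
    """Return masks where masks[i][j] == masks[j][i] is the value shared by i and j.

    Assumption: generated centrally here for illustration only.
    """
    masks = [[0] * num_parties for _ in range(num_parties)]
    for i in range(num_parties):
        for j in range(i + 1, num_parties):
            shared = secrets.randbelow(MODULUS)
            masks[i][j] = shared
            masks[j][i] = shared
    return masks


def mask_input(party: int, value: int, masks: list[list[int]]) -> int:
    """Hide one party's value behind the masks it shares with every peer.

    The party adds masks shared with higher-indexed peers and subtracts
    masks shared with lower-indexed peers, so each mask cancels in the sum.
    """
    masked = value
    for peer, shared in enumerate(masks[party]):
        if peer > party:
            masked += shared
        elif peer < party:
            masked -= shared
    return masked % MODULUS


def aggregate(masked_values: list[int]) -> int:
    """Sum the masked inputs; the pairwise masks cancel, leaving the true total."""
    return sum(masked_values) % MODULUS


values = [3, 5, 7]
masks = pairwise_masks(len(values))
masked = [mask_input(i, v, masks) for i, v in enumerate(values)]
total = aggregate(masked)  # equals sum(values) modulo MODULUS
```

The aggregator only ever sees the entries of `masked`, each of which looks uniformly random on its own, yet their sum equals the sum of the original values modulo `MODULUS`.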
There are also systems designed for more general and complicated collaboration goals.
- Encrypted databases and private query systems bring privacy to database access and query execution.
- Privacy-preserving machine learning allows multiple participants to train machine learning models (federated learning) and run model inference (secure inference) with privacy guarantees.
In addition, there are also decentralized procedures that do not directly focus on gaining data insights but serve as building blocks for security or privacy.
- Secret sharing is a common building block for many multi-party computation-based protocols. In real-world systems, when combined with other techniques, it can also be used for storage.
- Anonymous communication networks hide internet traffic using overlay networks. Solutions like Tor and mix networks (mixnets) have already been widely deployed in the real world.
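As an illustration of the secret sharing building block listed above, here is a minimal sketch of additive secret sharing, one of the simplest schemes. It is a sketch under stated assumptions rather than a production design: `MODULUS`, `share`, and `reconstruct` are hypothetical names, all shares are required for reconstruction, and threshold schemes such as Shamir's are not shown.

```python
import secrets

MODULUS = 2**61 - 1  # hypothetical modulus; every share lives in [0, MODULUS)


def share(secret: int, num_shares: int) -> list[int]:
    """Split `secret` into `num_shares` additive shares.

    All but the last share are uniformly random; the last is chosen so that
    the shares sum to the secret modulo MODULUS.
    """
    shares = [secrets.randbelow(MODULUS) for _ in range(num_shares - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: list[int]) -> int:
    """Recover the secret by summing all shares modulo MODULUS."""
    return sum(shares) % MODULUS


# Shares are additively homomorphic: adding two sharings share-by-share
# yields a sharing of the sum, without any holder seeing either secret.
a_shares = share(42, 3)
b_shares = share(100, 3)
sum_shares = [(x + y) % MODULUS for x, y in zip(a_shares, b_shares)]
```

Here `reconstruct(a_shares)` returns 42 and `reconstruct(sum_shares)` returns 142, while any proper subset of the shares is uniformly distributed and so reveals nothing about the secret. This additive property is what lets multi-party computation protocols operate on shared values.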
Required abstractions
Going through the examples of existing systems mentioned above, we identify the following shared abstractions that are often useful:
- Private Storage. Most of the systems above require separate storage for different participants. Some (e.g. secure aggregation) require private in-memory variable stores, while others require persistent storage that is accessible only to certain participants.