Peer-to-peer networking is an alternative approach to network communication. In order to understand how P2P differs from the “standard” approach to network communication it is necessary to take a step backward and look at client-server communications. Client-server communications are ubiquitous in networked applications today.
Traditionally, you interact with applications over a network (including the Internet) using a client-server architecture. Web sites are a great example of this. When you look at a Web site, you send a request over the Internet to a Web server, which then returns the information that you require. If you want to download a file, you do so directly from the Web server.
Similarly, desktop applications that include local or wide area network connectivity will typically connect to a single server, for example, a database server or a server that hosts other services.
This simple form of client-server architecture is illustrated in Figure 47-1.
There is nothing inherently wrong with the client-server architecture, and indeed in many cases it will be exactly what you want. However, there is a scalability problem. Figure 47-2 shows how the client-server architecture scales with additional clients.
With every client that is added, an increased load is placed on the server, which must communicate with each client. To return to the Web site example, this is how Web sites collapse: when there is too much traffic, the server simply becomes unresponsive.
There are of course scaling options that you can implement to mitigate this situation. You can scale up by increasing the power and resources available to the server, or you can scale out by adding additional servers. Scaling up is of course limited by the technology available and the cost of better hardware. Scaling out is potentially more flexible, but requires an additional infrastructure layer to ensure that clients either communicate with individual servers or that clients can maintain session state independent of the server with which they are communicating. Plenty of solutions are available for this, such as Web or server farm products.
The peer-to-peer approach is completely different from either the scaling up or scaling out approach. With P2P, instead of focusing on and attempting to streamline the communication between the server and its clients, you look at ways in which clients can communicate with each other.
Say, for example, that the Web site that clients are communicating with is csharpaid.com. In our imaginary scenario, Csharp has announced that a new version of this book is to be released on the csharpaid.com Web site and will be free to download to anyone who wants it, but that it will be removed after one day. Before the book is available on the Web site, you might imagine that an awful lot of people will be looking at the Web site and refreshing their browsers, waiting for the file to appear. Once the file is available, everyone will try to download it at the same time, and more than likely the csharpaid.com Web server will collapse under the strain.
You could use P2P technology to prevent this Web server collapse from occurring. Instead of sending the file directly from the server to all the clients, you send the file to just a few clients. A few of the remaining clients then download the file from the clients that already have it, a few more clients download it from those second-level clients, and so on. In fact, this process is made even faster by splitting the file into chunks and dividing these chunks between clients, some of whom download chunks directly from the server, and some of whom download chunks from other clients. This is how file-sharing technologies such as BitTorrent work, and is illustrated in Figure 47-3.
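The chunking idea described above can be sketched as a small in-memory simulation. This is an illustration only, not how BitTorrent or any real client is implemented: the function names (`split_into_chunks`, `distribute`, `reassemble`), the round-robin seeding, and the "copy from the first peer that has the chunk" rule are all assumptions made for the example.

```python
def split_into_chunks(data, chunk_size):
    """Cut the file into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def distribute(data, chunk_size, peer_count):
    """Simulate distribution; returns (peers, server_uploads, peer_uploads).

    Each peer is modeled as a dict mapping chunk index -> chunk bytes.
    """
    chunks = split_into_chunks(data, chunk_size)
    peers = [dict() for _ in range(peer_count)]
    server_uploads = 0
    peer_uploads = 0

    # Assumption: the server uploads every chunk exactly once, round robin.
    for index, chunk in enumerate(chunks):
        peers[index % peer_count][index] = chunk
        server_uploads += 1

    # Each peer fetches its missing chunks from another peer, not the server.
    for peer in peers:
        for index in range(len(chunks)):
            if index not in peer:
                source = next(p for p in peers if index in p)
                peer[index] = source[index]
                peer_uploads += 1

    return peers, server_uploads, peer_uploads


def reassemble(peer):
    """Join a peer's chunks back into the original file, in index order."""
    return b"".join(peer[i] for i in sorted(peer))


if __name__ == "__main__":
    file_data = bytes(range(256)) * 4  # a 1,024-byte stand-in for the file
    peers, from_server, from_peers = distribute(file_data, 100, 4)
    print("chunks uploaded by server:", from_server)
    print("chunks uploaded by peers:", from_peers)
    print("all copies intact:", all(reassemble(p) == file_data for p in peers))
```

In this toy model the server uploads the file only once in total, however many peers there are; the rest of the traffic moves between peers. A real system would also verify each chunk, for example with a hash, before accepting it.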
P2P Architectural Challenges
There are still problems to solve in the file-sharing architecture discussed here. For a start, how do clients detect that other clients exist, and how do they locate chunks of the file that other clients might have? Also, how can you ensure optimal communication between clients that may be separated by entire continents?
Every client participating in a P2P network application must be able to perform the following operations to overcome these problems:
- It must be able to discover other clients.
- It must be able to connect to other clients.
- It must be able to communicate with other clients.
The discovery problem has two obvious solutions. You can either keep a list of the clients on the server so clients can obtain this list and contact other clients (known as peers), or you can use an infrastructure (for example PNRP, covered in the next section) that enables clients to find each other directly. Most file-sharing systems use the “list on a server” solution, by using servers known as trackers. Also, in file-sharing systems any client may act as a server, as shown in Figure 47-3, by declaring that it has a file available and registering it with a tracker. In fact, a pure P2P network needs no servers at all, just peers.
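The “list on a server” approach can be sketched as a minimal in-memory tracker. The class and method names here (`Tracker`, `announce`, `withdraw`, `peers_for`) are hypothetical and do not correspond to any real tracker protocol or to the System.Net.PeerToPeer API; a real tracker would also be reached over the network rather than called directly.

```python
class Tracker:
    """Keeps, for each file, the set of peers that say they can serve it."""

    def __init__(self):
        self._peers_by_file = {}

    def announce(self, file_id, peer_address):
        """A peer declares that it has (part of) a file available."""
        self._peers_by_file.setdefault(file_id, set()).add(peer_address)

    def withdraw(self, file_id, peer_address):
        """A peer leaves; unknown files or peers are ignored."""
        self._peers_by_file.get(file_id, set()).discard(peer_address)

    def peers_for(self, file_id, requester=None):
        """Return known peers for a file, excluding the requester itself."""
        peers = self._peers_by_file.get(file_id, set())
        return sorted(p for p in peers if p != requester)


if __name__ == "__main__":
    tracker = Tracker()
    tracker.announce("book.pdf", "10.0.0.5:6881")
    tracker.announce("book.pdf", "10.0.0.9:6881")
    # A newly arrived peer asks who already has the file.
    print(tracker.peers_for("book.pdf", requester="10.0.0.9:6881"))
```

Note that the tracker stores only addresses, never file data, which is why its load stays small compared with a server that delivers the file itself.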
The connection problem is a more subtle one, and concerns the overall structure of the networks used by a P2P application. If you have one group of clients, all of which can communicate with one another, the topology of the connections between these clients can become extremely complex. You can often improve performance by having more than one group of clients, each of which consists of connections between clients in that group, but not to clients in other groups. If you can make these groups locale-based you will get an additional performance boost, because clients can communicate with each other with fewer hops between networked computers.
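A rough count shows why grouping helps. The sketch below assumes the worst case, in which every peer in a group holds a direct connection to every other peer in that group; the text does not require this, and the helper names are made up for the example.

```python
def full_mesh_links(peer_count):
    """Connections needed if every peer connects to every other peer."""
    return peer_count * (peer_count - 1) // 2


def grouped_links(group_sizes):
    """Connections needed when peers connect only within their own group."""
    return sum(full_mesh_links(size) for size in group_sizes)


if __name__ == "__main__":
    print(full_mesh_links(100))      # one group of 100 peers: 4950
    print(grouped_links([25] * 4))   # four groups of 25 peers: 1200
```

Under this assumption, splitting 100 peers into four groups cuts the number of connections to roughly a quarter, at the cost of peers in different groups no longer reaching each other directly.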
Communication is perhaps a problem of lesser importance, because communication protocols such as TCP/IP are well established and can be reused here. There is, however, scope for improvement in both high-level technologies (for example, you can use WCF services and therefore all the functionality that WCF offers) and low-level protocols (such as multicast protocols to send data to multiple endpoints simultaneously).
Discovery, connection, and communication are central to any P2P implementation. The implementation you look at in this chapter is to use the System.Net.PeerToPeer types with PNM for discovery and PNRP for connection. As you see in subsequent sections, these technologies cover all three of these operations.
In the previous sections you were introduced to the concept of a peer, which is how clients are referred to in a P2P network. The word “client” makes no sense in a P2P network because there is not necessarily a server to be a client of.
Groups of peers that are connected to each other are known by the interchangeable terms meshes, clouds, or graphs. A given group can be said to be well-connected if:
- There is a connection path between every pair of peers, so that every peer can connect to any other peer as required.
- There are a relatively small number of connections to traverse between any pair of peers.
- Removing a peer will not prevent other peers from connecting to each other.
Note that this does not mean that every peer must be able to connect to every other peer. In fact, if you analyze a network mathematically you will find that peers need to connect only to a relatively small number of other peers in order for these conditions to be met.
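The three conditions above can be checked mechanically on a small graph. The following is a sketch under stated assumptions: the mesh is modeled as a dictionary of neighbor lists with symmetric connections, “relatively small” is made concrete with a hypothetical `max_hops` parameter, and the third condition is tested for the removal of one peer at a time.

```python
from collections import deque


def hop_counts(graph, start, removed=None):
    """Breadth-first search from start; returns {peer: hops from start}.

    The peer given as 'removed' is treated as if it had left the mesh.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for neighbor in graph[peer]:
            if neighbor != removed and neighbor not in seen:
                seen[neighbor] = seen[peer] + 1
                queue.append(neighbor)
    return seen


def is_well_connected(graph, max_hops):
    peers = list(graph)
    # Conditions 1 and 2: every pair is reachable, within max_hops.
    for peer in peers:
        hops = hop_counts(graph, peer)
        if len(hops) != len(peers) or max(hops.values()) > max_hops:
            return False
    # Condition 3: removing any single peer leaves the others connected.
    for removed in peers:
        remaining = [p for p in peers if p != removed]
        if remaining:
            reached = hop_counts(graph, remaining[0], removed)
            if len(reached) != len(remaining):
                return False
    return True


if __name__ == "__main__":
    # Six peers in a ring: each peer connects to only two others.
    ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    print(is_well_connected(ring, max_hops=3))
```

The ring illustrates the point made above: each peer holds only two connections, yet every peer can reach every other, and the mesh survives the loss of any one peer.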
Another P2P concept to be aware of is that of flooding. Flooding is the way in which a single piece of data is propagated through a network to all peers, or in which other nodes in a network are queried to locate a specific piece of data. In unstructured P2P networks this is a fairly random process of contacting nearest neighbor peers, which in turn contact their nearest neighbors, and so on until every peer in the network is contacted. It is also possible to create structured P2P networks such that there are well-defined pathways for queries and data flow among peers.
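Flooding in an unstructured network can be sketched as follows. Two details here are assumptions that go beyond the description above: peers ignore a message they have already seen, and each message carries a hop limit (`ttl`) so that it eventually stops. The function name `flood` and the sample mesh are invented for the example, and the traversal is deterministic rather than random to keep it easy to follow.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Propagate one message from origin to neighbors, hop by hop.

    Returns (received, messages_sent), where received maps each peer
    that got the message to the hop count at which it first arrived.
    """
    received = {origin: 0}
    queue = deque([(origin, ttl)])
    messages_sent = 0
    while queue:
        peer, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # hop limit reached: do not forward any further
        for neighbor in graph[peer]:
            messages_sent += 1  # every forward costs a message, even a duplicate
            if neighbor not in received:
                received[neighbor] = received[peer] + 1
                queue.append((neighbor, hops_left - 1))
    return received, messages_sent


if __name__ == "__main__":
    mesh = {
        "A": ["B", "C"],
        "B": ["A", "D"],
        "C": ["A", "D"],
        "D": ["B", "C", "E"],
        "E": ["D"],
    }
    reached, sent = flood(mesh, "A", ttl=10)
    print(sorted(reached), sent)
```

In this example the mesh has five peers but ten messages are sent, because peers also forward to neighbors that already have the data. That overhead is the main reason structured networks with well-defined pathways are attractive.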
Once you have an infrastructure for P2P you can start to develop not just improved versions of client-server applications, but entirely new applications. P2P is particularly suited to the following:
- Content distribution applications, including the file-sharing applications discussed earlier.
- Collaboration applications, such as desktop sharing and shared whiteboard applications.
- Multi-user communication applications that allow users to communicate and exchange data directly rather than through a server.
- Distributed processing applications, as an alternative to super computing applications that process enormous amounts of data.
- Web 2.0 applications that combine some or all of the above in dynamic next-generation Web