File Sharing - 2nd Generation Design

On my page The Gnutella Network: Why it sucks, I explained why Gnutella sucks and why everyone should adopt 'my way' of doing things. There were some good things and some bad things in that particular rant. There have been some changes (improvements) to Gnutella since, but it's still not quite what you want. Meanwhile, everyone who is using Windows is using KaZaA, a similar network which is fully encrypted and supports downloading a file through multiple streams, considerably speeding up the downloads. It is really a great tool, although it does seem to have a minor problem with resuming downloads if you have been disconnected for a while (like overnight).

This rant is not about Gnutella or KaZaA; it is about doing file sharing, and doing it in another way than was mentioned before. When I was thinking about it, I figured that this could be the way KaZaA works, although I honestly don't know, and there is no way to check because its design is not open to the community. On the other hand, KaZaA talks to Grokster, and Grokster works much like Gnutella (or so I'm told).

So, what's new? I figured I just wanted to advertise my shared files on a network, and allow people to download my shared stuff. I could tell a friend 'the file you want is here' and he could pick it up. That's possible with FTP, nothing new there.
But what if I didn't know where to get a particular file? If I need information on a particular subject, I put it in Google (web search engine), and numerous locations pop out. A file sharing tool could work the same way; you advertise to an indexing service, and you can query the indexing service on where to find the file.
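
To make the advertise-and-query idea a bit more concrete, here is a minimal Python sketch of such an indexing service. It is only an illustration under my own assumptions: the class name IndexServer, the methods advertise and query, and the plain substring match are all made up for this example, and a real index server would of course talk over the network instead of living in one process.

```python
from collections import defaultdict


class IndexServer:
    """Toy in-memory index: maps file names to the hosts that advertise them.

    Hypothetical names throughout; this is a sketch, not a protocol.
    """

    def __init__(self):
        # file name -> set of (host, size in bytes)
        self._locations = defaultdict(set)

    def advertise(self, host, name, size):
        """A client tells the index that it shares `name`."""
        self._locations[name].add((host, size))

    def query(self, keyword):
        """Return (name, host, size) for every file whose name contains keyword."""
        keyword = keyword.lower()
        hits = []
        for name, entries in self._locations.items():
            if keyword in name.lower():
                for host, size in entries:
                    hits.append((name, host, size))
        return sorted(hits)


index = IndexServer()
index.advertise("10.0.0.5:4000", "linux-howto.txt", 1234)
index.advertise("10.0.0.9:4000", "song.ogg", 5)
for name, host, size in index.query("linux"):
    print(name, "is available at", host, "(%d bytes)" % size)
```

The client only learns where the file is; the actual download still happens directly between the two machines, just like picking it up by FTP.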

The indexing service must be decentralized. The index servers can be on their own Gnutella-like network. An index server that cannot answer a query can propagate it to the next index server. Instead of routing the answer back through the network, I suggest sending the answer straight back to the originator. Another way would be to tell the querying client what index servers there are, and let the client query multiple servers itself, instead of letting the servers query each other. This creates more traffic on the client side, but simplifies the design and implementation of the index server.
The index server to connect to is picked from a dynamic list of well-known index servers that are usually running.
Because the index servers are on a relatively small network, it is possible to count the number of files shared and the total number of bytes.
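
The second variant (the client walks the index servers itself) could look roughly like the Python sketch below. Everything in it is an assumption for the sake of illustration: the names IndexServer, walk, query_all and network_totals are invented, servers are plain objects rather than network endpoints, and each server simply hands out the list of peers it knows about.

```python
class IndexServer:
    """Hypothetical index server that knows some files and some peer servers."""

    def __init__(self, name):
        self.name = name
        self.files = {}   # (host, filename) -> size in bytes
        self.peers = []   # other index servers this one can point clients to

    def advertise(self, host, filename, size):
        self.files[(host, filename)] = size

    def lookup(self, filename):
        return sorted(host for (host, fn) in self.files if fn == filename)

    def stats(self):
        """Number of shared files and total bytes known to this server."""
        return len(self.files), sum(self.files.values())


def walk(start):
    """Visit every index server reachable from `start` exactly once."""
    seen, todo = set(), [start]
    while todo:
        server = todo.pop(0)
        if server.name in seen:
            continue
        seen.add(server.name)
        yield server
        todo.extend(server.peers)


def query_all(start, filename):
    """Client-side fan-out: ask each server directly and collect the hosts."""
    hosts = []
    for server in walk(start):
        hosts.extend(server.lookup(filename))
    return hosts


def network_totals(start):
    """Add up the per-server counts to get network-wide totals."""
    files = size = 0
    for server in walk(start):
        n, b = server.stats()
        files += n
        size += b
    return files, size
```

Because the client does the walking, the servers never have to route answers back for each other. The totals are only meaningful as long as the set of index servers stays small enough to visit them all.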

All files that are shared have a checksum. The SHA-1 algorithm is currently in common use on peer-to-peer networks, but in fact, any good checksumming algorithm will do. Computing the checksum takes some time, so I suggest not computing it until someone actually requests the file, and then saving the result locally as a cached copy.
If the index server finds a file of the same size and with the same checksum on another client machine, it is probably the exact same file that is being shared. This enables someone else to download the file from multiple sites at once, hence speeding up the download.

I figured this is probably much like the way KaZaA works, but then again, there is no way to be sure. I do know that this design would make for an excellent file sharing network.
Something I really miss in the current version of KaZaA is bandwidth regulation. I would like to be able to say "do not use more than 50 Kbps" or "do not use more than 80% of my total bandwidth". I'm told that some other peer-to-peer download tools do have this option. I would also like to be able to say "if the download rate drops below 15 Kbps, go search for more download sources". Or what about "automatically queue downloads if the line is already congested".
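
For what it's worth, this kind of bandwidth regulation does not take much code. The Python sketch below uses a token bucket, which is one common way to cap a transfer rate; I am not claiming any existing tool does it this way. The names RateLimiter, allowance and needs_more_sources are made up, rates are in bytes per second, and I am assuming "50 Kbps" means 50 kilobits per second (6250 bytes per second).

```python
import time


class RateLimiter:
    """Token bucket: allows at most `rate` bytes per second on average."""

    def __init__(self, rate, clock=time.monotonic):
        self.rate = float(rate)
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = 0.0
        self.last = clock()

    def allowance(self, wanted):
        """Return how many of `wanted` bytes may be transferred right now."""
        now = self.clock()
        # Refill, but never bank more than one second's worth of bytes.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        granted = min(wanted, int(self.tokens))
        self.tokens -= granted
        return granted


def needs_more_sources(bytes_received, seconds, min_rate):
    """True if the measured download rate fell below `min_rate` bytes/second."""
    return seconds > 0 and bytes_received / seconds < min_rate


limiter = RateLimiter(50_000 / 8)            # "do not use more than 50 Kbps"
if needs_more_sources(10_000, 10, 15_000 / 8):
    print("download is slow, go search for more sources")
```

A download loop would call allowance before each read or write and sleep briefly whenever it gets back zero. The "80% of my total bandwidth" rule would additionally need some measurement of what the total is, which this sketch does not attempt.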

There are lots of peer-to-peer networks nowadays and plenty of download tools to choose from. There is still some room for improvement though, if only the developers used their imagination. Personally, I would like to see an implementation that sees the network as a globally shared filesystem, which you can mount and use like any other filesystem. This would require a completely different setup...
Anyway, I think it would still be fun to write my own implementation of a peer-to-peer download tool some day, when I have nothing useful to do ;-)


If you really must, you can contact the author at walter at heiho dot net