Ben is a lifelong Nintendo fan who likes to build websites and make video games. He buys way too much Lego.
A couple of days ago I read an article about the MillionDollarHomepage and how the content on it has largely vanished. This raised the question of permanence on the web. The article also mentioned how it would be an interesting project to use something like the Wayback Machine to restore the MillionDollarHomepage to its former ‘glory’.
Given the existence of powerful and widely accessible tools such as the Wayback Machine, this kind of restorative curation may well be within reach.
A quick detour
I mentioned this article to my wife, who has been using the internet at least as long as I have (since the late ’90s), and she said she hadn’t heard of the MillionDollarHomepage. If that’s you too, here’s a quick recap.
The MillionDollarHomepage was the brainchild of an English student. He decided to sell pixels on a website to help cover his university fees. Each pixel cost a dollar, and they were sold in blocks of 10×10 pixels (so $100 a block). Customers could buy multiple blocks next to each other and put an image in those blocks. The block would link wherever the customer wanted.
So anyway – I liked the idea of restoring the site so that the old links worked again, but going through the 2800 sites and replacing their urls sounded like a pain – so I had a look at the Wayback Machine – and found a super simple way to get the pages to load.
Here’s a link to my website from May 2016:
If you look at the url it’s pretty obvious what most of it does. It starts with the Wayback Machine url. Then it has a series of numbers, then the website that is being viewed. The series of numbers is the interesting bit.
The numbers contain the date (2016/05/01) and time (01:07:52) the website snapshot was taken. Each snapshot is taken at a different time, though, so if I used the full number I would still have to go through each url and find a working snapshot. I wondered what would happen if I tried removing parts of the url, and instantly I had my solution.
If I removed the time, the Wayback Machine would skip forward to the first snapshot after the specified date. Even better, if I entered a different date, it would skip forward to the first snapshot after that date.
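The url structure described above can be sketched in a few lines of Python. This is only an illustration: the `https://web.archive.org/web/` prefix is the usual Wayback Machine address, `example.com` stands in for the real site, and the function names are made up for this sketch.

```python
from datetime import datetime

# Usual Wayback Machine prefix; the target site below is a placeholder.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(target: str, timestamp: str) -> str:
    """Build a Wayback Machine url from a timestamp and a target url.

    The timestamp is a run of digits in YYYYMMDDhhmmss order. It may be
    shortened (for example to YYYYMMDD), in which case the archive picks
    a snapshot for us instead of requiring an exact match.
    """
    if not timestamp.isdigit() or len(timestamp) > 14:
        raise ValueError("timestamp must be at most 14 digits")
    return f"{WAYBACK_PREFIX}{timestamp}/{target}"


def parse_full_timestamp(timestamp: str) -> datetime:
    """Decode a full 14-digit timestamp into a date and time."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")


# Full timestamp: one exact snapshot.
print(wayback_url("http://example.com/", "20160501010752"))
# Date only: let the Wayback Machine find a snapshot near that date.
print(wayback_url("http://example.com/", "20160501"))
print(parse_full_timestamp("20160501010752"))
```

Dropping the last six digits is all that "removing the time" amounts to, which is why the same date-only prefix can be reused for every link.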
To restore the MillionDollarHomepage all I had to do was prefix all of the urls with a Wayback Machine url. I chose the date August 1st 2005 since this was roughly when the MillionDollarHomepage was first launched – then did a simple search and replace on the page to insert the url.
I am hosting the restored MillionDollarHomepage on GitHub Pages – and you can view the code on GitHub.
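I did the search and replace by hand in an editor, but the same step could be scripted. Here is a rough sketch, assuming the links live in ordinary quoted `href` attributes. The regular expression and the `prefix_links` name are my own illustration, not the code in the repository.

```python
import re

# Date-only Wayback Machine prefix for August 1st 2005.
WAYBACK_PREFIX = "https://web.archive.org/web/20050801/"

# Matches href="http://..." or href='https://...' (assumed markup style).
HREF_RE = re.compile(r'''(href\s*=\s*)(["'])(https?://[^"']*)\2''', re.IGNORECASE)


def prefix_links(html: str) -> str:
    """Put the Wayback Machine prefix in front of every absolute link."""

    def repl(match: re.Match) -> str:
        attr, quote, url = match.group(1), match.group(2), match.group(3)
        if url.startswith(WAYBACK_PREFIX):
            # Already rewritten, so running the script twice is harmless.
            return match.group(0)
        return f"{attr}{quote}{WAYBACK_PREFIX}{url}{quote}"

    return HREF_RE.sub(repl, html)


sample = '<a href="http://example.com/" title="Example">Example</a>'
print(prefix_links(sample))
```

A regular expression is enough here because the job is a blunt find and replace. For messier markup a proper HTML parser would be the safer choice.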
… I found is that some pages are not archived with the Wayback Machine. It respects robots.txt files and other content blockers, so some sites can’t be viewed. I’m not sure what to do about this at the moment, or whether there’s an automated way to find and replace the ones that don’t work.
To be honest, whilst I was aware of the project, I had never actually looked at any of the links on it, so in testing I was amazed at how many of them were either for the same spammy things we get junk mail about, or for rip-off versions of the MillionDollarHomepage. There is even a script (possibly more than one) that lets you create your own imitation MillionDollarHomepage.
It did make me wonder if the originator of the project had considered the ethics of the project – but then I guess if he had, he wouldn’t have earned over a million dollars putting together a super simple website.
I was also amazed at the number of links that don’t actually link anywhere. There are a lot that appear to have been paid for and then never completed. They link to places like http://paid + reserved or http://pending order which just seems strange to me. Why not make use of something you have paid for? Anyway – I changed all these links to point to #reserved.
There’s also a suspended link. What on earth did they do?
I think it has taken me longer to write this article than it took me to work out how to restore the homepage and get it online. It probably took me 10 minutes altogether. To be honest it feels like it should have taken longer. Perhaps one day I will go through all the pages and find all the dead ends. Perhaps there’s somewhere else I could link instead. The restored version also doesn’t take into account the sites that still exist – but I felt this was worthwhile since the idea was to get a snapshot of how the site would have looked 10 years ago.
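One possible route to automating that check is the Internet Archive's availability endpoint at `https://archive.org/wayback/available`, which takes a `url` and an optional `timestamp` and returns JSON describing the closest snapshot. I haven't wired this into the project. The sketch below assumes the response shape shown in the comments, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

API = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: str = "20050801") -> str:
    """Build the availability lookup url for one link."""
    return API + "?" + urllib.parse.urlencode({"url": url, "timestamp": timestamp})


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the snapshot url from a decoded response, or None.

    Assumed response shape:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    with an empty "archived_snapshots" object when nothing is archived.
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check(url: str, timestamp: str = "20050801") -> Optional[str]:
    """Ask the archive about one link (makes a network request)."""
    with urllib.request.urlopen(availability_query(url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

Looping `check` over every link on the page, with a polite pause between requests, would give a list of the ones that return `None`. Whether pages blocked by robots.txt show up as missing in this response is something I would have to test.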
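Spotting these placeholder links could also be scripted. The sketch below uses a heuristic of my own choosing rather than anything from the original page: a link counts as a placeholder if it contains whitespace or has no dotted hostname. That would wrongly flag a host such as `localhost`, which doesn't matter for this page.

```python
from urllib.parse import urlsplit


def is_placeholder(href: str) -> bool:
    """Guess whether a link is a placeholder rather than a real address."""
    # Real urls never contain raw whitespace ("http://pending order" does).
    if any(ch.isspace() for ch in href):
        return True
    try:
        host = urlsplit(href).hostname or ""
    except ValueError:
        # urlsplit rejects some malformed addresses outright.
        return True
    # Assumption: a genuine public hostname has at least one dot.
    return "." not in host


def fix_link(href: str) -> str:
    """Point placeholder links at #reserved and leave the rest alone."""
    return "#reserved" if is_placeholder(href) else href


for link in ["http://paid + reserved", "http://pending order", "http://example.com/"]:
    print(link, "->", fix_link(link))
```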