Saturday, May 1, 2010

How to fit the whole Wikipedia inside your Laptop

Now that's 3,800,000 webpages in printed form!

What Wikipedia would look like if all its pages were printed. Be careful not to drop it on someone, okay?

METHOD 1: Using WikiTaxi

1.1. Download all 5.7 Gigabytes of the latest Wikipedia English webpages

This is the latest Wikipedia dump file (2010-Mar-16 08:44:40). If you want this file without having to download it for 5 to 8 hours, please get it from Mr Frankie, who will be most happy to give it to you for a fee of RM0.00.

1.2. Download the offline browser WikiTaxi from its website
(look for the download link on the left)

1.3. Run the WikiTaxi importer executable file.
Browse for the bz2 file, then specify the output file directory and filename. Now go and take a long teh-tarik break, around 2 hours or more (!!!), as WikiTaxi extracts the contents of the bz2 file and then recompresses them into your output file.

1.4. After your long coffee break, start WikiTaxi.exe, and open a database.
WikiTaxi will display a random page and you can start browsing from here.
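
If you are curious why step 1.3 takes so long, here is a minimal Python sketch that streams through the bz2 dump and counts the pages inside it. This is not how WikiTaxi works internally (I don't know its internals); it only assumes that the dump is bz2-compressed XML in which every article starts with a `<page>` line. The function name `count_pages` and the command-line usage are my own illustration.

```python
import bz2
import sys


def count_pages(dump_path, limit=None):
    """Count <page> elements in a bz2-compressed Wikipedia XML dump.

    The file is decompressed as a stream, line by line, so memory use
    stays small even though the dump is several Gigabytes.
    """
    count = 0
    with bz2.open(dump_path, "rt", encoding="utf-8", errors="replace") as dump:
        for line in dump:
            if "<page>" in line:
                count += 1
                if limit is not None and count >= limit:
                    break  # stop early, handy for a quick sanity check
    return count


if __name__ == "__main__":
    # Usage: python count_pages.py <path to the .bz2 dump file>
    print(count_pages(sys.argv[1]))
```

Passing a small `limit` (say 1000) is a quick way to confirm the download is not corrupted before you start the long import. Counting everything means reading the whole dump, so expect that to take a teh-tarik break of its own.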

You can copy your WikiTaxi files (which will be a few Gigabytes, but still small enough to fit in your 16 Gigabyte thumbdrive) and carry them around. You could, of course, also copy these files to your laptop's or desktop's harddisk. Here's my output file after 2.5 hours of conversion by WikiTaxi:

Sorry, as of Wikipedia's dump file dated November 2009, the resultant *.taxi file is larger than 8 Gigabytes. So you can't fit it inside a thumbdrive unless you have a 16 Gigabyte thumbdrive. For the latest Wikipedia dump dated April 2010, the resultant *.taxi file will be even larger.
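
Before you start copying, you can check whether the *.taxi file will fit on your thumbdrive. Here is a small Python sketch, using only the standard library. The function names `fits` and `fits_on_drive` are made up for this example, and the check looks only at free space on the drive, nothing else.

```python
import os
import shutil
import sys


def fits(file_size_bytes, free_bytes):
    """Return True if a file of the given size fits in the given free space."""
    return file_size_bytes <= free_bytes


def fits_on_drive(file_path, drive_path):
    """Compare a file's size with the free space on the target drive."""
    file_size = os.path.getsize(file_path)
    free_space = shutil.disk_usage(drive_path).free
    return fits(file_size, free_space)


if __name__ == "__main__":
    # Usage: python fits.py <path to .taxi file> <path to thumbdrive folder>
    taxi_file, drive = sys.argv[1], sys.argv[2]
    if fits_on_drive(taxi_file, drive):
        print("It fits. Copy away!")
    else:
        print("Too big. Time to buy a bigger thumbdrive.")
```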

After every few months (or is it weeks?!), you could repeat the whole process as Wikipedia gets updated, like, every hour. Just be prepared to take many long teh-tarik breaks!

Here's WikiTaxi in action. I did a search for "Ananda Krishnan" and got this screen:

There are, however, some drawbacks to WikiTaxi:
1. No images (a pity, since illustrations, images and pictures on Wikipedia are excellent memory-aids and are part of the learning process).
2. WikiTaxi's client program (the one that runs on your PC/laptop) doesn't allow copy and paste, which I find very strange.

METHOD 2: Using Okawix

Okawix lets you download the whole content of Wikipedia, with or without images, so that you can browse it offline. Okawix is available in 253 languages and includes sister projects of the Wikimedia Foundation (Wikisource, Wiktionary, Wikiquote, Wikibooks).

After downloading and installing the software, click on the home icon (the red-roofed house icon, next to the magnifying glass), and you will be presented with a menu of Wikis to download. Half of them entail downloads of more than 100 Megabytes (the English Wikipedia is 6 Gigabytes!).

So far, I have managed to download WikiNews data and images (120 Megabytes). I haven't yet managed to download the 6 Gigabytes of the English Wikipedia via this free open source software. If Okawix manages to download that, the resultant data must be very, very large. The only way to do this is with an overnight (weekend) download! That's my next project, so stay tuned. In the meantime, I will have to make do with just the full html-only Wikipedia via WikiTaxi.
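
How long would that overnight download take? Here is a rough back-of-the-envelope sketch in Python. It assumes the 6 Gigabyte Okawix download runs at the same speed as the 5.7 Gigabyte dump file from Method 1 (5 to 8 hours), and it treats 1 Gigabyte as 1,000,000,000 bytes. Both are my assumptions, and the function names are made up for illustration.

```python
def transfer_rate_kb_s(size_gb, hours):
    """Average speed in kilobytes per second for a download of size_gb in hours."""
    return size_gb * 1e9 / (hours * 3600) / 1e3


def hours_needed(size_gb, rate_kb_s):
    """Hours needed to download size_gb at rate_kb_s kilobytes per second."""
    return size_gb * 1e9 / (rate_kb_s * 1e3) / 3600


if __name__ == "__main__":
    fast = transfer_rate_kb_s(5.7, 5)  # about 317 kB/s
    slow = transfer_rate_kb_s(5.7, 8)  # about 198 kB/s
    print(f"Implied speed: {slow:.0f} to {fast:.0f} kB/s")
    print(f"6 GB would take {hours_needed(6, fast):.1f} to "
          f"{hours_needed(6, slow):.1f} hours")
```

On those assumptions the 6 Gigabyte download works out to roughly 5.3 to 8.4 hours, so an overnight run should just about do it, provided the connection doesn't drop.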

Lastly, I leave you with a screenshot of WikiNews data in the Okawix client on my laptop. Notice that it comes with images.

Last Words
If any student or staff member is interested in fitting the whole of Wikipedia inside a thumbdrive and browsing Wikipedia offline, feel free to contact me: Frankie Kam, telephone: 6012-6585109, 606-2822613.

For more information, head over to:
