Export purchased books list from Amazon

Sunday, September 18th, 2011 | Author:

If you have bought books from Amazon.com (or, in my case, Amazon.de), and perhaps used the recommendation engine, the wishlist and so on, then there is a lot of data about your books on the Amazon website. Have you ever thought about organizing your library with a different tool? Be it Google Books, LibraryThing or Shelfari, you will have to export this valuable pile of data from Amazon to the other service. Luckily, some intelligent people invented the ISBN, so you basically need to extract a list of ISBNs to identify the books (neglecting your reviews and tags for now). Less luckily, Amazon doesn't offer such an export function to the layman. Searching the internet yields a Greasemonkey script that lets you export wishlist content, but without ISBNs, so importing into other services is not so easy.

The solution is to save each page of "your purchases" (or other such lists of books) as an HTML file and let a smart script do the extraction work. This way, you're not violating Amazon's terms of service (which most likely don't allow robots to scrape the website), and on the positive side, it works.

Here is my Python script, which you can also download here (in a better version):
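import re
import sys

# Each purchased item sits in a table row whose id ends in the 10-character ASIN.
asin_regex = re.compile(r'<tr valign=middle id="iyrListItem([A-Z0-9]{10})">')

filename = sys.argv[1]
asinlist = []
with open(filename, 'r') as f:
    for line in f:
        # search() instead of match(), so indented rows are found as well
        match = asin_regex.search(line)
        if match is not None:
            asinlist.append(match.group(1))

# print() with a single argument works in both Python 2 and Python 3
print("\n".join(asinlist))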

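To run this script, you need a Python interpreter. On most common GNU/Linux systems, one is already installed or easily installable, for example with "apt-get install python" on Debian-based systems. The script takes the name of one saved HTML file as its only argument and prints the identifiers it finds, one per line.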
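Since a long purchase history spans several saved pages, a variant that accepts many files at once can be handy. The following is only a sketch, assuming the saved pages contain the same `iyrListItem` table rows that the script above looks for; the function name `extract_asins` and the file name `asins.py` are made up for illustration.

```python
import re
import sys

# Same pattern as in the script above (an assumption about Amazon's markup).
ASIN_RE = re.compile(r'<tr valign=middle id="iyrListItem([A-Z0-9]{10})">')

def extract_asins(html_text):
    """Return all ASINs found in the text of one saved page, in document order."""
    # findall() scans the whole text, so indentation and line breaks do not matter.
    return ASIN_RE.findall(html_text)

def main(filenames):
    seen = set()
    for filename in filenames:
        # The encoding of saved pages is not known in advance, so
        # undecodable bytes are replaced instead of raising an error.
        with open(filename, "r", errors="replace") as f:
            for asin in extract_asins(f.read()):
                # Print each ASIN only once, even if it occurs on several pages.
                if asin not in seen:
                    seen.add(asin)
                    print(asin)

if __name__ == "__main__":
    main(sys.argv[1:])
```

With a shell that expands wildcards, something like "python3 asins.py page*.html > asins.txt" would then collect the identifiers from all saved pages into one file.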

I have tested it with Amazon.de and the "purchased books" page, but I guess it would work equally well with Amazon.co.uk and Amazon.com. As always, leave a comment whether it worked for you or not. If it doesn't work, or if you have different needs (like extracting ISBN, name and price), this can be done by altering the regular expression in the script (easy for a programmer, not that easy for anyone else).


I used this to import all books I bought via Amazon into my Google Books library, which I use to maintain a list of all books I own. The nice thing about Google Books is its XML export feature, which I commented on earlier.
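One caveat when importing: strictly speaking, the script collects Amazon's own identifiers (ASINs). For printed books the ASIN is usually identical to the ISBN-10, but other items, such as Kindle editions, get ASINs that are not ISBNs. Here is a minimal sketch for filtering the list by the ISBN-10 check digit and converting the remaining entries to ISBN-13, in case the target service prefers those. The function names are hypothetical and not part of the original script.

```python
DIGITS = "0123456789"

def is_isbn10(code):
    """Check length, characters and the mod-11 check digit of an ISBN-10."""
    if len(code) != 10:
        return False
    if not all(c in DIGITS for c in code[:9]):
        return False
    if not (code[9] in DIGITS or code[9] == "X"):
        return False
    # The last position may be "X", which stands for the value 10.
    values = [int(c) for c in code[:9]]
    values.append(10 if code[9] == "X" else int(code[9]))
    # Weights run from 10 down to 1; the weighted sum must be divisible by 11.
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0

def isbn10_to_isbn13(isbn10):
    """Prefix 978, drop the old check digit and compute the ISBN-13 one."""
    body = "978" + isbn10[:9]
    # Weights alternate 1, 3, 1, 3, ... over the first twelve digits.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(body))
    return body + str((10 - total % 10) % 10)

def to_isbn13_list(asins):
    """Keep only entries that pass the ISBN-10 check and convert them."""
    return [isbn10_to_isbn13(a) for a in asins if is_isbn10(a)]
```

Entries that fail the check are silently dropped by `to_isbn13_list`, so it is worth comparing the length of the result with the length of the input list before importing.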


Category: English


3 Responses

  1. Konrad

    Oh, it turns out LibraryThing does it like me:
    http://www.librarything.com/wiki/index.php/Adding_and_importing_books

    so probably you can just sign up there and then (hopefully, haven't been there) import from Amazon and export to ISBN. Anyway, with the script I wrote there is no need to create an account at LibraryThing.

  2. Cybernetic1

    It doesn't work for the "I-own-it" list because python has to log into Amazon first (My browser having logged in Amazon is independent of it).

    I tried to use Trill / Mechanize to log into Amazon but their code is currently buggy and cannot parse Amazon's HTML correctly.

    Any help would be appreciated -- I have 65 pages of owned books that I want to export! :)

  3. I don't quite understand your comment "python has to log into Amazon first". My solution really is to download the relevant HTML files by hand, while being logged in with your browser. Only after that will you be able to use my Python script.

    Downloading the 65 pages by hand sounds painful, so in that case you might want to write a Greasemonkey script to do that. On the other hand, I would most certainly do this by hand.
