Web-scraping AbeBooks.com (Reverse Engineering a REST API)

 

Motivation

I have a large collection of electronic books, which I manage using Calibre. Using Calibre’s “Extract ISBN” plugin, I am able to parse the ISBN identifier from most of my files, which then makes fetching the rest of the metadata very easy. (Below is an example of my library’s metadata.)

Thus, I have access to a very convenient and ever-growing virtual library of books, which I like to use on the go and for exploratory research. Nevertheless, whenever I find a particularly good book, what I want most is to own a physical copy.

Enter AbeBooks.com. Along with Amazon, and occasionally eBay, it is my go-to site for buying cheap used textbooks. Given that I have stored the ISBN data for most of my electronic books, I would like to be able to automatically fetch pricing information for any book in my virtual library, perhaps even keeping track of changes in price over time.

However, until now, the main problem stopping me from writing a script to do this was that AbeBooks does not have a publicly available API… or at the very least, none that is explicitly documented.

REST APIs

REST, or Representational State Transfer, is an architectural style for web services, most commonly implemented on top of HTTP, that lets different systems interoperate. It is based on a request/response model: a request may carry a “payload”, normally formatted as JSON, XML, or form data, and the response can be a link to a resource, a data payload (JSON, XML, or HTML), or a confirmation that some data was modified on the server.

HTTP defines several request methods that REST APIs build on: GET, HEAD, POST, PUT, PATCH, DELETE, CONNECT, OPTIONS and TRACE. Among these, the two most common are GET and POST:

GET

  • Used to request data from a server.
  • Parameters are encoded in the URL as query-string parameters.
  • The amount of parameter data is limited by the maximum URL length that browsers and servers accept.
  • Not suitable for sensitive information, since parameters are visible in the URL (and end up in browser history and server logs).

POST

  • Used to submit data to a server, and can modify server contents.
  • Parameters are passed in the message body, rather than the URL.
  • Has no URL-length limit on parameters (though servers may still cap the body size).
  • Better suited to sensitive information, since the body is not exposed in the URL (HTTPS is still needed to protect it in transit).
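The difference is easy to see with the requests library itself: a Request can be prepared (built but not sent) to show exactly where the parameters end up. The example.com URL and the parameter names are placeholders, so nothing here touches AbeBooks.

```python
import requests

# Placeholder endpoint: the requests are only prepared, never sent.
url = "https://example.com/search"
params = {"isbn": "9781250297662", "q": "broken stars"}

# GET: the parameters are URL-encoded into the query string.
get_req = requests.Request("GET", url, params=params).prepare()

# POST: the same parameters are form-encoded into the message body.
post_req = requests.Request("POST", url, data=params).prepare()

print(get_req.url)    # https://example.com/search?isbn=9781250297662&q=broken+stars
print(get_req.body)   # None
print(post_req.url)   # https://example.com/search
print(post_req.body)  # isbn=9781250297662&q=broken+stars
```

The POST version also gets a Content-Type of application/x-www-form-urlencoded, which is how a browser submits an ordinary HTML form.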

Exploring Network Packets

I found that inspecting the network traffic for an AbeBooks search results page is simple, and yields promising results. If we open Firefox’s developer tools and switch to the Network tab, we can see a list of every request the page makes. In particular, we are interested in those that return a JSON response, highlighted in red below:

We can see that there are four POST requests to a service called “pricingservice”, and one GET request to a “RecommendationsApi”.

If we look more closely at one of the POST requests, we can see which parameters it takes in:

ISBN! Just what we needed! Furthermore, looking at the response tab, we can see that this request returns the prices for new and used books, among other things:

Wrapping the API in Python

Now that we know a bit more about how AbeBooks works under the hood, we can start implementing our API wrapper in Python. We will need the requests module:

import requests

Sending POST requests

The first request that we will implement is the POST request that fetches prices for a given book. From the Network tab, we know that the URL for this service is:

url = "https://www.abebooks.com/servlet/DWRestService/pricingservice"

There seem to be three main parameter groups, and we can infer their purpose from their names. (Values in angle brackets below are placeholders to be replaced by user values.)

Searching prices by ISBN:

Parameter   Value
action      getPricingDataByISBN
isbn        <isbn>
container   pricingService-<isbn>

Searching prices by title and author:

Parameter   Value
action      getPricingDataForAuthorTitleStandardAddToBasket
an          <author>
tn          <title>
container   oe-search-all

Searching prices by title, author, and hardcover/softcover binding:

Parameter   Value
action      getPricingDataForAuthorTitleBindingRefinements
isbn        9781250297662
an          <author>
tn          <title>
container   priced-from-soft or priced-from-hard
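Since the ISBN also appears inside the container value, it is worth normalizing ISBNs before building the payload. Calibre metadata may contain hyphenated or 10-digit ISBNs, and whether AbeBooks accepts those directly is untested here, so the following sketch converts everything to bare 13-digit strings using the standard ISBN-10 and ISBN-13 check-digit rules. The helper names to_isbn13 and isbn_pricing_payload are hypothetical, not part of any AbeBooks API.

```python
def isbn13_check_digit(first12: str) -> str:
    """Check digit for the first 12 digits of an ISBN-13 (weights 1, 3, 1, 3, ...)."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def to_isbn13(raw) -> str:
    """Normalize an ISBN-10 or ISBN-13 (int or str, hyphens/spaces allowed) to 13 digits."""
    s = str(raw).replace("-", "").replace(" ", "").upper()
    if len(s) == 13 and s.isdigit():
        if isbn13_check_digit(s[:12]) != s[12]:
            raise ValueError(f"bad ISBN-13 check digit: {raw!r}")
        return s
    if len(s) == 10 and s[:9].isdigit() and (s[9].isdigit() or s[9] == "X"):
        # ISBN-10 checksum: digits weighted 10..1 must sum to a multiple of 11 ('X' = 10)
        digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        if sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11:
            raise ValueError(f"bad ISBN-10 check digit: {raw!r}")
        # Prefix 978, drop the old check digit, and compute the ISBN-13 one
        core = "978" + s[:9]
        return core + isbn13_check_digit(core)
    raise ValueError(f"not an ISBN: {raw!r}")


def isbn_pricing_payload(raw) -> dict:
    """Hypothetical helper: the 'search prices by ISBN' parameters from the table above."""
    isbn = to_isbn13(raw)
    return {"action": "getPricingDataByISBN",
            "isbn": isbn,
            "container": f"pricingService-{isbn}"}
```

For example, isbn_pricing_payload("978-1-250-29766-2") yields the same parameters as the ISBN table above. Passing the ISBN as a string rather than an int makes no difference once requests form-encodes it.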

The parameters can be stored in a dictionary and passed to requests.post() through its data argument, which sends them form-encoded in the request body. For example:

#- Search prices by ISBN
payload1 = {'action': 'getPricingDataByISBN',
           'isbn': 9781250297662,
           'container': 'pricingService-9781250297662'}

#- Search prices by author and title
payload2 = {'action': 'getPricingDataForAuthorTitleStandardAddToBasket',
            'an': 'liu ken',
            'tn': 'broken stars',
            'container': 'oe-search-all'}

#- Sending a request
resp = requests.post(url, data=payload1)
print(resp.status_code, resp.reason)
resp.json()

The response is:

200 OK


{'errorTexts': [None],
 'errorCodes': [None],
 'success': True,
 'newExists': True,
 'usedExists': True,
 'pricingInfoForBestNew': {'bestListingid': 30410510568,
  'totalResults': 16,
  'bestPriceInPurchaseCurrencyWithCurrencySymbol': 'US$ 7.26',
  'bestPriceInSurferCurrencyWithCurrencySymbol': 'US$ 7.26',
  'domesticShippingPriceInPurchaseCurrencyWithCurrencySymbol': 'US$ 4.50',
  'shippingToDestinationPriceInPurchaseCurrencyWithCurrencySymbol': 'US$ 6.00',
  'shippingToDestinationPriceInSurferCurrencyWithCurrencySymbol': 'US$ 6.00',
  'shippingDestinationNameInSurferLanguage': 'U.S.A.',
  'vendorCountryNameInSurferLanguage': 'Canada',
  'vendorId': 71361,
  'bestPriceInPurchaseCurrencyValueOnly': '7.26',
  'bestShippingToDestinationPriceInPurchaseCurrencyValueOnly': '6.0',
  'listingCurrencySymbol': 'US$',
  'purchaseCurrencySymbol': 'US$',
  'nonPaddedPriceInListingCurrencyValueOnly': '7.26',
  'refinementList': None,
  'internationalEdition': False,
  'bookCondition': 'New',
  'bookDescription': 'Hardcover. Publisher overstock,...',
  'freeShipping': False},
 'pricingInfoForBestUsed': {'bestListingid': 30529767259,
  'totalResults': 8,
  'bestPriceInPurchaseCurrencyWithCurrencySymbol': 'US$ 6.55',
  'bestPriceInSurferCurrencyWithCurrencySymbol': 'US$ 6.55',
  'domesticShippingPriceInPurchaseCurrencyWithCurrencySymbol': 'US$ 3.99',
  'shippingToDestinationPriceInPurchaseCurrencyWithCurrencySymbol': 'US$ 3.99',
  'shippingToDestinationPriceInSurferCurrencyWithCurrencySymbol': 'US$ 3.99',
  'shippingDestinationNameInSurferLanguage': 'U.S.A.',
  'vendorCountryNameInSurferLanguage': 'U.S.A.',
  'vendorId': 71597499,
  'bestPriceInPurchaseCurrencyValueOnly': '6.55',
  'bestShippingToDestinationPriceInPurchaseCurrencyValueOnly': '3.99',
  'listingCurrencySymbol': 'US$',
  'purchaseCurrencySymbol': 'US$',
  'nonPaddedPriceInListingCurrencyValueOnly': '6.55',
  'refinementList': None,
  'internationalEdition': False,
  'bookCondition': 'As New',
  'bookDescription': 'Like brand new book.',
  'freeShipping': False},
 'pricingInfoForBestAllConditions': None,
 'isbn': '9781250297662',
 'totalResults': 24,
 'containerId': 'pricingService-9781250297662',
 'refinementList': [{'name': 'collectibleJacket',
   'label': 'Dust Jacket',
   'count': 2,
   'url': 'dj=on&isbn=9781250297662&sortby=17'},
  {'name': 'freeShipping',
   'label': 'Free US Shipping',
   'count': 9,
   'url': 'isbn=9781250297662&n=100046078&sortby=17'},
  {'name': 'bindingHard',
   'label': 'Hardcover',
   'count': 23,
   'url': 'bi=h&isbn=9781250297662&sortby=17'},
  {'name': 'collectibleFirstEdition',
   'label': 'First Edition',
   'count': 3,
   'url': 'fe=on&isbn=9781250297662&sortby=17'}],
 'bibliographicDetail': {'author': '', 'title': ''}}
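The ...ValueOnly fields are strings, so a little parsing is needed before prices can be compared. This sketch (the summarize_offer name is hypothetical) adds the best price and the shipping cost to the buyer’s destination, using Decimal to avoid floating-point rounding. It assumes both values are in the purchase currency, as the field names suggest, and that a missing offer comes back as None, as pricingInfoForBestAllConditions does above.

```python
from decimal import Decimal


def summarize_offer(info):
    """Return (price, shipping, total) as Decimals for one pricingInfoForBest* dict, or None."""
    if info is None:
        return None
    price = Decimal(info["bestPriceInPurchaseCurrencyValueOnly"])
    shipping = Decimal(info["bestShippingToDestinationPriceInPurchaseCurrencyValueOnly"])
    return price, shipping, price + shipping


# Trimmed copy of the pricingservice response shown above
response = {
    "success": True,
    "pricingInfoForBestNew": {
        "bestPriceInPurchaseCurrencyValueOnly": "7.26",
        "bestShippingToDestinationPriceInPurchaseCurrencyValueOnly": "6.0",
    },
    "pricingInfoForBestUsed": {
        "bestPriceInPurchaseCurrencyValueOnly": "6.55",
        "bestShippingToDestinationPriceInPurchaseCurrencyValueOnly": "3.99",
    },
}

for label, key in [("new", "pricingInfoForBestNew"), ("used", "pricingInfoForBestUsed")]:
    offer = summarize_offer(response[key])
    if offer:
        price, shipping, total = offer
        print(f"{label}: {price} + {shipping} shipping = {total}")
# new: 7.26 + 6.0 shipping = 13.26
# used: 6.55 + 3.99 shipping = 10.54
```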

Sending a GET request

The API also has a GET method for obtaining book recommendations given an ISBN. The URL and parameter names are different, but the way we send the request is very similar; the only change is that the parameters go to requests.get() through params, which encodes them into the URL’s query string:

url = "https://www.abebooks.com/servlet/RecommendationsApi"

Parameter    Value
pageId       plp
itemIsbn13   the book’s ISBN-13

#- Get book recommendations by ISBN
payload = {'pageId': 'plp',
           'itemIsbn13': 9781250297662}

resp = requests.get(url, params=payload)
print(resp.status_code, resp.reason)
resp.json()

Response:

200 OK


{'widgetResponses': [{'slotName': 'detail-1',
   'title': 'Customers who bought this item also bought',
   'algoName': 'abeBooksBlendedPurchaseSims',
   'ref': 'pd_b_p_1',
   'recommendationItems': [{'attributes': [],
     'thumbNailImgUrl': 'https://pictures.abebooks.com/isbn/9780765384201-us-300.jpg',
     'itemLink': '/products/isbn/9780765384201?cm_sp=rec-_-pd_b_p_1-_-plp&reftag=pd_b_p_1',
     'subTitle': None,
     'isbn13': '9780765384201',
     'title': 'Invisible Planets: Contemporary Chinese Science Fiction...',
     'author': 'Liu, Ken'},
    {'attributes': [],
     'thumbNailImgUrl': 'https://pictures.abebooks.com/isbn/9781250306029-us-300.jpg',
     'itemLink': '/products/isbn/9781250306029?cm_sp=rec-_-pd_b_p_1-_-plp&reftag=pd_b_p_1',
     'subTitle': None,
     'isbn13': '9781250306029',
     'title': 'The Redemption of Time: A Three-Body Problem Novel...',
     'author': 'Baoshu'},
    {'attributes': [],
     'thumbNailImgUrl': 'https://pictures.abebooks.com/isbn/9780765389312-us-300.jpg',
     'itemLink': '/products/isbn/9780765389312?cm_sp=rec-_-pd_b_p_1-_-plp&reftag=pd_b_p_1',
     'subTitle': None,
     'isbn13': '9780765389312',
     'title': 'Waste Tide',
     'author': 'Qiufan, Chen'},
    {'attributes': [],
     'thumbNailImgUrl': 'https://pictures.abebooks.com/isbn/9780765384195-us-300.jpg',
     'itemLink': '/products/isbn/9780765384195?cm_sp=rec-_-pd_b_p_1-_-plp&reftag=pd_b_p_1',
     'subTitle': None,
     'isbn13': '9780765384195',
     'title': 'Invisible Planets: Contemporary Chinese Science Fiction...',
     'author': 'Liu, Ken'},
    {'attributes': [],
     'thumbNailImgUrl': 'https://pictures.abebooks.com/isbn/9781784978518-us-300.jpg',
     'itemLink': '/products/isbn/9781784978518?cm_sp=rec-_-pd_b_p_1-_-plp&reftag=pd_b_p_1',
     'subTitle': None,
     'isbn13': '9781784978518',
     'title': 'The Wandering Earth',
     'author': 'Liu, Cixin'}]},
  {'slotName': 'ext-search-detail-1',
   'title': None,
   'algoName': 'heroWidgetIsbnSims',
   'ref': 'pd_hw_i_1',
   'recommendationItems': [{'attributes': [],
     'thumbNailImgUrl': 'https://pictures.abebooks.com/isbn/9780804172448-us-300.jpg',
     'itemLink': '/products/isbn/9780804172448?cm_sp=rec-_-pd_hw_i_1-_-plp&reftag=pd_hw_i_1',
     'subTitle': 'Best Selling',
     'isbn13': '9780804172448',
     'title': 'Station Eleven',
     'author': 'Mandel, Emily St. John'},
    {'attributes': [],
     'thumbNailImgUrl': 'https://pictures.abebooks.com/isbn/9781786073495-us-300.jpg',
     'itemLink': '/products/isbn/9781786073495?cm_sp=rec-_-pd_hw_i_1-_-plp&reftag=pd_hw_i_1',
     'subTitle': 'Top Rated',
     'isbn13': '9781786073495',
     'title': 'Zuleikha',
     'author': 'Yakhina, Guzel'}]}]}
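The recommendations come grouped by widget (slotName). To feed them back into the pricing service, a flat list of unique ISBN-13s is more convenient. A sketch, using a hypothetical flatten_recommendations helper and a trimmed copy of the response above:

```python
def flatten_recommendations(response):
    """Collect recommended books from every widget, keeping the first occurrence of each ISBN-13."""
    seen = set()
    books = []
    for widget in response.get("widgetResponses", []):
        for item in widget.get("recommendationItems") or []:
            isbn = item.get("isbn13")
            if isbn and isbn not in seen:
                seen.add(isbn)
                books.append({
                    "isbn13": isbn,
                    "title": item.get("title"),
                    "author": item.get("author"),
                    # Some widgets have no title, so fall back to the slot name
                    "widget": widget.get("title") or widget.get("slotName"),
                })
    return books


# Trimmed copy of the RecommendationsApi response shown above
response = {"widgetResponses": [
    {"slotName": "detail-1",
     "title": "Customers who bought this item also bought",
     "recommendationItems": [
         {"isbn13": "9780765384201",
          "title": "Invisible Planets: Contemporary Chinese Science Fiction...",
          "author": "Liu, Ken"},
         {"isbn13": "9781250306029",
          "title": "The Redemption of Time: A Three-Body Problem Novel...",
          "author": "Baoshu"}]},
    {"slotName": "ext-search-detail-1",
     "title": None,
     "recommendationItems": [
         {"isbn13": "9780804172448", "title": "Station Eleven",
          "author": "Mandel, Emily St. John"}]},
]}

for book in flatten_recommendations(response):
    print(book["isbn13"], book["title"], "|", book["widget"])
```

Each isbn13 in the result can then be passed straight back to the pricing service’s ISBN search.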

An Object-Oriented Module

I created a small Python module abebooks.py to encapsulate the requests. The full code is below:

import requests


class AbeBooks:

    def __get_price(self, payload):
        url = "https://www.abebooks.com/servlet/DWRestService/pricingservice"
        # timeout (seconds) keeps a stalled server from blocking forever
        resp = requests.post(url, data=payload, timeout=10)
        resp.raise_for_status()
        return resp.json()

    def __get_recomendations(self, payload):
        url = "https://www.abebooks.com/servlet/RecommendationsApi"
        resp = requests.get(url, params=payload, timeout=10)
        resp.raise_for_status()
        return resp.json()

    def getPriceByISBN(self, isbn):
        """
        Parameters
        ----------
        isbn (int) - a book's ISBN code
        """
        payload = {'action': 'getPricingDataByISBN',
                   'isbn': isbn,
                   'container': 'pricingService-{}'.format(isbn)}
        return self.__get_price(payload)

    def getPriceByAuthorTitle(self, author, title):
        """
        Parameters
        ----------
        author (str) - book author
        title (str) - book title
        """
        payload = {'action': 'getPricingDataForAuthorTitleStandardAddToBasket',
                   'an': author,
                   'tn': title,
                   'container': 'oe-search-all'}
        return self.__get_price(payload)

    def getPriceByAuthorTitleBinding(self, author, title, binding):
        """
        Parameters
        ----------
        author (str) - book author
        title (str) - book title
        binding (str) - one of 'hard' or 'soft'
        """
        if binding == "hard":
            container = "priced-from-hard"
        elif binding == "soft":
            container = "priced-from-soft"
        else:
            raise ValueError(
                    'Invalid parameter. Binding must be "hard" or "soft"')
        payload = {'action': 'getPricingDataForAuthorTitleBindingRefinements',
                   'an': author,
                   'tn': title,
                   'container': container}
        return self.__get_price(payload)

    def getRecommendationsByISBN(self, isbn):
        """
        Parameters
        ----------
        isbn (int) - a book's ISBN code
        """
        payload = {'pageId': 'plp',
                   'itemIsbn13': isbn}
        return self.__get_recomendations(payload)

Using the AbeBooks Module

from abebooks import AbeBooks

ab = AbeBooks()
results = ab.getPriceByISBN(9780062941503)
if results['success']:
    best_new = results['pricingInfoForBestNew']
    best_used = results['pricingInfoForBestUsed']
    #- Best New Price (guard against None, which the service uses for missing offers)
    if best_new:
        print(best_new['bestPriceInPurchaseCurrencyWithCurrencySymbol'])
        #- US$ 21.49
    #- Best Used Price
    if best_used:
        print(best_used['bestPriceInPurchaseCurrencyWithCurrencySymbol'])
        #- US$ 24.42
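Finally, back to the original motivation of tracking prices over time. A minimal sketch, assuming a CSV file is enough storage: record_prices (a hypothetical helper) appends one timestamped row per ISBN and takes the fetch function as a parameter, so it can be driven by AbeBooks().getPriceByISBN or by a stub when testing. The pause between requests is a courtesy to an undocumented service, not a known rate limit.

```python
import csv
import time
from datetime import datetime, timezone


def record_prices(isbns, fetch, path="abebooks_prices.csv", delay=2.0):
    """Append one row per ISBN with the current best new/used prices.

    `fetch` is any callable that takes an ISBN and returns the pricingservice
    JSON as a dict (e.g. AbeBooks().getPriceByISBN). `delay` seconds are slept
    between consecutive requests.
    """
    fields = ["timestamp", "isbn", "best_new", "best_used"]
    # One UTC timestamp per run, shared by every row written in this call
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        if f.tell() == 0:  # empty file: write the header once
            writer.writeheader()
        for i, isbn in enumerate(isbns):
            if i:
                time.sleep(delay)
            data = fetch(isbn)
            # Missing offers come back as None, so fall back to an empty dict
            new = data.get("pricingInfoForBestNew") or {}
            used = data.get("pricingInfoForBestUsed") or {}
            writer.writerow({
                "timestamp": now,
                "isbn": isbn,
                "best_new": new.get("bestPriceInPurchaseCurrencyValueOnly", ""),
                "best_used": used.get("bestPriceInPurchaseCurrencyValueOnly", ""),
            })
```

Running record_prices([9780062941503, 9781250297662], AbeBooks().getPriceByISBN) once a day, for example from cron, would build up a price history in abebooks_prices.csv that can later be loaded into a spreadsheet or pandas.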