Introduction
The Guardian and The Observer newspapers are available digitally via Newspaper Direct in an "as printed" form. Although this isn't very useful for online reading, it is a good format for an e-reader. Ideally, the paper would download automatically and synchronise to a reader overnight, so that the latest edition is ready to go in the morning.
Unfortunately, Newspaper Direct don't make it easy to get the latest copy in PDF or ebook format. You have to connect to the site and download each section manually, which pretty much kills the idea of a daily e-reader load.
Guardian Grab interacts with the Newspaper Direct site to log on, identify the available sections and download them in PDF, mobi (Kindle) or ePub format. Downloads are arranged by paper, date and format under a specified directory. Guardian Grab also maintains a directory holding the latest copy of each paper, for easy syncing.
NB: You will need a Newspaper Direct subscription to use this software.
Guardian Grab is an enhancement of work done by Ladislav Snizek in guardianpdf.
Install
Guardian Grab requires Perl and the following modules to be installed. Some are likely to already be present in your Perl distribution. For Windows I use ActivePerl, but <bbc>other distributions are available</bbc>.
- Date::Format (aka TimeDate)
- Date::Parse (aka TimeDate)
- Digest::MD5
- File::Copy
- File::Copy::Recursive
- File::HomeDir
- File::Path
- File::Spec
- HTTP::Cookies
- Log::Log4perl
- URI::Escape
- WWW::Mechanize
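A quick way to see which of these modules are already present is to ask Perl to load each one. The script below is only a sketch: it assumes a POSIX shell with perl on the PATH, and the cpan hint it prints assumes the standard cpan client is installed (other distributions may use a different installer).

```shell
#!/bin/sh
# Modules listed above as requirements for Guardian Grab.
MODULES="Date::Format Date::Parse Digest::MD5 File::Copy File::Copy::Recursive File::HomeDir File::Path File::Spec HTTP::Cookies Log::Log4perl URI::Escape WWW::Mechanize"

# Print the names of the modules that Perl cannot load.
check_modules() {
    missing=""
    for m in $MODULES; do
        # "perl -MName -e 1" exits non-zero when the module cannot be loaded
        if ! perl -M"$m" -e 1 2>/dev/null; then
            missing="$missing $m"
        fi
    done
    echo "$missing"
}

missing=$(check_modules)
if [ -n "$missing" ]; then
    echo "Missing modules:$missing"
    # Assumption: the cpan client is available for installing them
    echo "Try: cpan$missing"
else
    echo "All required modules are installed."
fi
```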
Existing user? See the changelog.
UNIX
For BSD/Linux/other UNIX-like systems:
- Download guardiangrab-2.0.tar.gz
- Unpack
- make install
- Create configuration file named .guardiangrab in your home directory
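The steps above might look like the following in a shell. This is a sketch under a few assumptions: the tarball has already been downloaded into the current directory, it unpacks into a directory named guardiangrab-2.0, and make install may need to be run as root. The create_config helper and the GG_CONF variable are hypothetical names used for illustration; the helper creates an empty configuration file readable only by you, since the file will hold your password.

```shell
#!/bin/sh
TARBALL=guardiangrab-2.0.tar.gz

if [ -f "$TARBALL" ]; then
    tar -xzf "$TARBALL"
    # Assumption: the archive unpacks into guardiangrab-2.0/
    (cd guardiangrab-2.0 && make install)
else
    echo "$TARBALL not found; download it first" >&2
fi

# Create the configuration file if it does not exist yet, and restrict
# its permissions because it will contain a password.
create_config() {
    conf=$1
    [ -e "$conf" ] || : > "$conf"
    chmod 600 "$conf"
}

# GG_CONF is a hypothetical override used only by this sketch
create_config "${GG_CONF:-$HOME/.guardiangrab}"
```

The file still needs to be filled in by hand, as described under Configuration.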
Windows
For Microsoft Windows systems:
- Download guardiangrab-2.0.zip
- Unpack
- Place guardiangrab.pl somewhere of your choice
- Create configuration file named guardiangrab.ini in your AppData directory
  - Under Windows 7 this is c:\users\name\AppData
  - If in doubt just run guardiangrab.pl and the output will tell you where it's looking
Configuration
Your configuration file should look like:
    # Your Newspaper Direct login ID
    login=myself@me.com
    # Your Newspaper Direct password
    password=
    # Base directory for stored files
    destdir=/media/paper
    # Base directory for stored files (Windows)
    ;destdir=c:\users\name\Documents\Paper
    # Formats to retrieve
    format=pdf
    format=kindle
    # Publication details
    <Publication guardian>
    domain=guardian
    signinHost=users.guardian.co.uk
    </Publication>
The format parameter can be either pdf or one of the ebook formats, currently: cooler, edge, irex, kindle, kindledx, libra, nook, nuut, sonyreader. Specify format multiple times to download in multiple formats.
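Because a mistyped format name is easy to miss, a small check of the configuration file can help. The sketch below is not part of Guardian Grab: unknown_formats is a hypothetical helper, and it assumes each format appears on its own line as format=value, as in the sample above. It prints any format value that is not in the list given here.

```shell
#!/bin/sh
# Formats named in the text above
KNOWN="pdf cooler edge irex kindle kindledx libra nook nuut sonyreader"

# Print each format= value in the given config file that is not a known format.
unknown_formats() {
    # Keep only the value part of lines that start with "format="
    sed -n 's/^[[:space:]]*format[[:space:]]*=[[:space:]]*//p' "$1" |
    while read -r fmt; do
        [ -n "$fmt" ] || continue
        case " $KNOWN " in
            *" $fmt "*) ;;          # recognised, say nothing
            *) echo "$fmt" ;;       # not in the list
        esac
    done
}

# Check the UNIX configuration file by default, or a file given as an argument
CONF=${1:-$HOME/.guardiangrab}
if [ -f "$CONF" ]; then
    unknown_formats "$CONF"
fi
```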
Using
Just run guardiangrab or guardiangrab.pl.
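For the overnight download described in the introduction, one option on UNIX-like systems is to run Guardian Grab from cron. The wrapper below is only a sketch: the script name nightly-paper.sh, the log location and the 05:30 schedule are all assumptions, and it expects guardiangrab to be on the PATH that cron uses.

```shell
#!/bin/sh
# nightly-paper.sh: hypothetical cron wrapper (the name is an assumption).
# Example crontab entry, running at 05:30 every day:
#   30 5 * * * /path/to/nightly-paper.sh

LOG="${LOG:-$HOME/guardiangrab.log}"

# Run the given command (guardiangrab by default), append its output to
# $LOG, and record whether it succeeded.
run_grab() {
    cmd=${1:-guardiangrab}
    if "$cmd" >>"$LOG" 2>&1; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') download ok" >>"$LOG"
    else
        status=$?
        echo "$(date '+%Y-%m-%d %H:%M:%S') download failed (exit $status)" >>"$LOG"
        return "$status"
    fi
}

run_grab guardiangrab || echo "guardiangrab failed; see $LOG" >&2
```

Syncing the e-reader can then be scheduled a little later, pointing at the directory that holds the latest copy of each paper.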
And finally...
TODO
These are just ideas; they may or may not happen.
- Support other papers on Newspaper Direct
Feedback
Feedback always appreciated, even if just to say you found this useful.
