Paperboy                Collection, Processing & Presentation of Online News
~~~~~~~~              (c) Andrew Flegg 2000. Released under the Artistic Licence
v1.10 (17-Sep-2000)                       http://www.bleb.org/software/paperboy/

INTRODUCTION
------------
Paperboy is a Java application to fetch news articles from various sites on
the Web, process them and, if relevant to the user's interests, store them
locally for viewing at a later date. Some of its main features include:

    * Proxy support, including authenticated proxies
    * Expandable through a plugin system
    * Rule-based knowledge base for stories of interest
    * Cache a story without the surrounding "fluff" but with its images
    * Modular and reusable components
    * Portable
    * Suitable for use in multi-user systems
    * Internal and external web page viewers

The system runs in one of two modes, "gather" and "display". In the first
mode it operates without user interaction and collects the stories from the
configured web sites. In the second mode it uses a Swing GUI to display the
fetched stories and allows the configuration of the application and its
plugins.

PACKAGE CONTENTS
----------------
A system as powerful as Paperboy cannot simply be installed and run. By
default, the system is designed for use by just a single user, but it is very
simple to install in a central location with per-user configuration options.

The package contains:

    CHANGELOG.txt   Changes made to the system over time
    README.txt      This file
    Makefile        Rules for compiling and managing the project
    classes/        Compiled Java class files
    data/           Default directory for downloaded stories and user
                    configuration files
    docs/           API description for writing new site plugins
    lib/            Plugins, icons and default configuration files
    org/            Java source code
    paperboy        UNIX shell script for starting Paperboy

Single-user installation
~~~~~~~~~~~~~~~~~~~~~~~~
The following instructions assume a UNIX or UNIX-like system (such as Linux
or cygwin/Win32):

  1. After unpacking the tarball check the Makefile for any commands which
     need changing.
     All the system-dependent options should be at the top of the file, but
     weird systems may need alterations further down.

  2. Compile the source with ``make code''. If you also wish to recompile the
     plugins or produce the documentation, use plain ``make'' instead. See
     below for all the targets supported by the Makefile.

  3. Start the system in GUI mode and configure the plugins:
     ``./paperboy -d''

Once the GUI has started, the proxy configuration and rules setup can be
changed using the "View.Options" and "View.Plugins" menu items.

Multi-user installation
~~~~~~~~~~~~~~~~~~~~~~~
Similar to the above, but once compiled the "classes" and "lib" directories
need to be copied to a publicly readable place, such as /usr/local/paperboy.
The "paperboy" script also needs changing:

    * The LIBDIR and CLASSDIR variables need changing to the location of the
      central installation, e.g. /usr/local/paperboy/lib and
      /usr/local/paperboy/classes respectively.

    * The DATADIR directory needs setting on a per-user basis. For example,
      the following code will create a ".paperboy" directory in the user's
      home directory if one does not exist:

--------8<--------
# Per-user data directory; quoted so that a $HOME containing spaces works
DATADIR="$HOME/.paperboy"
if [ ! -e "$DATADIR" ]; then
    echo "Creating .paperboy directory for first time use..."
    mkdir "$DATADIR"
    mkdir "$DATADIR/images"
fi
-------->8--------

In a future version, the multi-user installation above may be automated and
made the default.

USAGE
-----
Once installed and configured, the system will start a fetch when
"File.Get news..." is selected from the GUI or, using the recommended method,
when ``paperboy --gather'' is run. The stories which match the rules for each
plugin will be stored locally in DATADIR along with any images the page may
need.

The next time ``paperboy --display'' is called the unread stories will be
shown. Each one can be selected individually or the stories can be stepped
through using the button bar icons, the menu or the keyboard shortcuts "n"
and "p" (next and previous respectively).
Stories can be deleted from the cache or the original story viewed in the
configured web browser.

DEVELOPMENT
-----------
Paperboy has a lot of work yet to be done on it; some of this is described
below.

Plugins
~~~~~~~
Currently there are two useful plugins, one for BBC Online
(http://news.bbc.co.uk/) and the other for Wired News
(http://www.wired.com/news/). Further plugins need to be written and these
two can be used as a starting point. In addition to news stories, stock
quotes, comic strips or even search engine results could be stored: the
possibilities are practically endless.

The plugins are also fairly dependent on the site's HTML. If it changes too
much the plugins will be unable to cope and will need to be changed.
Fortunately this doesn't happen too often, but a method of handling these
changes would be useful.

Room for improvement (aka bugs)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following is a list of known shortcomings which need resolving:

    * Deleted stories can get refetched on the next run
    * The Swing HTML renderer is, at best, crap

Makefile
~~~~~~~~
The Makefile supports the following targets:

    all (default)   Recompiles the code, the plugins and the documentation
                    and ensures the permissions are correctly set.
    code            Just recompiles the code. This can be further broken
                    down into:
                        code.root      code.web       code.util
                        code.plugin    code.rule      code.gui
                    (code.gui includes code.gui.root and code.gui.htmlview)
    tidy            Checks files for DOS line endings and tabs in the source
                    code and corrects as necessary (requires "fixfile").
    clean           Removes the source code and API documentation and any
                    cached data (clean.data).
    test            Runs the internal tests.
    plugins         Recompiles the plugins in the "lib" directory.
    javadoc         Produces the API description under the "docs" directory.
    perms           Ensures that all the files have appropriate permissions.
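
Going by the descriptions above, the default target covers roughly the same
ground as running "code", "plugins", "javadoc" and "perms" in turn. The
following is only a sketch of stepping through those targets one at a time
and stopping at the first failure. The order is an assumption, and the
"rebuild" function and the MAKE variable are illustrative names, not part of
Paperboy:

```shell
#!/bin/sh
# Sketch only: run the documented Makefile targets one at a time.
# MAKE and rebuild are hypothetical names introduced for this example.
MAKE=${MAKE:-make}

rebuild() {
    # Assumed order: code first, then plugins, documentation, permissions
    for target in code plugins javadoc perms; do
        "$MAKE" "$target" || {
            echo "target '$target' failed" >&2
            return 1
        }
    done
}
```

With the function defined, calling ``rebuild'' from the top of the unpacked
tree would run each target in turn.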

AUTHOR & LICENSING
------------------
Paperboy is released under the Artistic Licence, see:
    http://www.opensource.org/licenses/artistic-license.html

The author, Andrew Flegg, is also the copyright holder and makes no warranty,
EITHER EXPRESS OR IMPLIED, about the suitability of this software for any
purpose. Paperboy may be distributed freely, as long as all copyright
messages remain intact and under the further terms of the Artistic Licence.