Grouper: RSS manager, XML converter, website scraper

Grouper - Documentation



"Regular Expression" Based Scraper Plugin

The Regex plugin uses Perl-style regular expressions to parse regularly structured web pages and convert them to RSS feeds. It is intended for users who are already familiar with regular expression matching. Because analyzing the structure of a web page and constructing regular expressions to extract data from it is complex, we are unable to provide support for configuring this plugin for particular web pages.
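To give a feel for the kind of matching involved, here is a minimal sketch in plain PHP, independent of Grouper. It is not the plugin's implementation: the sample markup, the pattern, and the helper name extractItems are all assumptions made for illustration. It shows how one Perl-style expression with named groups can pull a link, title, and description out of each repeated block on a regularly structured page.

```php
<?php
// Hypothetical sample markup; a real page would be fetched from its URL.
$html = <<<'HTML'
<ul>
  <li><a href="/news/1">First headline</a> <span class="desc">First summary</span></li>
  <li><a href="/news/2">Second headline</a> <span class="desc">Second summary</span></li>
</ul>
HTML;

// Illustrative helper (not part of Grouper): returns one array per matched item.
function extractItems(string $html, string $pattern): array
{
    $items = [];
    // PREG_SET_ORDER groups the captures of each match together.
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $items[] = [
                'link'        => $m['link'],
                'title'       => html_entity_decode(trim($m['title'])),
                'description' => html_entity_decode(trim($m['desc'])),
            ];
        }
    }
    return $items;
}

// Named groups capture the link, title, and description of each list item.
// The "s" modifier lets "." match newlines; ".*?" keeps each match minimal.
$pattern = '~<li>\s*<a href="(?<link>[^"]+)">(?<title>.*?)</a>\s*'
         . '<span class="desc">(?<desc>.*?)</span>\s*</li>~s';

$items = extractItems($html, $pattern);
foreach ($items as $item) {
    echo $item['title'], ' => ', $item['link'], "\n";
}
```

A pattern like this only works while every item on the page follows the same layout, which is why the plugin is aimed at regularly structured pages.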

Installation:
To use the Regex plugin, regex.php must be located in the "plugins" folder inside the folder containing grouper.php. This is the default location when Grouper Evolution is installed.
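If you are unsure whether the plugin file is in place, a quick check along the following lines may help. This is only a sketch: the helper expectedRegexPluginPath is hypothetical and not part of Grouper, and the path shown is a placeholder for your own installation path.

```php
<?php
// Illustrative helper (not part of Grouper): builds the path where regex.php
// is expected, i.e. in "plugins" inside the folder containing grouper.php.
function expectedRegexPluginPath(string $grouperFile): string
{
    return dirname($grouperFile) . '/plugins/regex.php';
}

// Placeholder path; substitute the location of your own grouper.php.
$grouperFile = '/YOUR/PATH/TO/grouper/grouper.php';
$pluginFile  = expectedRegexPluginPath($grouperFile);

if (is_file($pluginFile)) {
    echo "Regex plugin found at $pluginFile\n";
} else {
    echo "Regex plugin missing; expected it at $pluginFile\n";
}
```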

Use:
The following code generates an RSS feed from a web page:

<?php
require_once '/YOUR/PATH/TO/grouper/grouper.php';
GrouperLoadPlugin('regex.php');
GrouperSourceURL('http://example.com/foo/');
// additional configuration usually needed here, as described below
GrouperShow('','CACHE-FILE-NAME');
?>

Configuration:
You may configure the behavior of the Regex plugin using the function GrouperSourceConf, as follows:

GrouperSourceConf('OptionName','new value');

The Regex plugin has the following options: