Grouper: RSS manager, XML converter, website scraper

Grouper - Documentation



Unstructured Scraper Plugin

The Unstructured plugin finds all of the links on a webpage and uses them and the text that follows them to construct an RSS feed. The plugin can be configured to skip links that you don't want included in the feed. The result tends to be a fairly "quick and dirty" feed unless the webpage is well suited to the way the plugin works and/or appropriate configuration is done. Still, for some purposes, the simplicity of this plugin is useful.
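To make the idea concrete, here is a minimal sketch of the general technique described above: pair each link with the text that follows it, and drop links that match a skip rule. It is an illustration only and is not the plugin's actual code. The function name extractLinkItems, its $skipPatterns parameter, and the returned array layout are all assumptions made for this example.

```php
<?php
// Illustrative sketch only: NOT the Unstructured plugin's implementation.
// extractLinkItems() and its parameters are hypothetical names.

/**
 * Pair every link in an HTML string with the text that follows it
 * (up to the next link), skipping links whose URL matches a skip pattern.
 *
 * @param string   $html         HTML source of the page
 * @param string[] $skipPatterns PCRE patterns tested against each href
 * @return array[] list of ['title' => ..., 'link' => ..., 'description' => ...]
 */
function extractLinkItems(string $html, array $skipPatterns = []): array
{
    // Strip tags, decode entities, and collapse runs of whitespace.
    $clean = static function (string $fragment): string {
        $text = html_entity_decode(strip_tags($fragment), ENT_QUOTES, 'UTF-8');
        return trim(preg_replace('/\s+/', ' ', $text));
    };

    // Groups: 1 = quote character, 2 = href, 3 = link text,
    // 4 = everything after the link up to the next <a ...> or end of input.
    $pattern = '~<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)</a>(.*?)(?=<a\s|\z)~is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $items = [];
    foreach ($matches as $m) {
        foreach ($skipPatterns as $skip) {
            if (preg_match($skip, $m[2])) {
                continue 2; // this link is unwanted: leave it out of the feed
            }
        }
        $items[] = [
            'title'       => $clean($m[3]),
            'link'        => $m[2],
            'description' => $clean($m[4]),
        ];
    }
    return $items;
}
```

A regular expression keeps the sketch short and dependency-free, at the cost of robustness on unusual markup. That trade-off mirrors the "quick and dirty" character of this kind of scraping: pages whose links are each followed by a useful blurb give good results, while navigation-heavy pages need skip rules.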

Installation:
To use the Unstructured plugin, unstructured.php must be located in the "plugins" folder inside the folder containing grouper.php. This is the default location when Grouper Evolution is installed.
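If a feed script fails to load the plugin, a quick check of the file location can help. The following is a small sketch that tests for the layout described above. The helper name unstructuredPluginInstalled is hypothetical and is not part of Grouper.

```php
<?php
// Hypothetical helper, not part of Grouper: reports whether unstructured.php
// is present in the "plugins" folder inside the folder containing grouper.php.
function unstructuredPluginInstalled(string $grouperDir): bool
{
    // Normalise a trailing slash or backslash before building the path.
    $pluginPath = rtrim($grouperDir, '/\\') . '/plugins/unstructured.php';
    return is_file($pluginPath);
}
```

Pass the folder that holds grouper.php, for example unstructuredPluginInstalled('/YOUR/PATH/TO/grouper').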

Use:
The following code will generate an RSS feed from a webpage:

<?php
require_once '/YOUR/PATH/TO/grouper/grouper.php';
GrouperLoadPlugin('unstructured.php');
GrouperSourceURL('http://example.com/foo/');
// additional configuration usually needed here, as described below
GrouperShow('','CACHE-FILE-NAME');
?>

Configuration:
The Unstructured plugin provides the function UnstructuredGrouperAddSkip, which is used to configure parts of the document to omit from the feed. The function has two arguments, $type and $data. $type indicates how the value of $data is to be used, and can have the following values:

You may set the remainder of the configuration options for the Unstructured plugin using the function GrouperSourceConf, as follows:

GrouperSourceConf('OptionName','new value');

The Unstructured plugin has the following options: