Grouper - Documentation
The XML plugin converts XML documents to RSS.
It requires additional configuration to define the mapping between the source XML format and RSS,
and if the original data is to be altered, rather than simply copied into RSS elements as is,
requires additional code to perform the alterations.
Helper plugins are provided to handle the configuration and extra processing needed for such formats as
Atom 0.3 and 1.0, Amazon.com's associates XML, etc.
To use the XML plugin, xml.php and any helper plugins must be located in the "plugins" folder inside the folder containing grouper.php.
This is their default location when Grouper Evolution is installed.
Loading the plugins:
To load the plugins, use code like the following.
Note that the XML plugin must be loaded before any helper plugins are loaded.
Note: if you are using Grouper version 1.4.2 or earlier, you must replace the call to GrouperSourceURL with something like the following:
The behavior of the XML plugin is configured using the function GrouperSourceConf, as follows:
The XML plugin has the following configuration options.
You will rarely, if ever, need to change any of the options other than perhaps "mustparsedate":
- mustparsedate:
1 if you wish to ensure that Grouper only outputs dates in a valid format.
0 to allow Grouper to pass any value through that it finds in the XML file, even if it is unable to convert it to a proper date.
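To illustrate the difference between the two settings, here is a minimal sketch. It is not the plugin's actual code: the function name normalize_date_sketch and its $must_parse parameter are hypothetical, and it assumes that dates are parsed with PHP's strtotime, written in the RFC 822 style used by RSS 2.0, and dropped when they cannot be parsed and the setting is 1.

```php
<?php
// Hypothetical helper illustrating the two "mustparsedate" behaviors.
// Not part of Grouper; the name and the details are assumptions.
function normalize_date_sketch($raw, $must_parse)
{
    $timestamp = strtotime(trim($raw));
    if ($timestamp !== false) {
        // Parsed successfully: emit an RSS-style (RFC 822) date in GMT.
        return gmdate(DATE_RSS, $timestamp);
    }
    // Could not parse: drop the value (1) or pass it through unchanged (0).
    return $must_parse ? null : $raw;
}
```

With this sketch, a value such as '2006-01-02T15:04:05Z' is rewritten as an RSS date under either setting, while an unparseable value is only passed through when the second argument is 0.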
- encoding: [Grouper <= 1.6.1]
The character encoding in which you wish Grouper to output the RSS feed.
Valid values are UTF-8, ISO-8859-1, and US-ASCII.
- encoding-priority: [Grouper <= 1.6.1]
This setting is an array indicating the order of priorities in which to select the input document encoding.
By default, the encoding specified by the document is used, if any,
followed by the encoding indicated in the HTTP headers if any,
and finally, the value of $xmlgrouperconf['encodingin']['conf'] is used.
Note that internet standards specify that the HTTP header value should take precedence.
However, in practice, the document may be correct more often.
If you encounter a feed where the document specifies the encoding incorrectly, you will need to adjust the order of precedence here.
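The selection order described above can be pictured with a minimal sketch. It is not the plugin's code and does not show the option's real values: the function name pick_encoding_sketch and the labels 'document' and 'http' are hypothetical, and only 'conf' is a key named in this documentation.

```php
<?php
// Illustration only: pick the first available encoding in priority order.
// 'document' and 'http' are made-up labels; only 'conf' appears in these docs.
function pick_encoding_sketch(array $priority, array $candidates)
{
    foreach ($priority as $source) {
        if (!empty($candidates[$source])) {
            return $candidates[$source];
        }
    }
    return null; // no encoding is known from any source
}

$candidates = array(
    'document' => 'ISO-8859-1', // e.g. from the XML declaration
    'http'     => 'UTF-8',      // e.g. from the Content-Type header
    'conf'     => 'US-ASCII',   // the configured fallback
);

// Default order described above: document, then HTTP header, then configuration.
$default_choice = pick_encoding_sketch(array('document', 'http', 'conf'), $candidates);

// Order preferred by internet standards: HTTP header first.
$http_first_choice = pick_encoding_sketch(array('http', 'document', 'conf'), $candidates);
```

Reordering the first argument is the whole effect being illustrated: the same three candidates yield a different encoding depending on which source is consulted first.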
To set this option, use code like this:
- encodingin: [Grouper <= 1.6.1]
This setting is an array with one member, indexed by the key 'conf' (i.e., $xmlgrouperconf['encodingin']['conf']).
The value indicates the configuration-specified default encoding to assume if the encoding cannot be determined some other way.
The XML plugin will add more values to this array during processing.
To set this option, use code like this:
1 to have the XML plugin automatically do some necessary work to avoid memory leaks when it finishes processing.
0 to do it manually, allowing you access to the parsed XML data tree.
Documentation for how to do this is not yet available.
1 to keep a cached copy of elements that have been rebuilt after being parsed.
0 to save memory by not doing so.
Unless you wish to access the data manually after the plugin has finished processing, there is probably no need to do this.
- searchdomain: [Grouper < 1.6]
The domain name of the server from which to load the XML file (for example, 'antone.geckotribe.com').
Use the function GrouperSourceURL to set this option and querystart at the same time.
- querystart: [Grouper < 1.6]
The path to the XML file.
This value MUST begin with '/'.
Use the function GrouperSourceURL to set this option and searchdomain at the same time.
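To show how a feed URL relates to the searchdomain and querystart values, here is a minimal sketch using PHP's parse_url. It is not GrouperSourceURL itself: the function name split_source_url_sketch and the example path are hypothetical, and appending the query string to querystart is an assumption.

```php
<?php
// Illustration only: split a feed URL into the two values described above.
// This is not Grouper's GrouperSourceURL; the URL path is made up.
function split_source_url_sketch($url)
{
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        return null; // not an absolute URL
    }
    // The path must begin with '/', so fall back to '/' when the URL has none.
    $path = isset($parts['path']) ? $parts['path'] : '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query']; // assumption: the query string travels with the path
    }
    return array('searchdomain' => $parts['host'], 'querystart' => $path);
}

$split = split_source_url_sketch('http://antone.geckotribe.com/feeds/example.xml');
```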
Helper Plugin Developer Technical Information
Helper plugins must set certain configuration options in the XML plugin to define the mapping from the source XML format to RSS,
and may also provide additional functions to process the data.
We recommend using one of the helper plugins included with Grouper Evolution as a starting point and model when developing your own helper plugins.
The XML to RSS mapping is defined by the "element-map" configuration setting.
This setting is a nested array with the following structure.
Note that nearly everything is optional under most circumstances.
If in doubt about whether you can omit something, try it and see whether it works:
- 'base' =>'path/to/the/element/containing/the/channel/data', // see notes on path formats below
- '<RSS element name>'=>array( // repeat this for each RSS element
- 'path/to/second/choice/source/element/or/@attribute' //etc. -- see notes below
- 'val'=>'static text to use as the value for this element',
- 'required'=>1, // if an element is marked required, but its "src" cannot be found, its parent element will not be created
- 'is'=>'<one of: link, date, html-allowed>', // this value triggers additional processing
- 'process'=>'name of function to use to perform additional processing on this element', // see notes below for the function prototype
- 'process-children'=>'name of function to process any child elements of this element after they have all been built', // see notes below for the function prototype
- 'attribute name'=>array(
- 'src' or 'val'... // the same as with elements
- 'is'=>'<link or date>', // note: not "html-allowed"
- 'process'... // the same as with elements
- 'required'... // the same as with elements
// mappings for any child elements, with the same structure as for ['channel']['child']
- 'base'... // as with 'channel'
- 'child'... // as with 'channel'
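As an illustration of how these pieces might fit together, here is a hypothetical element map for an imaginary catalog format. The key names are taken from the structure above; the source paths, the exact nesting, the use of an array of alternatives for 'src', and the callback name my_clean_description are all assumptions, so check them against one of the bundled helper plugins before relying on them.

```php
<?php
// Hypothetical element map for an imaginary <catalog><product>... format.
// Key names come from the structure above; paths, nesting details, and the
// array form of 'src' are assumptions, not confirmed Grouper behavior.
$element_map_sketch = array(
    'channel' => array(
        'base'  => 'catalog',
        'child' => array(
            'title'       => array('src' => array('name', 'info/title'), 'required' => 1),
            'link'        => array('src' => array('homepage/@href'), 'is' => 'link'),
            'description' => array('val' => 'Products converted from the catalog feed'),
        ),
    ),
    'item' => array(
        'base'  => 'catalog/product',
        'child' => array(
            'title'       => array('src' => array('name'), 'required' => 1),
            'link'        => array('src' => array('url'), 'is' => 'link'),
            'pubDate'     => array('src' => array('updated', 'created'), 'is' => 'date'),
            'description' => array(
                'src'     => array('summary'),
                'is'      => 'html-allowed',
                'process' => 'my_clean_description', // hypothetical callback name
            ),
        ),
    ),
);
```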
Paths to elements and attributes
Paths to source elements and attributes are specified using a subset of XPath syntax.
- 'base' (as a child of either 'channel' or 'item') must be the path to an element.
It must begin from the document root element, but must not start with a "/".
- 'src' can be a path to either an element or an attribute.
It must begin from the 'base' of the enclosing 'channel' or 'item', and need not start with a "/".
- Attributes are indicated by the prefix "@".
- An element with an attribute with a specific value may be selected by indicating the attribute name and value like this:
The preceding example points to the contents of a link with the value "alternate" in its "rel" attribute.
The value is case sensitive and must be an exact match.
The following example points to the "href" attribute of the same element:
- An element with a child element with a specific value may be similarly indicated like this:
The preceding points to the contents of the "asdf" element which is a child of the "foo" element whose "bar" child element has the value "qwerty".
The value is case sensitive and must be an exact match, except that leading and trailing space is trimmed from the actual element value before comparison.
- When more than one element with the same name exists, one may be selected using a numeric index like this:
The preceding indicates the third element named "foo".
- Some of the above may be combined.
For example, the following selects the value of an "asdf" which is the child of a "bar" element which is the child of a "foo" element
where the "bar" element's first child element named "foobar" has a "rel" attribute whose value is "qwerty":
- Some of the above may not be combined.
For example, this syntax, where two predicates are combined (the value of the first "foo" element if its "bar" attribute's value is "qwerty"),
is not supported:
- When multiple elements are found that meet the entire specified criteria, the first one will be used.
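To make the path forms in the notes above concrete, the following sketch evaluates standard XPath expressions of each kind using PHP's SimpleXML. It assumes that Grouper's subset uses standard XPath notation for these forms, which this documentation does not confirm; the sample document and the helper first_match_sketch are hypothetical. Note also that standard XPath does not trim whitespace before comparing element values, so the sample avoids padded values.

```php
<?php
// Standard XPath via SimpleXML, to show what each kind of expression selects.
// Grouper's own parser supports only a subset and may differ in detail.
$xml = simplexml_load_string(
    '<feed>'
  . '<link rel="self" href="http://example.com/feed">self</link>'
  . '<link rel="alternate" href="http://example.com/">home</link>'
  . '<foo><bar>other</bar><asdf>no</asdf></foo>'
  . '<foo><bar>qwerty</bar><asdf>yes</asdf></foo>'
  . '<foo>third</foo>'
  . '<foo><bar><foobar rel="qwerty"/><asdf>deep</asdf></bar></foo>'
  . '</feed>'
);

// Return the text of the first match, since the first match is the one used.
function first_match_sketch(SimpleXMLElement $context, $path)
{
    $matches = $context->xpath($path);
    return $matches ? (string) $matches[0] : null;
}

$results = array(
    // element selected by an attribute value
    'by_attribute'   => first_match_sketch($xml, 'link[@rel="alternate"]'),
    // an attribute of that same element
    'attribute'      => first_match_sketch($xml, 'link[@rel="alternate"]/@href'),
    // element selected by a child element's value
    'by_child_value' => first_match_sketch($xml, 'foo[bar="qwerty"]/asdf'),
    // third element named "foo"
    'by_index'       => first_match_sketch($xml, 'foo[3]'),
    // combined form: index and attribute test inside one predicate
    'combined'       => first_match_sketch($xml, 'foo/bar[foobar[1]/@rel="qwerty"]/asdf'),
);
```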
Process and Process-Children Callback Functions
The prototype for "process" and "process-children" functions is:
function FunctionName($element_name, &$attributes, $source_data)
$element_name is the name of the RSS element being created.
$attributes is an array containing the attributes of the source element as name/value pairs.
For "process" functions, $source_data is the contents of the source element.
It does not include any child elements defined in the element map, but does include any child elements from the source data.
For "process-children" functions, $source_data contains the fully constructed child elements.
The return value of the "process" and "process-children" functions will take the place of $source_data in the content of the RSS element that is being created.
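As a minimal sketch of a callback that follows this prototype, the hypothetical function below strips markup and collapses whitespace in the source text. The name my_tidy_text and its behavior are illustrative assumptions, not part of Grouper; it leaves $attributes untouched because the effect of modifying that array is not described here.

```php
<?php
// Hypothetical "process" callback following the prototype above.
// The name and the clean-up rules are assumptions for illustration.
function my_tidy_text($element_name, &$attributes, $source_data)
{
    // Remove any markup, then collapse runs of whitespace to single spaces.
    $text = strip_tags($source_data);
    $text = preg_replace('/\s+/', ' ', $text);
    // The returned string takes the place of $source_data in the RSS element.
    return trim($text);
}

// Calling it directly, the way the plugin would be expected to call it.
$attributes = array('type' => 'text');
$tidied = my_tidy_text('title', $attributes, "  Hello \n <b>World</b> ");
```

To use a function like this, you would give its name as the 'process' value of the relevant element in the element map.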
The "inline-elements" setting lists any elements whose content is unescaped XHTML or HTML (which must be well-formed XML).
These elements and their children are not parsed (technically, they're immediately reconstituted) during the XML parsing phase.
List the full XPath of each element as an array key with the value 1, like this:
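A minimal sketch of such a list for an Atom-like source follows. The variable name and the two paths are hypothetical, and writing the paths from the document root without a leading "/" is an assumption carried over from the rule for 'base'.

```php
<?php
// Hypothetical "inline-elements" value for an Atom-like source.
// Each key is the full path of an element whose content is inline (X)HTML;
// each value is 1. The paths themselves are assumptions.
$inline_elements_sketch = array(
    'feed/entry/content' => 1,
    'feed/entry/summary' => 1,
);
```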