Z3950 Harvesting

Z3950 is a remote search and retrieval protocol that is commonly used to search and harvest metadata. Although the protocol is often used for library catalogs, significant geospatial metadata catalogs can also be searched using Z3950 (e.g. the metadata collections of the Australian Government agencies that participate in the Australian Spatial Data Directory, ASDD). This harvester allows the user to specify a Z3950 query and retrieve metadata records from one or more Z3950 servers.

Adding a Z3950 Harvester

The available options are:

  • Site

    • Name - A short description of this Z3950 harvester. This name is shown on the harvesting main page.

    • Z3950 Server(s) - These are the Z3950 servers that will be searched. You can select one or more of these servers.

    • Z3950 Query - Specify the Z3950 query to use when searching the selected Z3950 servers. At present this field is known to support the Prefix Query Format (also known as Prefix Query Notation), which is described at http://www.indexdata.com/yaz/doc/tools.html#PQF. See below for more information and some simple examples.

    • Icon - An icon to assign to harvested metadata. The icon will be used when showing search results.

  • Options - Scheduling options.

  • Harvested Content

    • Apply this XSLT to harvested records - Choose an XSLT here that will convert harvested records to a different format.

    • Validate - If checked, records that fail validation or cannot be validated will be rejected.

  • Privileges

  • Categories

Note

This harvester automatically creates a new category named after each of the Z3950 servers that return records. Records returned by a server are assigned to the category named after that server.

More about PQF Z3950 Queries

PQF is a rather arcane query language. It is based around the idea of attributes and attribute sets. The most common attribute set used for geospatial metadata in Z3950 servers is the GEO attribute set (which is an extension of the BIB-1 and GILS attribute sets - see http://www.fgdc.gov/standards/projects/GeoProfile). So all PQF queries to geospatial metadata Z3950 servers should start off with @attrset geo.

The most useful attribute types in the GEO attribute set are as follows:

@attr number   Meaning      Description
1              Use          What field to search
2              Relation     How to compare the term specified
4              Structure    What type is the term? e.g. date, numeric, phrase
5              Truncation   How to truncate, e.g. right
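
As a rough illustration of how these attribute types combine into a query string, the following Python sketch assembles a PQF query from named attributes. The function and dictionary names are hypothetical and are not part of GeoNetwork; only the attribute type numbers come from the table above. The sketch assumes a single search term that contains no double quotes.

```python
# Hypothetical helper names; only the attribute type numbers (1, 2, 4, 5)
# are taken from the table of GEO attribute types.
ATTR_TYPES = {"use": 1, "relation": 2, "structure": 4, "truncation": 5}


def quote_term(term: str) -> str:
    """Wrap a term in double quotes when it contains whitespace."""
    # Assumption: the term itself contains no double quotes.
    return f'"{term}"' if any(ch.isspace() for ch in term) else term


def build_pqf(term: str, attrset: str = "geo", **attrs: int) -> str:
    """Assemble '@attrset <set> @attr <type>=<value> ... <term>'."""
    clauses = [f"@attr {ATTR_TYPES[name]}={value}" for name, value in attrs.items()]
    return " ".join([f"@attrset {attrset}", *clauses, quote_term(term)])


# Use attribute 4 is the title field in the GeoNetwork mapping listed in this section.
print(build_pqf("water", use=4))
```

Keeping the attribute types in one dictionary means a misspelled attribute name fails with a KeyError instead of silently producing a malformed query.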

In GeoNetwork the numeric values that can be specified for @attr 1 map to the Lucene index field names as follows:

@attr 1=              Lucene index field             ISO19139 element
1016                  any                            All text from all metadata elements
4                     title, altTitle                gmd:identificationInfo//gmd:citation//gmd:title/gco:CharacterString
62                    abstract                       gmd:identificationInfo//gmd:abstract/gco:CharacterString
1012                  _changeDate                    Not a metadata element (maintained by GeoNetwork)
30                    createDate                     gmd:MD_Metadata/gmd:dateStamp/gco:Date
31                    publicationDate                gmd:identificationInfo//gmd:citation//gmd:date/gmd:CI_DateCode/@codeListValue='publication'
2072                  tempExtentBegin                gmd:identificationInfo//gmd:extent//gmd:temporalElement//gml:begin(Position)
2073                  tempExtentEnd                  gmd:identificationInfo//gmd:extent//gmd:temporalElement//gml:end(Position)
2012                  fileId                         gmd:MD_Metadata/gmd:fileIdentifier/*
12                    identifier                     gmd:identificationInfo//gmd:citation//gmd:identifier//gmd:code/*
21,29,2002,3121,3122  keyword                        gmd:identificationInfo//gmd:keyword/*
2060                  northBL,eastBL,southBL,westBL  gmd:identificationInfo//gmd:extent//gmd:EX_GeographicBoundingBox/gmd:westBoundLongitude*/gco:Decimal (etc)

Note that this is not a complete set of the mappings between the Z3950 GEO attribute set and the GeoNetwork Lucene index field names for ISO19139. For more details, see INSTALL_DIR/web/geonetwork/xml/search/z3950Server.xsl and INSTALL_DIR/web/geonetwork/xml/schemas/iso19139/index-fields.xsl, and Annex A of the GEO attribute set for Z3950 at http://www.fgdc.gov/standards/projects/GeoProfile/annex_a.html.
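
When preparing queries in a script, it can help to keep the mapping in a lookup table. The Python sketch below transcribes the rows listed above into a dictionary and adds two hypothetical lookup functions. It is not how GeoNetwork stores the mapping (that lives in the XSL files mentioned above), and it is as incomplete as the table it is copied from.

```python
# Use attribute values and Lucene index fields, transcribed from the table
# in this section. The table is not complete, so neither is this dictionary.
USE_TO_LUCENE = {
    1016: ("any",),
    4: ("title", "altTitle"),
    62: ("abstract",),
    1012: ("_changeDate",),
    30: ("createDate",),
    31: ("publicationDate",),
    2072: ("tempExtentBegin",),
    2073: ("tempExtentEnd",),
    2012: ("fileId",),
    12: ("identifier",),
    21: ("keyword",),
    29: ("keyword",),
    2002: ("keyword",),
    3121: ("keyword",),
    3122: ("keyword",),
    2060: ("northBL", "eastBL", "southBL", "westBL"),
}


def lucene_fields(use_value: int) -> tuple:
    """Return the Lucene fields searched for a Use attribute value (empty if unlisted)."""
    return USE_TO_LUCENE.get(use_value, ())


def use_values(field: str) -> list:
    """Return every listed Use attribute value that searches the given Lucene field."""
    return sorted(value for value, fields in USE_TO_LUCENE.items() if field in fields)


print(lucene_fields(4))
print(use_values("keyword"))
```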

Common values for the relation attribute (@attr 2):

@attr 2=   Description
1          Less than
2          Less than or equal to
3          Equals
4          Greater than or equal to
5          Greater than
6          Not equal to
7          Overlaps
8          Fully enclosed within
9          Encloses
10         Fully outside of
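
In a script, the relation codes can be given readable names so that queries do not depend on remembered numbers. The following Python sketch is one possible way to do this; the enum and function names are hypothetical. Treating values 7 to 10 as the spatial relations is a reading of the table above (they compare extents), not something GeoNetwork exposes.

```python
from enum import IntEnum


class Relation(IntEnum):
    """Relation attribute values, transcribed from the table in this section."""
    LESS_THAN = 1
    LESS_THAN_OR_EQUAL = 2
    EQUAL = 3
    GREATER_THAN_OR_EQUAL = 4
    GREATER_THAN = 5
    NOT_EQUAL = 6
    OVERLAPS = 7
    FULLY_ENCLOSED_WITHIN = 8
    ENCLOSES = 9
    FULLY_OUTSIDE_OF = 10


# Assumption: relations 7-10 are the ones that compare spatial extents.
SPATIAL_RELATIONS = frozenset({
    Relation.OVERLAPS,
    Relation.FULLY_ENCLOSED_WITHIN,
    Relation.ENCLOSES,
    Relation.FULLY_OUTSIDE_OF,
})


def relation_clause(relation: Relation) -> str:
    """Format the PQF clause for a relation, e.g. '@attr 2=7'."""
    return f"@attr 2={relation.value}"


def is_spatial(relation: Relation) -> bool:
    """Report whether the relation compares spatial extents (see assumption above)."""
    return relation in SPATIAL_RELATIONS


print(relation_clause(Relation.OVERLAPS))
```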

So a simple query to get all metadata records that have the word ‘the’ in any field would be:

@attrset geo @attr 1=1016 the

  • @attr 1=1016 means that we are doing a search on any field in the metadata record
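
To make the structure of such a query explicit, the Python sketch below splits a query into its attribute set, its attributes and its search term. It is a simplified reader written for illustration, not the parser used by GeoNetwork or by any Z3950 toolkit. It assumes a single term, numeric attribute values, and no boolean operators such as @and or @or.

```python
import shlex


def parse_pqf(query: str) -> dict:
    """Split a simple PQF query into attribute set, attributes and term."""
    # shlex.split keeps a double-quoted term together as one token.
    tokens = shlex.split(query)
    attrset = None
    attrs = {}
    i = 0
    while i < len(tokens) and tokens[i] in ("@attrset", "@attr"):
        if i + 1 >= len(tokens):
            raise ValueError(f"{tokens[i]} is missing its argument")
        if tokens[i] == "@attrset":
            attrset = tokens[i + 1]
        else:
            # An attribute looks like '1=1016': type number, '=', value.
            kind, _, value = tokens[i + 1].partition("=")
            attrs[int(kind)] = int(value)
        i += 2
    terms = tokens[i:]
    if len(terms) != 1:
        raise ValueError("this sketch expects exactly one search term")
    return {"attrset": attrset, "attrs": attrs, "term": terms[0]}


print(parse_pqf("@attrset geo @attr 1=1016 the"))
```

For the query above, the attribute dictionary holds the single pair 1 (Use) and 1016 (any field), and the term is the word being searched for.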

A more sophisticated search on a bounding box might be formulated as:

@attrset geo @attr 1=2060 @attr 4=201 @attr 2=7 "-36.8262 142.6465 -44.3848 151.2598"

  • @attr 1=2060 means that we are doing a bounding box search

  • @attr 4=201 means that the query contains coordinate strings

  • @attr 2=7 means that we are searching for records whose bounding box overlaps the query box specified at the end of the query
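
A bounding box query of this form can be generated from coordinates. The Python sketch below is a hypothetical helper, not part of GeoNetwork. It assumes the coordinate string is ordered north, west, south, east, which is inferred from the example values above, and it only accepts the relation values 7 to 10 on the assumption that those are the ones meaningful for a box comparison.

```python
def bbox_query(north: float, west: float, south: float, east: float,
               relation: int = 7) -> str:
    """Build a GEO bounding box query in PQF.

    Assumption: coordinates are written in the order north, west, south, east,
    as inferred from the example query in this section.
    """
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie between -180 and 180")
    if relation not in (7, 8, 9, 10):
        # Assumption: only relations 7-10 apply to a box comparison.
        raise ValueError("relation must be one of 7, 8, 9 or 10")
    box = f"{north} {west} {south} {east}"
    # 1=2060: bounding box search, 4=201: coordinate string, 2=<relation>.
    return f'@attrset geo @attr 1=2060 @attr 4=201 @attr 2={relation} "{box}"'


print(bbox_query(-36.8262, 142.6465, -44.3848, 151.2598))
```

The coordinate string is wrapped in double quotes because it contains spaces and must reach the server as a single term.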

Notes

  • Z3950 servers must be configured for GeoNetwork in INSTALL_DIR/web/geonetwork/WEB-INF/classes/JZKitConfig.xml.tem

  • Every time the harvester runs, it removes the previously harvested records and creates new ones.