
XML API for Central Search

Version 1.0
27 July 2006
Table of Contents
  1. Overview
  2. Access and Authentication
  3. Operations
    1. explain
    2. searchRetrieve
    3. searchStatus
  4. Synchronous Searching
  5. Asynchronous Searching
  6. Using HTTP POST
  7. Diagnostic Messages
Overview
The purpose of this document is to outline the current XML API for Central Search.

Central Search is a federated search engine that allows users to search multiple resources using a single query. For this API, we are implementing an extension of the Search/Retrieve via URL standard protocol for Internet search queries. Readers should become familiar with the documentation of this standard before reading the rest of this document as we include here only the details of how our interface meets and extends this standard.

The extensions to the standard that we offer are necessary to accommodate the unique nature of federated searching. In particular, we offer search continuation to account for the fact that, when searching across multiple databases, only a subset of the matches for a particular query is retrieved from each database. We also support asynchronous as well as synchronous searching, since searching across multiple databases can take some time and applications may want to know the status of the search, or retrieve the subset of results gathered so far, before the search has completed.

Access and Authentication
Access and Query Types
The base URL for this operation is
    http://<client identifier>.cs.xml.serialssolutions.com/sru
Both HTTP POST and GET requests are supported.
Authorization and Authentication
The Central Search API requires authentication. We currently support only IP authentication. Users must register the IP(s) of the machine(s) that will be accessing the API with Serials Solutions.
Usage Restrictions
Some restrictions on usage will be imposed. The exact specification of these has yet to be determined.
Operations
There are three different operations that will be supported by our API:
  • explain
  • searchRetrieve
  • searchStatus
The first two are specified in the SRU/SRW standard; the last is an operation specific to Central Search and available for doing asynchronous requests for search information. For each of these operations, we describe the general purpose of the operation, the request parameters and the format of the response.
The explain operation
Purpose
The explain operation will give details about the service provided by the API, including the indexes and schemas supported for querying.
Request Parameters
Parameter Name | Mandatory or Optional | Values | Default | Description
version | mandatory | 1.1 | Not applicable | The SRU version.
recordPacking | optional | xml | xml | How the explain record should be escaped in the response. Currently only an xml response is supported.
stylesheet | optional | valid URLs | none | A URL for an xml stylesheet to be included in the response.
operation | mandatory | explain | Not applicable | Indicates the type of operation being requested.
Response
The response record will be in XML following the ZeeRex Schema for SRU.
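As a rough illustration of the parameters above, an explain request URL could be assembled as follows. This is only a sketch: the client identifier "myclient" and the helper name explain_url are hypothetical, and the request is built but not sent.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical client identifier; substitute the one assigned by Serials Solutions.
CLIENT_ID = "myclient"


def explain_url(client_id: str, stylesheet: Optional[str] = None) -> str:
    """Build an explain request URL from the documented parameters."""
    # version and operation are mandatory; stylesheet is optional.
    params = {"version": "1.1", "operation": "explain"}
    if stylesheet is not None:
        params["stylesheet"] = stylesheet
    base = f"http://{client_id}.cs.xml.serialssolutions.com/sru"
    return base + "?" + urlencode(params)


print(explain_url(CLIENT_ID))
```

The resulting URL can be fetched with any HTTP client from a machine whose IP address has been registered.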
The searchRetrieve operation
Purpose
The searchRetrieve operation is used for retrieving a set of records associated with a particular query. If this operation is used to initiate a search (synchronous searching), no response will be returned until the search has completed for all databases requested in the query.
Standard Request Parameters
Parameter Name | Supported? | Mandatory or Optional | Values | Default | Description
version | yes | mandatory | 1.1 | Not applicable | The version of the request, and a statement by the client that it wants the response to conform to a version less than or, preferably, equal to this version.
query | yes | mandatory | A CQL query. | Not applicable | Contains the query expressed in CQL to be processed by the server.
startRecord | yes | optional | integer greater than 0 | 1 | The position within the sequence of matched records of the first record to be returned. The first position in the sequence is 1. If the value given is larger than the current number of results associated with the result set, an appropriate diagnostic will be returned. Note: This parameter is only used in searchRetrieve operations.
maximumRecords | yes | optional | integer greater than or equal to 0 | 20 | Indicates the maximum number of records to be returned. Note: This parameter is only used in searchRetrieve operations.
recordPacking | yes | optional | xml | xml | Indicates how the record should be escaped in the response.
recordSchema | yes | optional | cs | cs | See the explanation from the standard.
recordSetTTL | no | | | |
sortKeys | limited | optional | See below. | date | Contains a sort key to be applied to the results. See sorting below.
stylesheet | yes | optional | A URL for an XML stylesheet | none | The client requests that the server simply return this URL in the response. See the description in the standard.
operation | yes | mandatory | searchRetrieve | Not applicable | Indicates the type of operation being requested.
Extra Request Data
The following table outlines the extra request parameters implemented by the Central Search XML API.
Parameter Name | Mandatory or Optional | Values | Default | Description
x-cs-action | optional | continue | none | This parameter should be used to continue searching over a set of databases where a previous search left off. This must be used in conjunction with a cql.resultSetId= query to specify the previous result set. This query will result in creation of a new resultSetId for the expanded set of results. If a continue request is sent using an identifier for a result set for which no more records can be retrieved, an appropriate diagnostic will be returned.
x-cs-databases | optional | Comma-separated list of Serials Solutions identifiers for the databases to be searched. | Not applicable | Though this parameter is not required, at least one of x-cs-databases, x-cs-categories, and x-cs-groups must be present in the query that starts a search. More than one of these parameters is possible. If used in conjunction with an x-cs-action=continue parameter, this indicates that the search should be continued only for the given databases.
x-cs-categories | optional | Comma-separated list of Serials Solutions identifiers for the categories to be searched. | Not applicable | Though this parameter is not required, at least one of x-cs-databases, x-cs-categories, and x-cs-groups must be present in the query that starts a search. More than one of these parameters is possible. If used in conjunction with an x-cs-action=continue parameter, this indicates that the search should be continued only for the given categories.
x-cs-groups | optional | Comma-separated list of Serials Solutions identifiers for the category groups to be searched. | Not applicable | Though this parameter is not required, at least one of x-cs-databases, x-cs-categories, and x-cs-groups must be present in the query that starts a search. More than one of these parameters is possible. If used in conjunction with an x-cs-action=continue parameter, this indicates that the search should be continued only for the given category groups.
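The rule that a search-starting request needs at least one of x-cs-databases, x-cs-categories, and x-cs-groups can be checked on the client before a request is sent. The sketch below is one way to do that; the function name start_search_params is hypothetical, and the database identifiers are the ones used in the examples later in this document.

```python
from typing import Dict, Sequence


def start_search_params(
    query: str,
    databases: Sequence[str] = (),
    categories: Sequence[str] = (),
    groups: Sequence[str] = (),
) -> Dict[str, str]:
    """Assemble parameters for a request that starts a search."""
    scopes = {
        "x-cs-databases": databases,
        "x-cs-categories": categories,
        "x-cs-groups": groups,
    }
    params = {"version": "1.1", "operation": "searchRetrieve", "query": query}
    for name, identifiers in scopes.items():
        if identifiers:
            # Each scope parameter is a comma-separated list of identifiers.
            params[name] = ",".join(identifiers)
    if not any(name in params for name in scopes):
        raise ValueError(
            "at least one of x-cs-databases, x-cs-categories, x-cs-groups is required"
        )
    return params


print(start_search_params("title any tree", databases=["AFU", "NN3", "PAG"]))
```

Failing early on the client avoids a round trip that would only produce a diagnostic.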
Common Query Language Support
The Common Query Language (CQL) is a formal language for representing queries to information retrieval systems. We currently support only Level 1 conformance to the standard.

A query consists of one or more search clauses joined by boolean operators. The service currently supports:

For unsupported queries, an appropriate diagnostic will be returned. The following table provides more details about the specific parts of the language that are supported.
Query Feature | Values | Default | Description
Index Name | cs.abstract, cs.anyField, cs.author, cs.date, cs.fullText, cs.isbn, cs.issn, cs.keyword, cs.subject, cs.title, cql.resultSetId | Defined by configuration parameter for each Serials Solutions client independently. | The index is used to specify what field to search over. The qualifier cs need not be specified as it is the default assumed by the server. When the cql.resultSetId index is used, it should be the only index in the query.
Relation | any, all, = | all | any means find any of the terms; all means find all of them; = means find the phrase given within the field indicated by the index.
Boolean Operators | and, or, not | Not applicable | These have their usual meaning, except that not means "and not" and is thus a binary operator, not a unary one.
All queries are case-insensitive. Relational modifiers and boolean modifiers are not currently supported. Search terms must be enclosed in double quotes when they include any of these characters: <, >, =, /, (, ), and whitespace. (See the CQL documentation.)
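The quoting rule above is easy to get wrong when queries are built from user input, so a small helper may be useful. This is a sketch only: cql_term and cql_clause are hypothetical names, and escaping an embedded double quote with a backslash follows the general CQL convention rather than anything stated in this document.

```python
import re

# Characters that force a term to be quoted: < > = / ( ) and whitespace.
_NEEDS_QUOTES = re.compile(r'[<>=/()\s]')


def cql_term(term: str) -> str:
    """Return the term, wrapped in double quotes when the rule requires it."""
    if _NEEDS_QUOTES.search(term):
        # Assumption: embedded double quotes are backslash-escaped, as in CQL.
        return '"' + term.replace('"', '\\"') + '"'
    return term


def cql_clause(index: str, relation: str, term: str) -> str:
    """Build a single search clause such as: title any "oak tree"."""
    return f"{index} {relation} {cql_term(term)}"


print(cql_clause("title", "any", "tree"))
print(cql_clause("title", "=", "oak tree"))
```

Clauses built this way can then be joined with and, or, or not before URL-encoding the whole query.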
Sorting
Sorting of result sets can be requested using the sortKeys parameter. This parameter can be supplied either as part of the original request or in a subsequent request in which the result set identifier is supplied in the query parameter in this way: query=cql.resultSetId=[id]. The sortKeys parameter takes a single value; the values named in this document are date (the default) and received.

The received sort key causes citations to be sorted in the order in which they were received. Using this sort in conjunction with x-cs-action=continue and the startRecord parameter allows for easy retrieval of only those records that have been retrieved in response to the continue action.

We currently do not support using multiple sort keys, multiple result sets, case sensitivity, or specification of a sort key via an XPath expression. The appropriate diagnostic message will be returned for unsupported or invalid queries.
Response
The response will conform to the searchRetrieveResponseType defined within the SRU/SRW schema. The recordData element of the records returned will contain two types of records, one type containing the search summary information and the other containing the citation metadata. The citation metadata, outlined in the table below, will be represented using Dublin Core Metadata Elements to the extent that this is possible. Metadata that has no corresponding term in this schema will be represented using terms defined by Serials Solutions, which may be refinements of the Dublin Core terms.
Metadata | Element | Description
Author | dc:creator | An author (or creator) of the document.
Author | cs:normalizedData/dc:creator | The first author of the document, normalized to be in the format Last First Middle, if possible. Refinement of dc:creator.
Abstract | dc:abstract | The abstract describing a particular citation.
Call number | cs:callNumber | The call number of the item if it came from an OPAC. Refinement of dc:identifier.
Content provider id | cs:providerId | The unique Serials Solutions identifier for the content provider from which a citation was retrieved. Refinement of dc:identifier.
Content provider name | cs:providerName | The (possibly customized) name of the content provider from which a citation was retrieved.
Database id | cs:databaseId | The unique Serials Solutions identifier for the database from which a citation was retrieved. Refinement of dc:identifier.
Database name | cs:databaseName | The (possibly customized) name of the database from which a citation was retrieved.
Date published | dc:issued | The publication date as presented from the original source.
Date published | cs:normalizedData/dc:issued | The publication date normalized to YYYY-MM-DD, if possible. Refinement of dc:issued.
Document Id | cs:docId | The source-specific unique identifier for a given citation (e.g., an accession number). Refinement of dc:identifier.
Document URL | cs:url | A URL that gives access to the document. Two types of document URLs may be provided: one to the content provider (no type attribute specified) and one to a link resolver (type="linkresolver").
Duplicate identifier | cs:duplicateId | Identifier of the "duplicate group" this citation belongs to. If there are no duplicates this element will not be present. Refinement of dc:identifier.
Full-text available flag | cs:fullTextAvailable | A Boolean flag indicating whether or not a particular citation is available in full text from a particular source. Possible values are "yes" and "no". Note that a value of "no" indicates only that we have no clear indication that full text for the document is available.
Identifier | dc:identifier | Unique identifier within the result set for the citation.
ISBN | cs:isbn | The publication ISBN. Refinement of dc:identifier.
ISSN | cs:issn | The publication ISSN, displayed with a hyphen. Refinement of dc:identifier.
Issue | cs:issue | The publication issue of the citation.
Pages | cs:pages | The page range, if available.
Peer-reviewed flag | cs:peerReviewed | A Boolean flag indicating whether or not a citation is for a document published in a peer-reviewed publication. Note that a value of "no" indicates only that we have no indication that the document has been peer reviewed.
Publication | dc:source | The publication name.
Publication Type | dc:type | The type of publication (e.g. article, book, etc.), as provided by the source.
Start page | cs:spage | The starting page for the document.
Title | dc:title | The article/document title.
Volume | cs:volume | The publication volume, if available.

The schema definitions describe the XML structure and target namespaces for the response. A small example illustrates what can be expected. Notice that not all data fields are present in every record.

The searchStatus operation
Purpose
The searchStatus operation is available for initiating asynchronous search queries and retrieving status information about an ongoing search. It allows the client to retrieve the current summary of the search in terms of the number of results retrieved thus far. For this operation, the client should initially send a request using the same parameters that would be used for a searchRetrieve request. A response containing the initial status of the search and the identifier for the result set being constructed will be returned. Queries to obtain status updates can be made using this returned identifier in a cql.resultSetId query parameter. See the section on asynchronous searching for further explanation.
Request Parameters
The request parameters for this operation are the same as for searchRetrieve, with the obvious exception that the operation parameter value must be searchStatus.
Parameter Name | Supported? | Mandatory or Optional | Values | Default | Description
operation | yes | mandatory | searchStatus | Not applicable | Indicates the type of operation being requested.
The first request should include the search terms and indexes in the CQL query. Subsequent queries to get updates on the status of the search started with the initial request should use cql.resultSetId=[id], where [id] is the result set identifier returned in the initial response.
Response
The response to a search summary request will contain metadata about the current state of the search. The data are represented in a hierarchy that reflects the relationship between providers and databases. Currently, there are three levels to this hierarchy: summary, provider, and database.

Each level in the hierarchy is represented by a cs.searchProfile element. A cs.searchProfile element contains a partial and total count as well as a search state and a cs.searchProfile child element for each of its children in the hierarchy. The counts are sums of the corresponding counts in the child cs.searchProfile elements. The state is a summary of the child states.
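The way counts roll up through the three levels can be pictured with a small data structure. The sketch below is illustrative only: the SearchProfile class, its field names, and all identifiers and counts are hypothetical and are not taken from the response schema. It mirrors a summary with two providers, one with a single database and one with two.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchProfile:
    """One level of the hierarchy: summary, provider, or database."""
    id: str
    name: str
    partial: int = 0  # records retrieved so far (leaf profiles only)
    total: int = 0    # records matching the query (leaf profiles only)
    children: List["SearchProfile"] = field(default_factory=list)

    def partial_count(self) -> int:
        # A parent's count is the sum of its children's counts.
        if not self.children:
            return self.partial
        return sum(child.partial_count() for child in self.children)

    def total_count(self) -> int:
        if not self.children:
            return self.total
        return sum(child.total_count() for child in self.children)


summary = SearchProfile("summary", "Summary", children=[
    SearchProfile("P1", "Provider One", children=[
        SearchProfile("DB1", "Database One", partial=20, total=150),
    ]),
    SearchProfile("P2", "Provider Two", children=[
        SearchProfile("DB2", "Database Two", partial=20, total=45),
        SearchProfile("DB3", "Database Three", partial=5, total=5),
    ]),
])

print(summary.partial_count(), summary.total_count())  # prints: 45 200
```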

The schema definitions describe the XML structure and target namespaces for the response. See the example XML for an example with two providers, one with a single database and one with two databases. A second example illustrates the same profile before all searching completed.

The table below outlines the metadata returned.

Metadata | Element | Attributes | Parent Element | Description
Search Profile | cs:searchProfile | id, name | none or cs.searchProfile | Container for the search status information for the entity identified by id.
Current Record Count | cs:citationCount | type=partial | cs.searchProfile | The number of records that have been retrieved from this source so far.
Total Record Count | cs:citationCount | type=total | cs.searchProfile | The total number of records that match the query for this source.
Search State | cs:searchState | none | cs.searchProfile | Text representation of the state of the search. The current possible values are:
  • query not supported - the query submitted is not supported by this source
  • authentication failed - the user could not be authenticated to the source
  • uninitialized - the search has not yet been initialized (rare)
  • initializing - the search is in the process of being initialized
  • initialized - search has been initialized but not yet started
  • searching - sending request to source and waiting for response
  • collecting - gathering up the data from the source (rare)
  • initial connection timed out - the connection to the source could not be established or did not return results in the time allotted
  • connection timed out - the connection to the source could not be established or did not return results in the time allotted
  • completed - all results for the source for this query have been retrieved
  • error - unclassified error occurred while searching
  • too many results - the search matched too many results so none are available
  • maximum users - there are too many users on the third-party system so the search could not be done at this time
  • idle - no searching is going on. We have retrieved all results we are going to in this iteration.
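A client that displays progress per source may want to group these state strings into a few buckets. The grouping below is an assumption made for illustration, not something the API defines; only the state strings themselves come from the list above, and the name classify_state is hypothetical.

```python
# Assumed grouping of the documented state strings (not defined by the API).
IN_PROGRESS = frozenset({
    "uninitialized", "initializing", "initialized", "searching", "collecting",
})
FINISHED = frozenset({"completed", "idle"})
FAILED = frozenset({
    "query not supported", "authentication failed",
    "initial connection timed out", "connection timed out",
    "error", "too many results", "maximum users",
})


def classify_state(state: str) -> str:
    """Map a cs:searchState value to a coarse client-side bucket."""
    if state in IN_PROGRESS:
        return "in progress"
    if state in FINISHED:
        return "finished"
    if state in FAILED:
        return "failed"
    # New states may be added over time, so do not guess.
    return "unknown"


print(classify_state("searching"))
print(classify_state("connection timed out"))
```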
Synchronous Searching
Initiating a search
Synchronous searching should be used when the application accessing the API wants to wait until a search has completed before doing any processing of the results. This is achieved through a query like the following:
    ?version=1.1&query=title+any+tree&x-cs-databases=AFU,NN3,PAG&operation=searchRetrieve
This initiates a search against the three databases specified and retrieves the total number of matches for the query "title contains tree" for each database as well as a subset of those matching records for each database. The response contains the resultSetId for this set (in this example, result_set_id_string), the status information about the search, and the records retrieved.
Continuing a search
If more records than were retrieved with the initial search are wanted, a client may request that more be retrieved by using the x-cs-action=continue parameter in conjunction with cql.resultSetId:
   ?version=1.1&operation=searchRetrieve&query=cql.resultSetId=result_set_id_string&x-cs-action=continue
After some time, a response will be returned that contains a new resultSetId and an augmented set of records.
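The two query strings shown in this section can be generated rather than written by hand. The sketch below uses Python's urllib.parse.urlencode and keeps "=" and "," unescaped so that the output matches the examples above; whether the server also accepts the percent-encoded forms is not stated here, and the helper names are hypothetical.

```python
from typing import Sequence
from urllib.parse import urlencode


def initial_search(query: str, databases: Sequence[str]) -> str:
    """Query string that starts a synchronous search."""
    params = {
        "version": "1.1",
        "query": query,
        "x-cs-databases": ",".join(databases),
        "operation": "searchRetrieve",
    }
    # safe="=," leaves '=' and ',' as they appear in this document's examples.
    return "?" + urlencode(params, safe="=,")


def continue_search(result_set_id: str, operation: str = "searchRetrieve") -> str:
    """Query string that continues a previous search."""
    params = {
        "version": "1.1",
        "operation": operation,
        "query": f"cql.resultSetId={result_set_id}",
        "x-cs-action": "continue",
    }
    return "?" + urlencode(params, safe="=,")


print(initial_search("title any tree", ["AFU", "NN3", "PAG"]))
print(continue_search("result_set_id_string"))
```

Because a continue request returns a new resultSetId, a client should keep the identifier from the most recent response rather than reuse the original one.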
Asynchronous Searching
Initiating a search
In asynchronous searching, a client initiates a search with the non-blocking searchStatus operation with a parameter string such as this:
    ?version=1.1&query=title+any+tree&x-cs-databases=AFU,NN3,PAG&operation=searchStatus
The server will respond with a status document, which contains the result set identifier for the result set being built for the query (in this case, asych_result_set_id). After a short pause, the client can send another searchStatus request, this time using the resultSetId returned in the original response as part of the query:
    ?version=1.1&query=cql.resultSetId=asych_result_set_id&operation=searchStatus
If the status returned indicates that the search has completed (in this case, indicated by the summary cs:searchState being "idle"), a subsequent searchRetrieve query using the original resultSetId:
    ?version=1.1&query=cql.resultSetId=asych_result_set_id&operation=searchRetrieve
will return the entire result set, along with an updated status for the search. Notice that a new resultSetId is returned with this document. This is the identifier associated with this particular subset of the entire result set. A subsequent searchRetrieve query using this new resultSetId:
    ?version=1.1&query=cql.resultSetId=NEW_asych_result_set_id&operation=searchRetrieve
will retrieve the same set of results (provided it is still available on the server). A subsequent status query using the original resultSetId will still return status information about the original search:
    ?version=1.1&query=cql.resultSetId=asych_result_set_id&operation=searchStatus
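The polling pattern described above can be sketched as a small loop. The function below is illustrative: fetch_state is a hypothetical callable, supplied by the caller, that sends a searchStatus request for the given result set identifier and returns the text of the summary cs:searchState. The interval and the limit on attempts are arbitrary choices, not values prescribed by the API.

```python
import time
from typing import Callable


def wait_until_idle(
    fetch_state: Callable[[str], str],
    result_set_id: str,
    interval: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the search status until the summary state is "idle".

    Returns True once the search is idle, or False if max_polls
    status requests were made without reaching that state.
    """
    for _ in range(max_polls):
        if fetch_state(result_set_id) == "idle":
            return True
        # Pause between status requests to avoid hammering the server.
        sleep(interval)
    return False


# Stand-in for real status requests, for demonstration only.
fake_states = iter(["searching", "searching", "idle"])
done = wait_until_idle(lambda rs_id: next(fake_states), "asych_result_set_id",
                       sleep=lambda seconds: None)
print(done)
```

Once the loop returns True, a searchRetrieve request with the same identifier retrieves the records.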
Continuing a search
If the initial set of results returned in response to a query is not sufficient to satisfy a user's needs, a client may ask that more records be retrieved from the original resources by using x-cs-action=continue in a request together with a resultSetId of a previously completed search. Continuing the example from above, the query would look like this:
   ?version=1.1&operation=searchStatus&query=cql.resultSetId=asych_result_set_id&x-cs-action=continue
The response indicates that more searching is going on to augment the existing set of records. Notice that a new result set identifier is returned with this response since more results are being added to the original set. This new identifier is the one that should be used when polling for the status of the continued search:
    ?version=1.1&query=cql.resultSetId=CONTINUE_asych_result_set_id&operation=searchStatus
If x-cs-action=continue is used in conjunction with a result set identifier already associated with an ongoing search, the returned status will be for the given result set identifier. If used in conjunction with an identifier for a partial result set (e.g., the NEW_asych_result_set_id from above), the continue operation will result in a status for the largest full result set associated with the same query and databases (e.g., asych_result_set_id).
Using HTTP POST
When issuing a searchStatus or searchRetrieve operation in reference to a previous query via HTTP POST, the value of the resultSetId must be supplied as a sessionId parameter in the URI. This is in addition to supplying the same value as cql.resultSetId in the query parameter in the body.

Example:
  curl -d "version=1.1&query=cql.resultSetId=result_set_id_string&operation=searchRetrieve" "http://<client identifier>.cs.xml.serialssolutions.com/sru?sessionId=result_set_id_string"
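The same POST request can be prepared with Python's standard library. This sketch only constructs the request object and does not send it; the client identifier "myclient" and the function name post_request are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def post_request(client_id: str, result_set_id: str,
                 operation: str = "searchRetrieve") -> Request:
    """Prepare a POST that refers to a previous result set."""
    # The identifier goes in the URI as sessionId ...
    url = (f"http://{client_id}.cs.xml.serialssolutions.com/sru"
           f"?sessionId={quote(result_set_id, safe='')}")
    # ... and again in the body as cql.resultSetId in the query parameter.
    body = urlencode(
        {
            "version": "1.1",
            "query": f"cql.resultSetId={result_set_id}",
            "operation": operation,
        },
        safe="=",
    ).encode("ascii")
    return Request(url, data=body, method="POST")


req = post_request("myclient", "result_set_id_string")
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Passing the prepared request to urllib.request.urlopen would send it from a machine whose IP address has been registered.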
Diagnostic Messages
Diagnostic messages will be returned when, for example, the query provided is not of the proper syntax, the application's server is not responding, or the version requested is not supported. The following subset of the SRU standard diagnostics can be returned:
  • General system error
  • System temporarily unavailable
  • Authentication error
  • Unsupported operation
  • Unsupported version
  • Unsupported parameter value
  • Mandatory parameter not supplied
  • Unsupported parameter
  • Query syntax error
  • Result set does not exist
  • First record position out of range
  • Unsupported path for sort
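A client will usually want to check a response for diagnostics before processing records. The sketch below assumes that diagnostics are returned in the standard SRU 1.1 diagnostic schema (namespace http://www.loc.gov/zing/srw/diagnostic/, with uri, details, and message elements). The sample response is hand-written for illustration and is not actual output from the service.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Namespace of the standard SRU diagnostic schema (assumed to be used here).
DIAG_NS = "http://www.loc.gov/zing/srw/diagnostic/"

# Hand-written sample, not real service output.
SAMPLE = """<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.1</version>
  <diagnostics>
    <diagnostic xmlns="http://www.loc.gov/zing/srw/diagnostic/">
      <uri>info:srw/diagnostic/1/10</uri>
      <message>Query syntax error</message>
    </diagnostic>
  </diagnostics>
</searchRetrieveResponse>"""


def diagnostics(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Return every diagnostic in a response as a dict of its fields."""
    root = ET.fromstring(xml_text)
    found = []
    for diag in root.iter(f"{{{DIAG_NS}}}diagnostic"):
        found.append({
            # Missing optional elements come back as None.
            tag: diag.findtext(f"{{{DIAG_NS}}}{tag}")
            for tag in ("uri", "details", "message")
        })
    return found


for item in diagnostics(SAMPLE):
    print(item["uri"], item["message"])
```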
©2006 Copyright Serials Solutions. All rights reserved.