1. Introduction
This specification is an extension specification to the CLARIN-FCS Core 2.0 specification and describes an access mechanism for restricted resources.
The Federated Content Search currently does not support restricting access to resources to specific user groups or users. Neither the aggregator nor other FCS clients provide a mechanism that limits access to announced resources, and endpoints are typically accessible directly via the FCS protocol (bypassing the FCS aggregator). The goal is to allow restricting access to FCS resources to users who are authenticated using the established AAI infrastructure.
This involves the following tasks:
- Shibbolizing the FCS aggregator frontend so that:
  - unauthenticated users still get access to all unrestricted endpoints
  - authenticated users get access to all unrestricted endpoints and additionally to the restricted endpoints
- Specifying (and implementing) changes for FCS endpoints so that:
  - endpoints can announce restricted resources to FCS clients, e.g. the aggregator
  - endpoints can rely on authenticated FCS user requests
1.1. Terminology
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as in RFC2119.
1.2. Glossary
- AAI: Authentication & Authorization Infrastructure. A service and a procedure that enables members of different institutions to access protected information that is distributed on different web servers. See Shibboleth.
- IdP: Identity Provider, a system entity that issues authentication assertions, see SAML and AAI.
- JWK: JSON Web Key, see RFC7517.
- JWKS: JSON Web Key Set, see RFC7517.
- JWT: JSON Web Token, see RFC7519.
- RSA: Asymmetric public-key cryptosystem by Rivest, Shamir and Adleman for digital signatures and encryption.
- SAML: Security Assertion Markup Language, an open standard for exchanging authentication and authorization data.
- Shibboleth: A single sign-on log-in system for computer networks and the Internet. See also SAML.
- SP: Service Provider, a system entity that receives and accepts authentication assertions, see SAML and AAI.
1.3. Normative References
- RFC2119: Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997, https://www.ietf.org/rfc/rfc2119.html
- RFC7515: JSON Web Signature (JWS), IETF RFC 7515, May 2015, https://www.ietf.org/rfc/rfc7515.html
- RFC7517: JSON Web Key (JWK), IETF RFC 7517, May 2015, https://www.ietf.org/rfc/rfc7517.html
- RFC7519: JSON Web Token (JWT), IETF RFC 7519, May 2015, https://www.ietf.org/rfc/rfc7519.html
- XML-Namespaces: Namespaces in XML 1.0 (Third Edition), W3C, 8 December 2009, http://www.w3.org/TR/2009/REC-xml-names-20091208/
- LOC-DIAG: SRU Version 1.2: SRU Diagnostics List, Library of Congress, http://www.loc.gov/standards/sru/diagnostics/diagnosticsList.html
- CLARIN-FCS-Core 2.0: CLARIN Federated Content Search (CLARIN-FCS) - Core 2.0, SCCTC FCS Task-Force, May 2017, https://trac.clarin.eu/wiki/FCS/Specification, https://github.com/clarin-eric/fcs-misc/tree/main/fcs-core-2.0, https://office.clarin.eu/v/CE-2017-1046-FCS-Specification-v20230426.pdf
1.4. Non-Normative References
- RFC6838: Media Type Specifications and Registration Procedures, IETF RFC 6838, January 2013, https://www.ietf.org/rfc/rfc6838.txt
- RFC3023: XML Media Types, IETF RFC 3023, January 2001, https://www.ietf.org/rfc/rfc3023.txt
- RFC8017: PKCS #1: RSA Cryptography Specifications Version 2.2, IETF RFC 8017, November 2016, https://www.ietf.org/rfc/rfc8017.txt
1.5. Typographic and XML Namespace conventions
The following typographic conventions for XML fragments will be used throughout this specification:
- <prefix:Element>: An XML element with the Generic Identifier Element that is bound to an XML namespace denoted by the prefix prefix.
- @attr: An XML attribute with the name attr.
- string: The literal string must be used either as element content or attribute value.
Endpoints and Clients MUST adhere to the XML-Namespaces specification. The CLARIN-FCS interface specification generally does not dictate whether XML elements should be serialized in their prefixed or non-prefixed syntax, but Endpoints MUST ensure that the correct XML namespace is used for elements and that XML namespaces are declared correctly. Clients MUST be agnostic regarding syntax for serializing the XML elements, i.e. if the prefixed or un-prefixed variant was used, and SHOULD operate solely on expanded names, i.e. pairs of namespace name and local name.
2. Restricted Access to Resources
The FCS supports restricting access to resources to specific user groups or users who are authenticated using the established AAI infrastructure, while still allowing unauthenticated access to all unrestricted resources. The restriction is enforced by the endpoints themselves, so it also applies to announced resources that are accessed directly via the FCS protocol (bypassing the FCS aggregator).
2.1. Technical Description
The aggregator allows an optional login via Shibboleth. Authentication of search queries which are sent to an FCS endpoint is implemented using authentication headers with JSON Web Token (JWT). User information (e.g. mail address or similar) is encoded in the token as claims.
JWTs are signed with an RSA key, and the signature SHOULD be checked by the respective endpoint. In case of missing or insufficient authorization when accessing restricted resources, the endpoint MUST raise an error using the SRU diagnostic "Authentication error" (info:srw/diagnostic/1/3). The diagnostic "Not authorised to send record" (info:srw/diagnostic/1/68) MAY be sent as the error when an authenticated request was made but additional authorization is still required, e.g. if a resource only grants access to certain users.
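The diagnostic selection described above can be sketched as follows. This is a minimal illustration, not part of the specification: the function name and its three boolean inputs are hypothetical, and the sketch opts for the optional 1/68 diagnostic in the authenticated-but-not-authorised case, where an endpoint could equally keep returning 1/3.

```python
from typing import Optional

# SRU diagnostic URIs named in this section (see LOC-DIAG).
DIAG_AUTHENTICATION_ERROR = "info:srw/diagnostic/1/3"
DIAG_NOT_AUTHORISED_TO_SEND_RECORD = "info:srw/diagnostic/1/68"


def select_diagnostic(restricted: bool,
                      token_valid: bool,
                      user_authorized: bool) -> Optional[str]:
    """Return the SRU diagnostic URI to raise, or None if access is granted.

    restricted      -- the resource declares an <AvailabilityRestriction>
    token_valid     -- a JWT was sent and passed signature and claim checks
    user_authorized -- the endpoint's own (hypothetical) authorization policy
                       accepts the authenticated user for this resource
    """
    if not restricted:
        return None  # unrestricted resources stay open to everyone
    if not token_valid:
        # missing or unverifiable authentication: 1/3 is mandatory
        return DIAG_AUTHENTICATION_ERROR
    if not user_authorized:
        # authenticated, but this user may not see the resource: 1/68 is allowed
        return DIAG_NOT_AUTHORISED_TO_SEND_RECORD
    return None
```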
The aggregator owns a single common RSA key pair for all endpoints. Every endpoint with restricted resources needs the corresponding public key, either transferred manually and included in its configuration or retrieved automatically (see Key Exchange). Communication between aggregator and endpoints is only allowed via SSL/TLS (HTTPS). The claims iss, sub and aud MUST be encoded in the JWT. The endpoint MUST check aud to see if it is the correct recipient and MAY check the claims iss and exp.
A successful login does not guarantee that a personal identifier attribute is available to the aggregator, nor which one. In SAML, this depends on the configuration of the external IdP selected by the user. When available, one of three SAML attributes is used as the user's personal identifier: eduPersonPrincipalName, eduPersonTargetedID or mail. The aggregator maps these attributes to userID in this order of preference, encodes userID in the JWTs and sends them to the appropriate endpoints. Only the endpoint announces which authentication information it requires (see next section) and decides at runtime whether the information provided is sufficient for access. The aggregator front end can, however, inform and guide the user in advance if authentication information is still missing.
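As an illustration, the attribute preference order above could be applied on the client side roughly as follows. The sketch assumes the released SAML attributes are already available as a dictionary keyed by their friendly names (real SAML assertions usually identify attributes by OID-based URNs, and eduPersonTargetedID may be delivered as a structured NameID value); the function name is hypothetical.

```python
from typing import Mapping, Optional, Sequence, Union

# Order of preference for the personal identifier, as given in this section.
USER_ID_ATTRIBUTES = ("eduPersonPrincipalName", "eduPersonTargetedID", "mail")

AttrValue = Union[str, Sequence[str]]


def compute_user_id(saml_attributes: Mapping[str, AttrValue]) -> Optional[str]:
    """Pick the userID from the attributes released by the user's IdP.

    Returns None if none of the preferred attributes was released; the client
    can then only satisfy authOnly restrictions.
    """
    for name in USER_ID_ATTRIBUTES:
        value = saml_attributes.get(name)
        if isinstance(value, str):
            value = [value]  # treat a single value like a one-element list
        for candidate in value or ():
            if candidate and candidate.strip():
                return candidate.strip()  # first non-empty value wins
    return None
```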
2.1.1. JWT Signature Algorithm
The JWT is signed with the "RS256" algorithm (see JSON Web Token (JWT), Section 8 "Implementation Requirements"). The digital signature is created using RSASSA-PKCS1-v1_5 with the SHA-256 hash function. An RSA key size of 2048 bits or larger MUST be used. Endpoints and clients MUST support 2048-bit RSA keys and SHOULD be able to handle larger key sizes.
Signing the JWT requires the FCS client to hold the private key of its RSA key pair. FCS endpoints only require the public RSA key to validate the JWT signature. The JWT is not encrypted.
An example of using the RS256 algorithm can be found in Appendix A.2 of JSON Web Signature (JWS), RFC7515.
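For illustration only, the following dependency-free sketch shows what RS256 amounts to: the signing input BASE64URL(header).BASE64URL(payload) is hashed with SHA-256, encoded with EMSA-PKCS1-v1_5 (RFC8017) and exponentiated with the RSA key. All function names are hypothetical, keys are assumed to be given as plain integers (n, e, d), and the textbook modular exponentiation is not hardened against side channels; real clients and endpoints should rely on an established JWT/JOSE library.

```python
import base64
import hashlib
import hmac
import json

# DER prefix of the DigestInfo for SHA-256 (RFC 8017, Section 9.2, note 1).
_SHA256_DIGEST_INFO = bytes.fromhex("3031300d060960864801650304020105000420")


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _emsa_pkcs1_v1_5(message: bytes, em_len: int) -> bytes:
    """EM = 0x00 || 0x01 || 0xFF...0xFF || 0x00 || DigestInfo(SHA-256(message))."""
    t = _SHA256_DIGEST_INFO + hashlib.sha256(message).digest()
    if em_len < len(t) + 11:
        raise ValueError("RSA modulus too short")
    return b"\x00\x01" + b"\xff" * (em_len - len(t) - 3) + b"\x00" + t


def sign_rs256(header: dict, claims: dict, n: int, d: int) -> str:
    """Create a compact JWT signed with RS256 (client side, private key d)."""
    k = (n.bit_length() + 7) // 8
    signing_input = (b64url_encode(json.dumps(header).encode()) + "." +
                     b64url_encode(json.dumps(claims).encode()))
    em = _emsa_pkcs1_v1_5(signing_input.encode("ascii"), k)
    s = pow(int.from_bytes(em, "big"), d, n)
    return signing_input + "." + b64url_encode(s.to_bytes(k, "big"))


def verify_rs256(token: str, n: int, e: int, min_bits: int = 2048) -> dict:
    """Check an RS256 signature with the public key (n, e); return the claims."""
    if n.bit_length() < min_bits:
        raise ValueError("RSA key smaller than required")
    header_b64, payload_b64, sig_b64 = token.split(".")
    if json.loads(b64url_decode(header_b64)).get("alg") != "RS256":
        raise ValueError("unexpected alg")
    k = (n.bit_length() + 7) // 8
    sig = b64url_decode(sig_b64)
    s = int.from_bytes(sig, "big")
    if len(sig) != k or s >= n:
        raise ValueError("invalid signature")
    em = pow(s, e, n).to_bytes(k, "big")
    expected = _emsa_pkcs1_v1_5((header_b64 + "." + payload_b64).encode("ascii"), k)
    if not hmac.compare_digest(em, expected):
        raise ValueError("invalid signature")
    return json.loads(b64url_decode(payload_b64))
```

The default min_bits of 2048 mirrors the minimum key size required above, so an endpoint using this sketch would reject tokens signed with smaller keys.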
2.1.2. Key Exchange
Verifying a JWT requires the endpoint to know the public RSA key matching the private key that the requesting client used to sign the JWT. Endpoints with restricted resources therefore need to be configured with this public key. This can be done in two ways: by manually transferring the public key and including it in the configuration, or by using a public JSON Web Key Set endpoint (https://<fcs-client.url>/.well-known/jwks.json) that allows automatic retrieval of the public key.
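A minimal sketch of how an endpoint might turn a JWK Set document into usable RSA public keys, assuming the document has already been fetched (for example with Python's urllib.request.urlopen) from the client's .well-known/jwks.json URL. The function name, and the choice to index keys by their optional kid parameter so that the kid from a JWT header can select the right key, are illustrative assumptions, not requirements of this specification.

```python
import base64
import json
from typing import Dict, Optional, Tuple


def _b64url_to_int(value: str) -> int:
    """Decode a base64url-encoded big-endian unsigned integer (JWK "n", "e")."""
    raw = base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))
    return int.from_bytes(raw, "big")


def rsa_keys_from_jwks(jwks_json: str) -> Dict[Optional[str], Tuple[int, int]]:
    """Map each RSA signing key's "kid" (None if absent) to its (n, e) pair.

    jwks_json is the body of a JWK Set document (RFC 7517, Section 5), e.g.
    as served at https://<fcs-client.url>/.well-known/jwks.json.
    """
    keys: Dict[Optional[str], Tuple[int, int]] = {}
    for jwk in json.loads(jwks_json).get("keys", []):
        if jwk.get("kty") != "RSA":
            continue  # only RSA keys are usable for RS256
        if jwk.get("use", "sig") != "sig" or jwk.get("alg", "RS256") != "RS256":
            continue  # skip keys explicitly declared for another purpose
        keys[jwk.get("kid")] = (_b64url_to_int(jwk["n"]), _b64url_to_int(jwk["e"]))
    return keys
```

An endpoint would typically cache the retrieved keys and could keep a manually configured public key as a fallback if the JWKS endpoint is unreachable.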
2.1.3. JWT Claims and Verification
The claims iss (issuer), sub (subject) and aud (audience) MUST be encoded in the JWT by the client.
- iss: MUST contain a unique identifier for the client, e.g. the canonical public URL for web applications like the FCS aggregator.
- aud: MUST contain the URL of the endpoint that is used to register the endpoint at the client and that is also used for SRU/FCS requests.
- sub: MUST contain the userID if a resource has the personalIdentifier requirement. If the requirement is authOnly, the sub claim SHOULD be empty and should be ignored by the endpoint.
The claims exp (expiration time), nbf (not before) and iat (issued at) are OPTIONAL but SHOULD be used to limit the token lifetime. The lifetime SHOULD therefore be short, e.g. 15 seconds, which requires the clocks of endpoint and client to be synchronized. Setting the jti (JWT ID) claim further secures communication by helping to prevent replay attacks.
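The following sketch shows how a client such as the aggregator might assemble these claims for a single endpoint. Because aud names one endpoint, the client would mint a separate token per endpoint. The function name, the 15-second lifetime constant (taken from the example above) and raising an error when a personalIdentifier resource is queried without a userID are illustrative assumptions.

```python
import time
import uuid
from typing import Any, Dict, Optional

TOKEN_LIFETIME_SECONDS = 15  # example value from this section


def build_claims(client_id: str, endpoint_url: str, user_id: Optional[str],
                 requirement: str, now: Optional[int] = None) -> Dict[str, Any]:
    """Assemble the JWT claims an FCS client sends to one endpoint."""
    if requirement == "personalIdentifier" and not user_id:
        raise ValueError("personalIdentifier requires a userID")
    issued_at = int(time.time()) if now is None else now
    return {
        "iss": client_id,     # e.g. the canonical public URL of the aggregator
        "aud": endpoint_url,  # the endpoint URL the client registered and queries
        # userID only for personalIdentifier; empty for authOnly
        "sub": user_id if requirement == "personalIdentifier" else "",
        "iat": issued_at,
        "nbf": issued_at,
        "exp": issued_at + TOKEN_LIFETIME_SECONDS,
        "jti": str(uuid.uuid4()),  # unique ID so endpoints can detect replays
    }
```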
The endpoint MUST check aud to see if it is the correct recipient and MAY check the claims iss, exp, nbf, iat and jti.
The sub (subject) claim SHOULD be used to retrieve the userID for resources with the personalIdentifier requirement and to perform advanced authorization.
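On the endpoint side, the checks listed above might look like the following sketch, which assumes the signature has already been verified and the payload decoded into a dictionary. The function name, the clock-skew tolerance and the in-memory set of seen jti values are hypothetical; only the aud check is mandatory, the other checks correspond to the MAY clauses.

```python
import time
from typing import Any, Dict, Optional, Set

CLOCK_SKEW_SECONDS = 5  # hypothetical tolerance for imperfectly synced clocks


def check_claims(claims: Dict[str, Any], own_url: str,
                 trusted_issuers: Optional[Set[str]] = None,
                 seen_jti: Optional[Set[str]] = None,
                 now: Optional[float] = None) -> str:
    """Validate signature-checked claims; return the userID ("" if none)."""
    t = time.time() if now is None else now
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # aud may be an array
    if own_url not in audiences:
        raise ValueError("token not intended for this endpoint")  # MUST
    if trusted_issuers is not None and claims.get("iss") not in trusted_issuers:
        raise ValueError("unknown issuer")
    if "exp" in claims and t > claims["exp"] + CLOCK_SKEW_SECONDS:
        raise ValueError("token expired")
    if "nbf" in claims and t < claims["nbf"] - CLOCK_SKEW_SECONDS:
        raise ValueError("token not yet valid")
    if "iat" in claims and t < claims["iat"] - CLOCK_SKEW_SECONDS:
        raise ValueError("token issued in the future")
    if seen_jti is not None and "jti" in claims:
        if claims["jti"] in seen_jti:
            raise ValueError("replayed token")
        seen_jti.add(claims["jti"])
    return claims.get("sub") or ""
```

In a real deployment the set of seen jti values only needs to hold entries until their exp time has passed, so it can be pruned periodically.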
2.2. Announcing Restricted Resources by the Endpoint
In order to announce restricted resources, the endpoint declares in the <Resources> section of the <EndpointDescription> that it supports the aforementioned procedure for a resource. This is done via a dedicated element <AvailabilityRestriction>:
<Resources>
<Resource>
<!-- ... -->
<Languages>
<Language>swe</Language>
<Language>deu</Language>
</Languages>
<AvailabilityRestriction>requirementName</AvailabilityRestriction>
<!-- ... -->
</Resource>
</Resources>
The <AvailabilityRestriction> element can be defined for each <Resource> element. In case of sub-resources, each resource MUST declare its restrictions independently. Restrictions of a sub-resource MAY differ from those of its parent resource, i.e. there is no implicit inheritance.
For backwards compatibility, a resource without an <AvailabilityRestriction> element is available for searching through the endpoint without restriction. This also applies to sub-resources without an <AvailabilityRestriction> element whose parent <Resource> has a restriction defined.
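To illustrate the absence of inheritance, the following sketch collects the restriction of every resource in an endpoint description, including nested sub-resources, and treats each <Resource> on its own. It assumes the FCS Core 2.0 endpoint description namespace http://clarin.eu/fcs/endpoint-description and that resources are identified by their @pid attribute; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Namespace of the FCS Core 2.0 endpoint description (assumed here).
ED_NS = "http://clarin.eu/fcs/endpoint-description"
ED = "{" + ED_NS + "}"


def collect_restrictions(endpoint_description_xml: str) -> Dict[str, Optional[str]]:
    """Map each resource PID to its own restriction (None = unrestricted).

    Every <Resource>, including nested sub-resources, is evaluated on its own:
    a parent's <AvailabilityRestriction> is deliberately not inherited.
    """
    root = ET.fromstring(endpoint_description_xml)
    result: Dict[str, Optional[str]] = {}
    for resource in root.iter(ED + "Resource"):  # walks all nesting levels
        # find() only looks at direct children, so a parent's element is ignored
        restriction = resource.find(ED + "AvailabilityRestriction")
        text = restriction.text if restriction is not None else None
        result[resource.get("pid")] = (text or "").strip() or None
    return result
```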
The text node requirementName in <AvailabilityRestriction> is either authOnly or personalIdentifier.
- authOnly: the resource does not require any attributes except for authentication via the user's home institution.
- personalIdentifier: extends authOnly by passing a userID to the endpoint, using the procedure described above. The userID will be one of eduPersonPrincipalName, eduPersonTargetedID or mail.
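A client or endpoint might evaluate the two requirement values against a request as sketched below. The function and parameter names are hypothetical, and rejecting unknown requirement values (failing closed) is a design choice of this sketch, not something this specification mandates.

```python
from typing import Optional

KNOWN_REQUIREMENTS = {"authOnly", "personalIdentifier"}


def requirement_satisfied(requirement: Optional[str], authenticated: bool,
                          user_id: Optional[str]) -> bool:
    """Decide whether a request meets a resource's <AvailabilityRestriction>.

    requirement   -- text of the element, or None if the resource has none
    authenticated -- True if a JWT was presented and verified
    user_id       -- the sub claim of that JWT ("" or None if not provided)
    """
    if requirement is None:
        return True           # unrestricted resource
    if requirement not in KNOWN_REQUIREMENTS:
        return False          # unknown value: fail closed (design choice)
    if not authenticated:
        return False          # both values require authentication
    if requirement == "personalIdentifier":
        return bool(user_id)  # additionally needs a userID
    return True               # authOnly
```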
Changelog
2024-12-04 — Update specification based on current prototype implementation
- Add details about key/signature algorithms, RS256
- Add key exchange details, using JWKS
- Add details about JWT claims (usage, requirements)
- Add SRU 68 diagnostic
- Change <AvailabilityRestriction> element in <EndpointDescription>
- Change order of SAML attributes for userID computation
- Change formulation to generalize for any FCS client, not only AAI for FCS Aggregator
2023-06-12 — Conversion to AsciiDoc and Migration of specification documents to Github
- Convert specification documents for FCS Core 2.0, Core 1.0, DataView and AAI to AsciiDoc
- Migrate from CLARIN Trac to CLARIN Github
- Add Github Actions workflow to automate build process