# Middlewares

NB: The middlewares have their own dedicated repository.

# What is a middleware?

Middlewares are the functions that make up the processing chain. They are applied in succession to the consultation events (ECs) processed by ezPAARSE and turn them into the final form in which they are written to the result file.

# How to load a middleware?

To become part of the processing chain, a middleware must have its name (i.e. the name of its file, without the .js suffix) added to the EZPAARSE_MIDDLEWARES array in the config file. The order of declaration in the array determines the order in which the middlewares are called.
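
As a rough illustration of how declaration order maps to call order, the sketch below inserts a custom middleware at a chosen position in the array. The `insertAfter` helper, the shortened list, and the `article-counter` name are assumptions made for the example; only the `EZPAARSE_MIDDLEWARES` key comes from the text above.

```javascript
// Hypothetical helper (not part of ezPAARSE): returns a copy of `list`
// with `name` placed right after `anchor`, or appended if `anchor` is absent.
function insertAfter(list, anchor, name) {
  const result = list.slice();
  const index = result.indexOf(anchor);
  if (index === -1) {
    result.push(name);
  } else {
    result.splice(index + 1, 0, name);
  }
  return result;
}

// Shortened list, for illustration only
const defaults = ['filter', 'parser', 'deduplicator', 'enhancer', 'anonymizer'];

// 'article-counter' is a made-up middleware name
const config = {
  EZPAARSE_MIDDLEWARES: insertAfter(defaults, 'enhancer', 'article-counter')
};

// The printed array is what would be declared in the config file
console.log(JSON.stringify(config, null, 2));
```

With this declaration, the custom middleware would run after `enhancer` and before `anonymizer`.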

# Middlewares loaded by default

1. filter
2. parser
3. deduplicator
4. istex
5. crossref
6. sudoc
7. hal
8. enhancer
9. geolocalizer
10. cut
11. on-campus-counter
12. qualifier
13. anonymizer

# How to create a middleware?

# Specifications

Each middleware must have its own directory, with index.js as its entry point, and must export a function that serves as the initiator. The initiator must return either the actual processing function or a promise that resolves to it. If initialization fails, returning an Error object (or rejecting the promise) aborts the job. The error object should carry a status property that specifies the status code to send back (defaults to 500) and, optionally, a code property for the ezPAARSE-specific status (inserted in the ezPAARSE-Status header). The error message is inserted in the ezPAARSE-Status-Message header.
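
To make the failure path concrete, here is a minimal sketch of an initiator that aborts the job when a setting is missing. The `Required-Setting` header, the `initError` helper, and the numeric `code` value are placeholders invented for the example; actual ezPAARSE status codes should be taken from its own documentation.

```javascript
// Hypothetical helper: builds an Error carrying the properties described above
function initError(message, status, code) {
  const err = new Error(message);
  err.status = status; // status code sent back (500 if omitted)
  if (code !== undefined) {
    err.code = code;   // ezPAARSE-specific status
  }
  return err;
}

// Initiator sketch; in a real middleware it would be assigned to module.exports
function requireSetting() {
  // 'Required-Setting' is a made-up header name
  const value = this.request.header('Required-Setting');

  if (!value) {
    // 4999 is a placeholder, not a real ezPAARSE status code
    return initError('Required-Setting header is missing', 400, 4999);
  }

  return function process(ec, next) {
    next(); // nothing to do in this sketch: pass every EC along
  };
}

// Quick check with a stub context exposing only request.header()
const stub = { request: { header: () => undefined } };
const result = requireSetting.call(stub);
console.log(result instanceof Error, result.status, result.code);
```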

The processing function takes the EC (consultation event) as its first argument and, as its second, a function to call when the EC should move on to the next middleware. Calling this function with an error causes the EC to be rejected. When there are no lines left to read, the processing function is called with null as the EC.
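
The contract above can be pictured with a toy runner. This is only a sketch for illustration, not how ezPAARSE itself is implemented: `runChain` and both sample middlewares are hypothetical, and the runner assumes initiators return their processing function directly and that `next` is called synchronously.

```javascript
// Toy runner for illustration only; NOT ezPAARSE's implementation.
function runChain(initiators, context, ecs) {
  const processors = initiators.map((init) => init.call(context));
  const accepted = [];
  const rejected = [];

  // Hand `ec` to the middleware at `index`, then move on when it calls next()
  function dispatch(ec, index) {
    if (index === processors.length) {
      if (ec) { accepted.push(ec); }
      return;
    }
    processors[index].call(context, ec, function next(err) {
      if (err) {
        rejected.push({ ec: ec, reason: err.message });
        return;
      }
      dispatch(ec, index + 1);
    });
  }

  ecs.forEach((ec) => dispatch(ec, 0));
  dispatch(null, 0); // no line left: each middleware receives null
  return { accepted: accepted, rejected: rejected };
}

// Made-up middleware: rejects ECs that have no rtype
function requireRtype() {
  return function (ec, next) {
    if (!ec) { return next(); }
    next(ec.rtype ? undefined : new Error('rtype is missing'));
  };
}

// Made-up middleware: marks every EC it sees
function tag() {
  return function (ec, next) {
    if (ec) { ec.tagged = true; }
    next();
  };
}

const outcome = runChain([requireRtype, tag], {}, [
  { rtype: 'ARTICLE' },
  { title: 'no rtype here' }
]);
console.log(outcome.accepted.length, outcome.rejected.length);
```

The first EC passes through both middlewares; the second is rejected by `requireRtype` and never reaches `tag`.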

The initiator and the processing function have the following properties in their context (this):

  • request: the request stream.
  • response: the response stream.
  • job: the job object.
  • logger: a winston instance (shorthand for job.logger).
  • report: a report object (shorthand for job.report).
  • saturate: a function to call when the middleware is saturated.
  • drain: a function to call when the middleware is not saturated anymore.
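
As an example of when `saturate` and `drain` might be used, the sketch below signals saturation once too many ECs are waiting on slow asynchronous work, and signals the opposite once the backlog shrinks. The `MAX_PENDING` threshold and the `setTimeout` stand-in for the slow work are assumptions made for the example, as is the choice to hold back the final `null` until pending ECs are done.

```javascript
// Sketch of a middleware doing slow asynchronous work.
// In a real middleware this function would be assigned to module.exports.
function throttled() {
  const self = this;
  const MAX_PENDING = 2; // arbitrary threshold chosen for the example
  let pending = 0;       // ECs currently being worked on
  let saturated = false;
  let finish = null;     // next() of the final null EC, kept until work is done

  return function process(ec, next) {
    if (!ec) {
      if (pending === 0) { return next(); }
      finish = next;     // wait for pending ECs before ending
      return;
    }

    pending += 1;
    if (pending >= MAX_PENDING && !saturated) {
      saturated = true;
      self.saturate();   // the middleware is saturated
    }

    // Stand-in for a slow operation such as a remote lookup
    setTimeout(function () {
      pending -= 1;
      next();

      if (saturated && pending < MAX_PENDING) {
        saturated = false;
        self.drain();    // the middleware is not saturated anymore
      }
      if (finish && pending === 0) { finish(); }
    }, 10);
  };
}

// Stub context that records the signals
const signals = [];
const context = {
  saturate: () => signals.push('saturate'),
  drain: () => signals.push('drain')
};
const processEc = throttled.call(context);
processEc({ id: 1 }, () => {});
processEc({ id: 2 }, () => {});
console.log(signals);
```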

# Example

# Article counter

Here is an example of a very simple middleware that counts the articles and puts the total in the report as General -> nb-articles:

module.exports = function articleCounter() {
  this.logger?.verbose('Initializing article counter');

  this.report.set('general', 'nb-articles', 0);

  // Processing function
  return function count(ec, next) {
    if (!ec) { return next(); }

    if (ec.rtype === 'ARTICLE') {
      this.report.inc('general', 'nb-articles');
    }

    // Pass the EC on to the next middleware
    next();
  };
};
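
One way to try such a middleware outside ezPAARSE is to call it with a stub context. The sketch below writes the counter out in full (minus logging) and feeds it a few ECs; the `makeReport` stub only mimics the `set` and `inc` calls used by the counter and is an assumption, not the real report object.

```javascript
// The article counter, written out in full (logging omitted)
function articleCounter() {
  this.report.set('general', 'nb-articles', 0);

  return function count(ec, next) {
    if (!ec) { return next(); }
    if (ec.rtype === 'ARTICLE') {
      this.report.inc('general', 'nb-articles');
    }
    next();
  };
}

// Hypothetical stand-in for job.report, covering only set() and inc()
function makeReport() {
  const data = {};
  return {
    data: data,
    set: function (group, key, value) {
      data[group] = data[group] || {};
      data[group][key] = value;
    },
    inc: function (group, key) {
      data[group] = data[group] || {};
      data[group][key] = (data[group][key] || 0) + 1;
    }
  };
}

const context = { report: makeReport() };
const count = articleCounter.call(context);

// Three ECs followed by the final null
[{ rtype: 'ARTICLE' }, { rtype: 'BOOK' }, { rtype: 'ARTICLE' }, null]
  .forEach((ec) => count.call(context, ec, () => {}));

console.log(context.report.data.general['nb-articles']);
```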


# Mandatory field

This middleware is a bit more advanced. It is activated by giving a field name in the Mandatory-Field header, and it filters out any EC that doesn't have a value for this field. If the header value contains a space, an error is returned at initialization, which aborts the job.

module.exports = function mandatoryField() {
  this.logger?.verbose('Initializing mandatory field');

  // The request stream is available on the middleware context
  const req = this.request;
  const mandatoryField = req.header('Mandatory-Field');

  if (mandatoryField && mandatoryField.includes(' ')) {
    const err = new Error('space not allowed in mandatory field');
    err.status = 400;
    return err; // returning an Error aborts the job
  }

  /**
   * Actual processing function
   * @param  {Object}   ec   the EC to process, null if no EC left
   * @param  {Function} next the function to call when we are done with the given EC
   */
  return function process(ec, next) {
    if (!mandatoryField || !ec) { return next(); }

    if (ec[mandatoryField]) {
      next();
    } else {
      next(new Error(mandatoryField + ' is missing'));
    }
  };
};

# Use of promises

module.exports = function mandatoryField() {

  return new Promise(function (resolve, reject) {
    // 'foo' is a placeholder for a real failure condition
    if ('foo') {
      return reject(new Error('initialization failed for some reason'));
    }

    resolve(function process(ec, next) {
      // Processing function
    });
  });
};
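
Because an initiator may return a function, an Error, or a promise, whatever loads it has to treat the three cases uniformly. The sketch below shows one way this could be done by wrapping the call in a promise chain; the `initialize` helper and the three sample initiators are hypothetical and are not ezPAARSE's loading code.

```javascript
// Hypothetical loader: resolves with the processing function,
// rejects if the initiator returned an Error or a rejected promise.
function initialize(initiator, context) {
  return Promise.resolve()
    .then(() => initiator.call(context))
    .then((result) => {
      if (result instanceof Error) { throw result; }
      if (typeof result !== 'function') {
        throw new Error('initiator did not provide a processing function');
      }
      return result;
    });
}

// Three made-up initiators covering the three cases
const syncOk = () => (ec, next) => next();
const asyncOk = () => new Promise((resolve) => resolve((ec, next) => next()));
const syncFail = () => Object.assign(new Error('bad header'), { status: 400 });

// Summarize each outcome as a string
const results = Promise.all(
  [syncOk, asyncOk, syncFail].map((init) =>
    initialize(init, {}).then(
      (fn) => typeof fn,
      (err) => 'failed: ' + err.message
    )
  )
);

results.then((summary) => console.log(summary));
```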