# Process API
ezPAARSE is a RESTful (Representational State Transfer) application.
# Sending logs
The main route for ezPAARSE is the root of the web service. The GET method returns the log submission form, and the POST method submits logs. Submitted files are parsed, and the resulting access events are streamed back in the response.
PATH | GET | POST |
---|---|---|
/ | Submission form | Parses a log file |
# The detailed POST request
POST / HTTP/1.1
# Parameters (headers)
# Body
Log lines generated by a proxy server.
See the EZProxy documentation and the Squid documentation.
# Response to a POST request
# Status code
- 200 OK: the logs have been successfully processed.
- 400 Bad Request: a request element makes the processing of logs impossible.
- 406 Not Acceptable: encoding or output format not supported.
# Headers
- Job-ID: unique identifier associated with the current processing job.
- Job-Report: URL for the detailed processing report, including all of the headers sent by ezPAARSE.
- ezPAARSE-Status: return code if an error is raised.
- ezPAARSE-Status-Message: explanation message on the return code.
Headers containing the URLs for accessing the logs:
- Job-Traces: traces for the current ezPAARSE job (the verbosity level can be modified with the Traces-Level header)
- Lines-Unknown-Formats: lines for which the format has not been recognized.
- Lines-Ignored-Domains: lines for which the domain is ignored.
- Lines-Unknown-Domains: lines for which the domain is not associated with a parser.
- Lines-Unqualified-ECs: lines that generated access events with too little information. [(More details)](../features/qualification.html)
- Lines-PKB-Miss-ECs: lines that generated identifiers that can't be found in the PKB for the corresponding platform.
- Lines-Duplicate-ECs: lines filtered out by the double-click detection algorithm.
- Lines-Unordered-ECs: lines rejected because they were not chronologically ordered.
- Lines-Robots-ECs: lines generated by non-human agents (robots, crawlers, spiders, etc.).
- Lines-Ignored-Hosts: lines that were filtered based on their IP address.
- Lines-Unknown-Errors: lines that were rejected due to unknown errors.
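The response headers listed above can be sorted into "error details" and "log URLs" on the client side. The following is a minimal sketch, assuming the headers have already been read into a plain dictionary; the helper names `collect_log_urls` and `job_error` are hypothetical and not part of ezPAARSE.

```python
# Response headers whose values are URLs to log files for the job
# (names taken from the list above).
LOG_HEADERS = (
    "Job-Traces",
    "Lines-Unknown-Formats",
    "Lines-Ignored-Domains",
    "Lines-Unknown-Domains",
    "Lines-Unqualified-ECs",
    "Lines-PKB-Miss-ECs",
    "Lines-Duplicate-ECs",
    "Lines-Unordered-ECs",
    "Lines-Robots-ECs",
    "Lines-Ignored-Hosts",
    "Lines-Unknown-Errors",
)


def _lowercase_keys(headers):
    # HTTP header names are case-insensitive, so compare them in lowercase.
    return {name.lower(): value for name, value in headers.items()}


def collect_log_urls(headers):
    """Return {header name: URL} for every log header present in `headers`."""
    lowered = _lowercase_keys(headers)
    return {
        name: lowered[name.lower()]
        for name in LOG_HEADERS
        if name.lower() in lowered
    }


def job_error(headers):
    """Return (code, message) if ezPAARSE-Status is present, else None."""
    lowered = _lowercase_keys(headers)
    code = lowered.get("ezpaarse-status")
    if code is None:
        return None
    return code, lowered.get("ezpaarse-status-message", "")
```

A client would typically call `job_error` first, then fetch only the log URLs it cares about from `collect_log_urls`.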
# Body
CSV or JSON containing all of the generated access events.
Access event example:
```json
{
  "host": "1234567d6b8dd5dddc87939c4a407987",
  "login": "IDEXEMPLE",
  "date": "2011-12-31T10:42:42+01:00",
  "url": "http://www.une-adresse.com/exemple.php?id=16",
  "status": "200",
  "size": "0",
  "domain": "www.une-adresse.com",
  "type": "PDF",
  "issn": "1111-1111"
}
```
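Once the JSON output has been received, the access events can be processed like any other JSON data. The sketch below counts events per resource type. It assumes the JSON body is an array of objects shaped like the example above, which this page does not state explicitly; `count_by_type` is a hypothetical helper name.

```python
import json
from collections import Counter


def count_by_type(json_body):
    """Count access events per "type" field.

    Assumption: `json_body` is a JSON array of access event objects.
    Events without a "type" field are counted under "unknown".
    """
    events = json.loads(json_body)
    return Counter(event.get("type", "unknown") for event in events)


# Usage with the access event shown above, wrapped in an array.
sample = json.dumps([
    {
        "host": "1234567d6b8dd5dddc87939c4a407987",
        "login": "IDEXEMPLE",
        "date": "2011-12-31T10:42:42+01:00",
        "url": "http://www.une-adresse.com/exemple.php?id=16",
        "status": "200",
        "size": "0",
        "domain": "www.une-adresse.com",
        "type": "PDF",
        "issn": "1111-1111",
    }
])
print(count_by_type(sample)["PDF"])  # prints 1
```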
# Request examples
```shell
curl -X POST http://127.0.0.1:59599 --no-buffer --data-binary @file.log -v
curl -X POST --proxy "" --no-buffer --data-binary @test/dataset/sd.2012-11-30.log http://127.0.0.1:59599 -v
curl -X POST --proxy "" --no-buffer -H "Accept: application/json" --data-binary @test/dataset/sd.2012-11-30.log http://127.0.0.1:59599 -v
```
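The same request can be sent from a script. The following is a rough Python equivalent of the last curl command, using only the standard library. It assumes a local instance listening on the address used in the curl examples; `build_log_request` and `submit_logs` are hypothetical helper names.

```python
import urllib.request

# Assumption: a local ezPAARSE instance, as in the curl examples.
EZPAARSE_URL = "http://127.0.0.1:59599"


def build_log_request(log_bytes, accept="application/json",
                      base_url=EZPAARSE_URL):
    """Build (without sending) a POST request carrying raw log lines."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/",
        data=log_bytes,
        method="POST",
        headers={"Accept": accept},
    )


def submit_logs(log_path, accept="application/json", base_url=EZPAARSE_URL):
    """Send a log file and return (status, headers, body).

    urllib raises urllib.error.HTTPError for 4xx/5xx responses such as
    400 or 406; the response headers stay available on that exception.
    """
    with open(log_path, "rb") as log_file:
        request = build_log_request(log_file.read(), accept, base_url)
    # An empty ProxyHandler bypasses any configured proxy,
    # similar to curl's --proxy "".
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return response.status, dict(response.headers), response.read()
```

Unlike `curl --no-buffer`, this sketch reads the whole log file and the whole response into memory, which is fine for small files but not for large ones.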
# Access the traces and rejects
While processing a request (a job), ezPAARSE generates files describing its activity. They can be accessed using the unique identifier assigned to the job.
PATH | Information given |
---|---|
/{jobID}/job-traces.log | Traces of the internal process. Mostly useful when something has gone wrong. |
/{jobID}/job-report.(json\|html) | Report aggregating data on the job: number of rejected lines, reject rate, date and duration of the job, etc. Use /{jobID}/job-report.html?standalone=1 to generate a standalone HTML report. |
/{jobID}/lines-unknown-formats.log | Lines whose format was not recognized because it does not match the format expected from the input parameters. |
/{jobID}/lines-ignored-domains.log | Lines for which the domain is ignored. |
/{jobID}/lines-unknown-domains.log | Lines for which the domain is not associated with a parser. |
/{jobID}/lines-unqualified-ecs.log | Lines that generated access events with too little information. [(More details)](../features/qualification.html) |
/{jobID}/lines-pkb-miss-ecs.log | Lines that generated identifiers that can't be found in the PKB for the corresponding platform. |
/{jobID}/lines-duplicate-ecs.log | Lines filtered out by the COUNTER double-click detection algorithm. |
/{jobID}/lines-unordered-ecs.log | Lines rejected because they were not chronologically ordered. |
/{jobID}/lines-robots-ecs.log | Lines generated by non-human agents (robots, crawlers, spiders, etc.). |
/{jobID}/lines-ignored-hosts.log | Lines that were filtered based on their IP address. |
/{jobID}/lines-unknown-errors.log | Lines that were rejected due to unknown errors. |
- jobID: unique identifier attributed to the job.
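The paths in the table above all follow the same `/{jobID}/<file>` pattern, so they are easy to build from a job identifier. This is a minimal sketch; the base URL is the local address used in the curl examples, and the helper names are hypothetical.

```python
from urllib.parse import quote

# Assumption: a local ezPAARSE instance, as in the curl examples.
BASE_URL = "http://127.0.0.1:59599"


def job_file_url(job_id, filename, base_url=BASE_URL):
    """Build the URL of a file generated for a job, e.g. job-traces.log."""
    return f"{base_url.rstrip('/')}/{quote(job_id, safe='')}/{filename}"


def job_report_url(job_id, fmt="json", standalone=False, base_url=BASE_URL):
    """Build the URL of the job report in JSON or HTML format."""
    if fmt not in ("json", "html"):
        raise ValueError("fmt must be 'json' or 'html'")
    url = job_file_url(job_id, f"job-report.{fmt}", base_url)
    # ?standalone=1 is the query string documented in the table above.
    return url + "?standalone=1" if standalone else url
```

For example, `job_report_url("abc123", "html", standalone=True)` returns `http://127.0.0.1:59599/abc123/job-report.html?standalone=1`, where `abc123` stands in for a real job identifier.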
# General information
These routes provide general information, such as the list of platforms or the types of access events. They only respond to the GET method.
URL | Information given |
---|---|
/info/platforms | Lists the available platforms |
/info/rid | Lists the resource identifiers |
/info/rtype | Lists the resource types |
/info/mime | Lists the resource formats (or MIME types) |
/info/codes | Lists the application's return codes and their meaning |
/info/codes/{code} | Returns the meaning of one return code |
/info/form-predefined | Lists the predefined parameters for the advanced options in the form |
/info/usage | General usage statistics |
/info/usage | General usage statistics |
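These routes can be queried with any HTTP client. Below is a small sketch using Python's standard library. The response format is not specified on this page, so the body is returned as text, assumed to be UTF-8 encoded; `info_url` and `fetch_info` are hypothetical helper names.

```python
import urllib.request

# Assumption: a local ezPAARSE instance, as in the curl examples.
BASE_URL = "http://127.0.0.1:59599"

# Route names taken from the table above.
INFO_ROUTES = (
    "platforms", "rid", "rtype", "mime",
    "codes", "form-predefined", "usage",
)


def info_url(route, code=None, base_url=BASE_URL):
    """Build the URL of an /info route, or of /info/codes/{code}."""
    if route not in INFO_ROUTES:
        raise ValueError(f"unknown info route: {route}")
    url = f"{base_url.rstrip('/')}/info/{route}"
    if code is not None:
        if route != "codes":
            raise ValueError("a code can only be given for the 'codes' route")
        url += f"/{code}"
    return url


def fetch_info(route, code=None, base_url=BASE_URL):
    """GET an /info route and return the response body as text."""
    with urllib.request.urlopen(info_url(route, code, base_url)) as response:
        # Assumption: the body is UTF-8 encoded text.
        return response.read().decode("utf-8")
```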
# Administration
These routes are used to administer ezPAARSE. Most of them can also be used through the application's admin page. They require authentication, except for /api/admin/register.
PATH | Method | Usage |
---|---|---|
/api/admin/register | POST | Creates the first account as administrator. Fails if one or more users already exist. Parameters: username, password |
/api/admin/platforms/status | GET | Reports on the platforms' state. Returns: uptodate or outdated |
/api/admin/platforms/status | PUT | Updates the platforms. The body must contain uptodate |
/api/admin/users | GET | Returns the list of local users |
/api/admin/users/ | POST | Creates a local user. Parameters: username, password |
/api/admin/users/{username} | DELETE | Deletes a local user |