# Job reports
ezPAARSE generates an execution report every time it processes a log file. The sections of this report are documented below.
- General: contains general information related to the processing
- Rejects: lists all rejects, how many there are, and the links to the files containing the rejected lines
- Statistics: provides the first global figures
- Alerts: lists the active alerts
- Notifications: lists the email addresses of the recipients of processing notifications
- Deduplicating: settings of the algorithm used for deduplication
- Files: list of processed log files
- First consultation event: content of the first access event
# General
| Field | Example value | Description |
|---|---|---|
| Job-Date | 2014-06-16T14:55:04+02:00 | Processing date |
| Job-Done | true | Whether the processing completed correctly |
| Job-Duration | 4 m 22 s | Processing duration |
| Job-ID | 6f601540-f555-11e3-b477-758199fa5dc1 | Unique identifier for the processing |
| Rejection-Rate | 96.74 % | Rate of rejected lines (e.g. unknown domains, duplicates, etc.) among the relevant lines |
| URL-Traces | http://localhost:59599/6f601540-f555-11e3-b477-758199fa5dc1/job-traces.log | Access to the execution traces for the processing |
| client-user-agent | Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/33.0.1750.152 Chrome/33.0.1750.152 Safari/537.36 | |
| ezPAARSE-version | ezPAARSE 2.3.0 | |
| geolocalization | all | Requested geolocation fields |
| git-branch | master | |
| git-last-commit | 429e61bf29e80326b09958b0a68a01c0ae3add91 | |
| git-tag | 1.7.0 | |
| input-first-line | `rate-limited-proxy-72-14-199-16.google.com - - [19/Nov/2013:00:11:05 +0100] "GET http://gate1.inist.fr:50162/login?url=http://www.nature.com/rss/feed?doi=10.1038/465529d HTTP/1.1" 302 0` | First log line found in a submitted log file |
| input-format-literal | `%h %l %u %t "%r" %s %b` (ezproxy) | Format used to identify the elements found in a log file |
| input-format-regex | `^([a-zA-Z0-9\.\-]+(?:, ?[a-zA-Z0-9\.\-]+)*) ([a-zA-Z0-9\-]+\|\-) ([a-zA-Z0-9@\.\-_%,=]+) \[([^\]]+)\] "[A-Z]+ ([^ ]+) [^ ]+" ([0-9]+) ([0-9]+)$` | Regular expression corresponding to the given format for log lines |
| nb-denied-ecs | 104 | Number of denied consultation events (access to unsubscribed resources) |
| nb-ecs | 14224 | Total number of consultation events found in the log file |
| nb-lines-input | 792049 | Number of log lines found in the file given as input |
| on-campus-accesses | 6549 | Total number of on-campus consultation events |
| process-speed | 3019 lignes/s | Processing speed |
| enhancement-errors | 0 | Number of consultation events that could not be enriched because of MongoDB errors |
| result-file-ecs | http://localhost:59599/6f601540-f555-11e3-b477-758199fa5dc1 | URL for accessing the result file |
| url-denied-ecs | http://localhost:59599/6f601540-f555-11e3-b477-758199fa5dc1/denied-ecs.csv | URL for accessing the file containing denied consultations (for unsubscribed resources) |
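
To illustrate how `input-format-regex` relates to `input-first-line`, the following minimal sketch applies the regular expression from the sample report to the sample log line. The field labels in `FIELD_NAMES` and the helper `parse_line` are hypothetical names chosen for this example; they are not part of ezPAARSE.

```python
import re
from typing import Optional

# Both values are copied from the sample report above.
INPUT_FORMAT_REGEX = (
    r'^([a-zA-Z0-9\.\-]+(?:, ?[a-zA-Z0-9\.\-]+)*) ([a-zA-Z0-9\-]+|\-) '
    r'([a-zA-Z0-9@\.\-_%,=]+) \[([^\]]+)\] "[A-Z]+ ([^ ]+) [^ ]+" ([0-9]+) ([0-9]+)$'
)
INPUT_FIRST_LINE = (
    'rate-limited-proxy-72-14-199-16.google.com - - '
    '[19/Nov/2013:00:11:05 +0100] '
    '"GET http://gate1.inist.fr:50162/login?url=http://www.nature.com/rss/feed?doi=10.1038/465529d HTTP/1.1" '
    '302 0'
)

# Hypothetical labels, one per capture group of the regex
# (they follow the order of the literal format %h %l %u %t "%r" %s %b).
FIELD_NAMES = ("host", "identd", "login", "datetime", "url", "status", "size")


def parse_line(line: str) -> Optional[dict]:
    """Return the captured fields of a log line, or None if the format is unknown."""
    match = re.match(INPUT_FORMAT_REGEX, line)
    if match is None:
        return None
    return dict(zip(FIELD_NAMES, match.groups()))


fields = parse_line(INPUT_FIRST_LINE)
print(fields)
```

A line for which `parse_line` returns `None` is comparable to a line counted under `nb-lines-unknown-formats`, although the checks ezPAARSE actually performs may be more involved.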
# Rejects
| Field | Example value | Description |
|---|---|---|
| nb-lines-duplicate-ecs | 1893 | Number of deduplicated access events (following the COUNTER algorithm) |
| nb-lines-ignored | 351891 | Number of ignored lines (not relevant) |
| nb-lines-ignored-domains | 4 | Number of lines for which the domain has been ignored (i.e. declared in EZPAARSE_IGNORED_DOMAINS) |
| nb-lines-pkb-miss-ecs | 2107 | Number of lines with unknown vendor identifiers |
| nb-lines-unknown-domains | 335068 | Number of lines with an unknown domain |
| nb-lines-unknown-formats | 1891 | Number of lines with an unknown format |
| nb-lines-unordered-ecs | 0 | Number of lines out of chronological order (chronological order is necessary for deduplication) |
| nb-lines-unqualified-ecs | 86974 | Number of unqualified lines (because they don't contain enough information) |
| nb-lines-unknown-errors | 0 | Number of lines that were rejected due to an unknown error |
| url-duplicate-ecs | http://localhost:59599/6f601540-f555-11e3-b477-758199fa5dc1/lines-duplicate-ecs.log | URL to the file containing the deduplicated lines |
| url-ignored-domains | http://localhost:59599/6f601540-f555-11e3-b477-758199fa5dc1/lines-ignored-domains.log | URL to the file containing the lines with an ignored domain |
| url-pkb-miss-ecs | http://localhost:59599/6f601540-f555-11e3-b477-758199fa5dc1/lines-pkb-miss-ecs.log | URL to the file containing the lines with an unknown vendor identifier |
| url-unknown-domains | http://localhost:59599/6f601540-f555-11e3-b477-758199fa5dc1/lines-unknown-domains.log | URL to the file containing the lines with an unknown domain (i.e. no parser has been triggered by ezPAARSE) |
| url-unknown-formats | http://localhost:59599/6f601540-f555-11e3-b477-758199fa5dc1/lines-unknown-formats.log | URL to the file containing the lines with an unknown format |
| url-unordered-ecs | | URL to the file containing the lines with a chronological anomaly |
| url-unqualified-ecs | http://localhost:59599/6f601540-f555-11e3-b477-758199fa5dc1/lines-unqualified-ecs.log | URL to the file containing the lines with too little information |
| url-unknown-errors | http://localhost:59599/6f601540-f555-11e3-b477-758199fa5dc1/unknown-errors.log | URL to the file containing the lines rejected due to unknown errors |
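
When the rejection rate is high, it can help to see which reject category dominates. The sketch below ranks the `nb-lines-*` counters of the sample report and expresses each one as a share of `nb-lines-input`. It assumes the counters are already available as a Python dictionary; how the report is loaded is left out, and `rank_rejects` is a hypothetical helper.

```python
# Reject counters copied from the sample report above.
rejects = {
    "nb-lines-duplicate-ecs": 1893,
    "nb-lines-ignored": 351891,
    "nb-lines-ignored-domains": 4,
    "nb-lines-pkb-miss-ecs": 2107,
    "nb-lines-unknown-domains": 335068,
    "nb-lines-unknown-formats": 1891,
    "nb-lines-unordered-ecs": 0,
    "nb-lines-unqualified-ecs": 86974,
    "nb-lines-unknown-errors": 0,
}
NB_LINES_INPUT = 792049  # nb-lines-input from the General section


def rank_rejects(counters, total_lines):
    """Return (name, count, share of input lines in %) tuples, largest count first."""
    ranked = sorted(counters.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, count, round(100 * count / total_lines, 2))
        for name, count in ranked
    ]


ranking = rank_rejects(rejects, NB_LINES_INPUT)
for name, count, share in ranking:
    print(f"{name:28} {count:>8} {share:>6.2f} %")
```

In this sample, ignored lines and unknown domains account for most of the input, which points to the corresponding `url-*` files as the first ones to inspect.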
# Statistics
| Field | Example value | Description |
|---|---|---|
| mime-HTML | 4540 | Number of access events for the main mime-types (names prefixed with mime-) |
| mime-MISC | 3612 | |
| mime-PDF | 6072 | |
| platform-acs | 538 | Number of access events for recognized platforms (names prefixed with platform-, followed by the platform short name) |
| platform-ar | 97 | |
| platform-bioone | 15 | |
| platform-bmc | 75 | |
| platform-cup | 22 | |
| platform-edp | 27 | |
| platform-hw | 1740 | |
| platform-jstor | 9 | |
| platform-mal | 97 | |
| platform-metapress | 27 | |
| platform-npg | 3132 | |
| platform-sd | 5255 | |
| platform-springer | 1675 | |
| platform-wiley | 1515 | |
| platforms | 14 | Number of distinct platforms recognized during the processing |
| rtype-ABS | 1142 | Number of access events for the main resource types (names prefixed with rtype-) |
| rtype-ARTICLE | 9991 | |
| rtype-BOOK | 218 | |
| rtype-BOOKSERIE | 23 | |
| rtype-BOOK_SECTION | 314 | |
| rtype-TOC | 2536 | |
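
The counters of this section can be grouped by prefix. The following sketch sums each family (`mime-`, `platform-`, `rtype-`) of the sample report and counts the `platform-` entries. In this sample, every family adds up to `nb-ecs` (14224) and the number of `platform-` entries equals `platforms` (14); whether this holds for every report is an assumption. `totals_by_family` is a hypothetical helper.

```python
from collections import defaultdict

# Counters copied from the sample report above.
stats = {
    "mime-HTML": 4540, "mime-MISC": 3612, "mime-PDF": 6072,
    "platform-acs": 538, "platform-ar": 97, "platform-bioone": 15,
    "platform-bmc": 75, "platform-cup": 22, "platform-edp": 27,
    "platform-hw": 1740, "platform-jstor": 9, "platform-mal": 97,
    "platform-metapress": 27, "platform-npg": 3132, "platform-sd": 5255,
    "platform-springer": 1675, "platform-wiley": 1515,
    "platforms": 14,
    "rtype-ABS": 1142, "rtype-ARTICLE": 9991, "rtype-BOOK": 218,
    "rtype-BOOKSERIE": 23, "rtype-BOOK_SECTION": 314, "rtype-TOC": 2536,
}
NB_ECS = 14224  # nb-ecs from the General section


def totals_by_family(counters):
    """Sum the counters sharing the same prefix (the text before the first '-')."""
    totals = defaultdict(int)
    for name, value in counters.items():
        family, separator, _ = name.partition("-")
        if separator:  # keys without a prefix, such as "platforms", are skipped
            totals[family] += value
    return dict(totals)


totals = totals_by_family(stats)
platform_count = sum(1 for name in stats if name.startswith("platform-"))

for family, total in totals.items():
    print(f"{family}: {total} (equals nb-ecs: {total == NB_ECS})")
print(f"platform entries: {platform_count} (platforms: {stats['platforms']})")
```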
# Alerts
| Field | Example value | Description |
|---|---|---|
| active-alerts | unknown-domains | List of alerts that can be thrown |
| alert-1 | www.ncbi.nlm.nih.gov is unknown but represents 64% of the log lines | Alert content |
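
An alert such as the one above reports the share of log lines taken by an unknown domain. The sketch below shows one way to compute a comparable figure from a list of URLs. The sample URLs, the helper names and the 50 % threshold are all hypothetical: the threshold ezPAARSE uses is not stated here, and the share is computed over the URLs passed in rather than over all log lines.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_shares(urls):
    """Map each host name to its share (in %) of the given URLs, most frequent first."""
    hosts = (urlsplit(url).hostname for url in urls)
    counts = Counter(host for host in hosts if host)
    total = sum(counts.values())
    return {host: 100 * count / total for host, count in counts.most_common()}


def dominant_domains(urls, threshold_percent=50.0):
    """Return the hosts whose share reaches the (hypothetical) threshold."""
    shares = domain_shares(urls)
    return [host for host, share in shares.items() if share >= threshold_percent]


# Hypothetical URLs standing in for the content of lines-unknown-domains.log.
sample_urls = [
    "http://www.ncbi.nlm.nih.gov/pubmed/1",
    "http://www.ncbi.nlm.nih.gov/pubmed/2",
    "http://example.org/page",
]

for host, share in domain_shares(sample_urls).items():
    print(f"{host}: {share:.0f} %")
print(dominant_domains(sample_urls))
```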
# Notifications
| Field | Example value | Description |
|---|---|---|
| mailto | someone@somewhere.com | Recipient(s) of the mail sent at the end of the processing |
| mail-status | success | Status of the mail sending |
# Deduplicating
| Field | Example value | Description |
|---|---|---|
| activated | true | |
| fieldname-C | session | |
| fieldname-I | host | |
| fieldname-L | login | |
| strategy | CLI | |
| window-html | 10 | Number of seconds used for the deduplication timeframe of HTML consultations (i.e. consultations of a resource with the same ID are grouped together in a single event, cf. COUNTER) |
| window-misc | 30 | Number of seconds used for the deduplication timeframe of MISC consultations |
| window-pdf | 30 | Number of seconds used for the deduplication timeframe of PDF consultations |
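
The settings above can be read as the parameters of a time-window deduplication. The sketch below is one possible interpretation, not the ezPAARSE implementation. It assumes that the letters of `strategy` give the order in which the `fieldname-*` fields are tried to identify a user, that events are sorted chronologically, that the first event of a group is the one kept, and that the window slides with each new event. The helper names and the sample events are hypothetical.

```python
# Settings copied from the sample report above.
WINDOWS = {"HTML": 10, "MISC": 30, "PDF": 30}            # seconds, per mime-type
FIELDNAMES = {"C": "session", "I": "host", "L": "login"}
STRATEGY = "CLI"


def user_key(event):
    """Assumption: try the fields in the order given by STRATEGY, keep the first set."""
    for letter in STRATEGY:
        value = event.get(FIELDNAMES[letter])
        if value:
            return (letter, value)
    return None


def deduplicate(events):
    """Split chronologically sorted events into (kept, duplicates)."""
    last_seen = {}  # (user, resource, mime) -> timestamp of the latest event
    kept, duplicates = [], []
    for event in events:
        key = (user_key(event), event["unitid"], event["mime"])
        window = WINDOWS.get(event["mime"], 0)
        previous = last_seen.get(key)
        if previous is not None and event["timestamp"] - previous <= window:
            duplicates.append(event)
        else:
            kept.append(event)
        last_seen[key] = event["timestamp"]  # sliding window (an assumption)
    return kept, duplicates


# Hypothetical events, sorted by timestamp.
events = [
    {"login": "alice", "unitid": "siteindex", "mime": "HTML", "timestamp": 100},
    {"login": "bob", "unitid": "article-1", "mime": "PDF", "timestamp": 100},
    {"login": "carol", "unitid": "siteindex", "mime": "HTML", "timestamp": 101},
    {"login": "alice", "unitid": "siteindex", "mime": "HTML", "timestamp": 105},
    {"login": "alice", "unitid": "siteindex", "mime": "HTML", "timestamp": 120},
    {"login": "bob", "unitid": "article-1", "mime": "PDF", "timestamp": 125},
]

kept, duplicates = deduplicate(events)
print(len(kept), "kept,", len(duplicates), "duplicates")
```

Because the sketch compares each event with the previous one for the same key, it relies on chronological order, which is consistent with the `nb-lines-unordered-ecs` counter of the Rejects section.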
# Files
| Number | File name |
|---|---|
| 1 | fede.bibliovie.ezproxy.2013.11.19.log.gz |
# First consultation event
| Field | Example value | Description |
|---|---|---|
| date | 2013-11-19 | |
| datetime | 2013-11-19T00:11:57+01:00 | |
| domain | www.nature.com | |
| geoip-addr | | GeoIP address extracted from the IP address of the consulting host |
| geoip-city | | City extracted from the IP address of the consulting host |
| geoip-coordinates | | Coordinates (longitude and latitude) extracted from the IP address of the consulting host |
| geoip-country | | Country code extracted from the IP address of the consulting host |
| geoip-family | | |
| geoip-host | | GeoIP host extracted from the IP address of the consulting host |
| geoip-latitude | | |
| geoip-longitude | | |
| geoip-region | | |
| host | test.proxad.net | Original consulting host (usually an IP address; a domain name in this sample log) |
| login | MYLOGIN | Login used for accessing the resource |
| mime | MISC | Mime-type of the resource, as recognized by the parser |
| platform | npg | Short name for the consulted platform (i.e. name of the parser used to analyse the resource's URL) |
| rtype | TOC | Resource type for the consulted resource, as recognized by the parser |
| size | 40054 | HTTP Request size |
| status | 200 | HTTP code sent by the server when the resource is accessed |
| timestamp | 1384816317 | |
| title_id | siteindex | Vendor identifier, as determined by the parser |
| unitid | siteindex | Unique identifier for the resource, as determined by the parser (used for deduplicating identical resources) |
| url | http://www.nature.com:80/siteindex/index.html | |
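
The `date`, `datetime` and `timestamp` fields describe the same instant. The short sketch below checks this on the sample values, reading `timestamp` as a number of seconds since the Unix epoch, which is what the sample values suggest.

```python
from datetime import datetime

# Values copied from the sample event above.
event = {
    "date": "2013-11-19",
    "datetime": "2013-11-19T00:11:57+01:00",
    "timestamp": 1384816317,
}

# Parse the ISO 8601 string, keeping its +01:00 offset.
parsed = datetime.fromisoformat(event["datetime"])

# Convert the epoch seconds back to a datetime in the same offset.
rebuilt = datetime.fromtimestamp(event["timestamp"], tz=parsed.tzinfo)

print(int(parsed.timestamp()) == event["timestamp"])
print(rebuilt.isoformat() == event["datetime"])
print(parsed.date().isoformat() == event["date"])
```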
# Unknown Domains
Unknown domains are domains for which no parser is triggered. If the URLs belong to a provider's platform that ezPAARSE should analyse, check on the Analogist platform analysis website whether the platform is already listed. The site also indicates how far the analysis of that platform has progressed.