# thesesfr-organisme
Fetches data from the thesesfr-organisme API provided by ABES. This middleware is used only for logs from theses.fr.
# Enriched fields
Name | Type | Description |
---|---|---|
rtype | String | type of consultation (ORGANISM = organization notice viewed; OTHER = irrelevant) |
nnt | String | "sans objet" (irrelevant) |
numSujet | String | "sans objet" (irrelevant) |
etabSoutenanceN | String | "sans objet" (irrelevant) |
etabSoutenancePpn | String | "sans objet" (irrelevant) |
codeCourt | String | "sans objet" (irrelevant) |
dateSoutenance | String | "sans objet" (irrelevant) |
anneeSoutenance | String | "sans objet" (irrelevant) |
dateInscription | String | "sans objet" (irrelevant) |
anneeInscription | String | "sans objet" (irrelevant) |
statut | String | "sans objet" (irrelevant) |
discipline | String | "sans objet" (irrelevant) |
ecoleDoctoraleN | String | "sans objet" (irrelevant) |
ecoleDoctoralePpn | String | "sans objet" (irrelevant) |
partenaireRechercheN | String | "sans objet" (irrelevant) |
partenaireRecherchePpn | String | "sans objet" (irrelevant) |
auteurN | String | "sans objet" (irrelevant) |
auteurPpn | String | "sans objet" (irrelevant) |
directeurN | String | "sans objet" (irrelevant) |
directeurPpn | String | "sans objet" (irrelevant) |
presidentN | String | "sans objet" (irrelevant) |
presidentPpn | String | "sans objet" (irrelevant) |
rapporteursN | String | "sans objet" (irrelevant) |
rapporteursPpn | String | "sans objet" (irrelevant) |
membresN | String | "sans objet" (irrelevant) |
membresPpn | String | "sans objet" (irrelevant) |
personneN | String | "sans objet" (irrelevant) |
personnePpn | String | "sans objet" (irrelevant) |
organismeN | String | name of the organization regardless of its role (thesis defense institution, doctoral school, research partner, etc.) |
organismePpn | String | identifier (PPN) of the organization regardless of its role (thesis defense institution, doctoral school, research partner, etc.) |
idp_etab_nom | String | "sans objet" (irrelevant) |
idp_etab_ppn | String | "sans objet" (irrelevant) |
idp_etab_code_court | String | "sans objet" (irrelevant) |
platform_name | String | "sans objet" (irrelevant) |
publication_title | String | "sans objet" (irrelevant) |
source | String | Upcoming: data source (STEP, STAR, Sudoc) |
domaine | String | Upcoming: thematic domain associated with the thesis |
doiThese | String | Upcoming: DOI assigned to the thesis |
accessible | String | Upcoming: whether the thesis is accessible online (yes or no) |
langue | String | Upcoming: language in which the thesis is written |
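
As an illustration of how the two organization fields can be read back from a result file, here is a minimal sketch. It assumes the result file is semicolon-separated with a header row and contains no quoted fields; the function name `extract_organisms` and the sample rows are hypothetical and only serve the example.

```shell
# Print "organismePpn;organismeN" for every ORGANISM consultation.
# Assumption: semicolon-separated file with a header row, no quoted fields.
extract_organisms() {
  awk -F';' '
    NR == 1 {
      # Map each column name to its position so column order does not matter.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    $(col["rtype"]) == "ORGANISM" {
      print $(col["organismePpn"]) ";" $(col["organismeN"])
    }
  ' "$1"
}

# Hypothetical sample data, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
rtype;organismeN;organismePpn
ORGANISM;Example University;000000001
OTHER;sans objet;sans objet
EOF

extract_organisms "$sample"
```

Looking columns up by name rather than by position keeps the sketch usable when the Output-Fields header changes the column order.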
# Prerequisites
The EC needs a unitid and an rtype equal to RECORD.
You must use thesesfr-organisme after the filter, parser, and deduplicator middlewares.
You must use the three middlewares together, in this order: thesesfr, thesesfr-personne, thesesfr-organisme
-H "ezPAARSE-Middlewares: thesesfr,thesesfr-personne,thesesfr-organisme"
# Recommendation
This middleware should be used after thesesfr and thesesfr-personne.
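
The ordering requirement can be checked before a job is submitted. The following is a minimal sketch, not part of ezPAARSE: `check_thesesfr_order` is a hypothetical helper that takes the value of the ezPAARSE-Middlewares header and succeeds only when the three middlewares each appear once, in the required order. Other middlewares in the list are ignored.

```shell
# Succeed (exit status 0) when thesesfr, thesesfr-personne and
# thesesfr-organisme appear exactly once each, in that order.
# $1: comma-separated value of the ezPAARSE-Middlewares header.
check_thesesfr_order() {
  expected="thesesfr thesesfr-personne thesesfr-organisme"
  found=""
  old_ifs=$IFS
  IFS=','
  for name in $1; do
    # Drop spaces around each middleware name.
    name=$(printf '%s' "$name" | tr -d ' ')
    case "$name" in
      thesesfr|thesesfr-personne|thesesfr-organisme)
        found="${found:+$found }$name" ;;
    esac
  done
  IFS=$old_ifs
  [ "$found" = "$expected" ]
}

if check_thesesfr_order "thesesfr,thesesfr-personne,thesesfr-organisme"; then
  echo "middleware order OK"
else
  echo "middleware order invalid" >&2
fi
```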
# Headers
- thesesfr-organisme-ttl : Lifetime of cached documents, in seconds. Defaults to 7 days (3600 * 24 * 7).
- thesesfr-organisme-throttle : Minimum time to wait between queries, in milliseconds. Defaults to 200 ms.
- thesesfr-organisme-base-wait-time : Time to wait before retrying after a query fails, in milliseconds. Defaults to 1000 ms. This time doubles after each attempt.
- thesesfr-organisme-paquet-size : Maximum number of identifiers to send in a single request. Defaults to 50.
- thesesfr-organisme-buffer-size : Maximum number of access events held in memory before a request is sent. Defaults to 1000.
- thesesfr-organisme-max-attempts : Maximum number of attempts before the EC is considered in error. Defaults to 5.
- thesesfr-organisme-user-agent : Value to send in the User-Agent header when querying thesesfr-organisme. Defaults to ezPAARSE (https://readmetrics.org; mailto:ezteam@couperin.org).
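
To make the retry settings concrete, the sketch below computes the waits implied by base-wait-time and max-attempts. It is an illustration only, not the middleware's code: `retry_waits` is a hypothetical helper, and it assumes the first retry waits exactly the base time and that no wait follows the final attempt.

```shell
# Print the wait (in ms) before each retry, doubling every time.
# $1: base wait time in ms, $2: maximum number of attempts.
retry_waits() {
  wait_ms=$1
  attempt=1
  schedule=""
  # One wait separates two consecutive attempts, hence (max - 1) waits.
  while [ "$attempt" -lt "$2" ]; do
    schedule="${schedule:+$schedule }$wait_ms"
    wait_ms=$((wait_ms * 2))
    attempt=$((attempt + 1))
  done
  printf '%s\n' "$schedule"
}

# With the default values (1000 ms, 5 attempts):
retry_waits 1000 5   # prints "1000 2000 4000 8000"
```

Under these assumptions, a query that keeps failing with the default settings spends 15 seconds waiting in total before the EC is considered in error.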
# How to use
# ezPAARSE admin interface
You can add thesesfr-organisme to, or remove it from, the middlewares applied by default to all your enrichments, provided you have added an API key in the config. To do this, go to the middleware section of the administration interface.
# ezPAARSE process interface
You can use thesesfr-organisme for an enrichment process. You just need to add the middleware.
# ezp
You can use thesesfr-organisme for an enrichment process with ezp like this:
# enrich with one file
ezp process <path of your file> \
--host <host of your ezPAARSE instance> \
--settings <settings-id> \
--header "ezPAARSE-Filter-Redirects: false" \
--header "ezPAARSE-Middlewares: thesesfr,thesesfr-personne,thesesfr-organisme" \
--header "Output-Fields: +nnt, +numSujet, +etabSoutenanceN, +etabSoutenancePpn, +codeCourt, +dateSoutenance, +anneeSoutenance, +dateInscription, +anneeInscription, +statut, +discipline, +ecoleDoctoraleN, +ecoleDoctoralePpn, +partenaireRechercheN, +partenaireRecherchePpn, +auteurN, +auteurPpn, +directeurN, +directeurPpn, +presidentN, +presidentPpn, +rapporteursN, +rapporteursPpn, +membresN, +membresPpn, +personneN, +personnePpn, +organismeN, +organismePpn, +platform_name, +publication_title, +libelle_idp" \
--header "Log-Format-apache: %h %l %{login}<.*> %t \"%r\" %>s %b \"%{Referer}<.*>\" \"%{User-Agent}<.*>\" \"%{Shib-Identity-Provider}<.*>\" \"%{eppn}<.*>\" \"%{primary-affiliation}<.*>\" \"%{supannEtablissement}<.*>\"" \
--out ./result.csv
# enrich with multiple files
ezp bulk <path of your directory> \
--host <host of your ezPAARSE instance> \
--settings <settings-id> \
--header "ezPAARSE-Filter-Redirects: false" \
--header "ezPAARSE-Middlewares: thesesfr,thesesfr-personne,thesesfr-organisme" \
--header "Output-Fields: +nnt, +numSujet, +etabSoutenanceN, +etabSoutenancePpn, +codeCourt, +dateSoutenance, +anneeSoutenance, +dateInscription, +anneeInscription, +statut, +discipline, +ecoleDoctoraleN, +ecoleDoctoralePpn, +partenaireRechercheN, +partenaireRecherchePpn, +auteurN, +auteurPpn, +directeurN, +directeurPpn, +presidentN, +presidentPpn, +rapporteursN, +rapporteursPpn, +membresN, +membresPpn, +personneN, +personnePpn, +organismeN, +organismePpn, +platform_name, +publication_title, +libelle_idp" \
--header "Log-Format-apache: %h %l %{login}<.*> %t \"%r\" %>s %b \"%{Referer}<.*>\" \"%{User-Agent}<.*>\" \"%{Shib-Identity-Provider}<.*>\" \"%{eppn}<.*>\" \"%{primary-affiliation}<.*>\" \"%{supannEtablissement}<.*>\""
# curl
You can use thesesfr-organisme for an enrichment process with curl like this:
curl -X POST -v http://localhost:59599 \
-H "ezPAARSE-Filter-Redirects: false" \
-H "ezPAARSE-Middlewares: thesesfr,thesesfr-personne,thesesfr-organisme,idp-metadata" \
-H "Output-Fields: +nnt, +numSujet, +etabSoutenanceN, +etabSoutenancePpn, +codeCourt, +dateSoutenance, +anneeSoutenance, +dateInscription, +anneeInscription, +statut, +discipline, +ecoleDoctoraleN, +ecoleDoctoralePpn, +partenaireRechercheN, +partenaireRecherchePpn, +auteurN, +auteurPpn, +directeurN, +directeurPpn, +presidentN, +presidentPpn, +rapporteursN, +rapporteursPpn, +membresN, +membresPpn, +personneN, +personnePpn, +organismeN, +organismePpn, +platform_name, +publication_title, +libelle_idp" \
-H "Log-Format-apache: %h %l %{login}<.*> %t \"%r\" %>s %b \"%{Referer}<.*>\" \"%{User-Agent}<.*>\" \"%{Shib-Identity-Provider}<.*>\" \"%{eppn}<.*>\" \"%{primary-affiliation}<.*>\" \"%{supannEtablissement}<.*>\"" \
-F "file=@<log file path>"