Data documentation
This page is mainly destined to the folks who will integrate exports produced by this module. As this target audience is mostly english-speaking, this section is in English.
We often refer to Indicia, which is the backend used by the eBMS platform to store data. Some technical choices taken with this module cannot be explained without having this information, as some redundant or weirdly formatted data is exported in a specific way so that importing them into Indicia becomes a process involving as few manual process as possible. We took the greatest care while working with the UKCEH to understand the way Indicia stores data to ensure concepts stay meaningful when exported data ends up in the eBMS.
The Indicia import process is documented on the Indicia documentation.
Data flow
The expected flow for Sterf-eBMS data is the following:
Sterfists produce data and they save it using this module.
This data is regularly exported by the module administrator and sent to an eBMS contact.
The data is integrated in the eBMS database (actually Indicia); thanks to preparatory work and a single module, there is minimal friction for integration.
On a regular basis, the MNHN exports or receives from the eBMS platform the French data that has been submitted.
The data is integrated in national database, thus at the disposal of the original actors.
I have exported data from this module, who do I send it to?
Sending your data to the eBMS is covered in the admin documentation.
When observations are produced, GeoNature immediately attaches an UUID to each one. This UUID is shared in all steps of this flow, thus preventing data duplication.
What to expect in exports?
Files
The module can produce 4 files, one per export type:
Transects data: descriptions of the eBMS transects.
Sections data: descriptions of the eBMS sections.
Transect visits: visit data grouped by transect.
Section observations: observations grouped by section visits.
Both transect and section data is straightforwardly exported as there is a matching concept in Indicia: both are locations, and transects are parent locations of sections.
Transect visits and sections observations require some internal work to be exported. At the GeoNature level, the data hierarchy implies that visit data is related to a section:
A transect contains sections.
A section can have visits.
An observation is made during a visit.
But at the Indicia level, the data hierarchy looks more like:
A transect contains sections (or more generally, locations can have parent locations).
A transect can be visited: a visit is called a sample, and visit data is related to the transect.
For each section, observations are grouped in a sub-sample, the section visit.
In practice, this means that GeoNature “section visits” must be converted into “transect visits”. This is done using averages: section visits can define different temperatures, so the transect visit use the average of all section visits temperatures.
Important
Each row of all these exports always contain an UUID that can be used as “external key” when importing them into Indicia to avoid item duplication. Identifiers for the GeoNature concepts (transects, sections, visits, observations) are transmitted as well as for the Indicia concepts (transects, sections, transect visits, section observations) so that all concepts always remain uniquely identifiable.
Danger
Because UUIDs are the only way to prevent data duplication, as a GeoNature administrator, you must avoid deleting and recreating transects, sections or any entity as it will generate a new UUID. Modifying an entity’s UUID, e.g. to restore a replaced UUID, requires manual database operations.
Column names
Some columns name contain spaces instead of underscore: this is expected behaviour, do not attempt to replace these to improve the export. Indeed, the Indicia import tool automatically maps CSV column name to the adequate database column using the column name. In some specific cases, the mapping does not work if the columns names uses underscores.
Habitats and other terms
Sterf habitats use a different nomenclature than the eBMS. We created a mapping
so that each Sterf habitat can be mapped to, at least, a principal habitat, and
in some cases, the section’s linear habitat as well. The details of the mapping
can be found in the ref_habitats.cor_sterf_to_ebms table created in the
exports/csv/export_csv.sql file of the module.
Wind speed uses the Beaufort scale but is limited to level 6. In the same vein, temperatures are Celsius degrees and cannot go above 40°.
In some cases such as with wind speeds, these values are not transmitted raw but
as a corresponding eBMS term list value, e.g. the column wind speed does not
contain a plain 3 for the level 3 but rather the string 3. Leaves and
twigs in slight motion. This helps to streamline the import process.
Note
Indicia does allow storing metadata other than what is defined by the eBMS platform, but this module currently cannot use this functionality. As a result, metadata that does not fit into eBMS columns would be lost. To prevent this, for data we consider valuable, it has been added in the comment section, and the following exports sections will describe the related formats. This way, Sterf producers can retrieve some interesting information, such as the original habitat, down the data pipeline.
Geometries
Geometric data is always converted to the expected CRS, but we still add columns to note the SRID next to each geometry column. The goal is to avoid having geometric data without its SRID in the wild.
Transects export
The transects exports contain the following columns:
external_keyTransect UUID.
nameTransect name.
codeTransect code, if it looks like an EBMS transect code (i.e. it starts with
EBMS:). This can be used to help investigate duplicated transects, or to split the export between the sections identified by external keys and those identified by code.centroid_srefCentroid of the transect as decimal degrees. SRID is always 4326.
centroid_sref_sridAlways 4326.
commentAn optional description of the transect. This column also contains the original STOC habitat.
boundary_geomWKT of the transect geometry. This may or may not be useful as the sections geometry is what ultimately matters. SRID is always 3857.
boundary_geom_sridAlways 3857.
sensitiveWhether this transect is sensitive and should be hidden from public views. « True » or « False ».
no_of_sectionsNumber of sections in this transect.
transect_width_mWidth of the sections.
principal_habitatPrincipal habitat, using the eBMS term list.
2nd_habitatSecondary habitat (optional).
principal_land_tenurePrincipal land tenure, using the eBMS term list.
2nd_land_tenureSecondary land tenure.
principal_land_managementPrincipal land management, using the eBMS term list.
2nd_land_managementSecondary land management.
notes_on_habitat_land_management_and_tenureThe name says it all!
countryAlways 216023, hardcoded value for France.
Sections export
The sections exports contain the following columns:
external_keySection UUID.
nameSection name, if any.
codeSection code, if any. Users are prompted to use eBMS-like codes such as S1, S2, etc, but this is a free text input.
parent_location_external_keyTransect UUID that this section belongs to.
centroid_srefCentroid of the section as decimal degrees. SRID is always 4326.
centroid_sref_sridAlways 4326.
commentUser comment.
boundary_geomWKT of the section (a LINESTRING). SRID is always 3857.
boundary_geom_sridAlways 3857.
section_length_mThe length of the section.
primary_habitat_presentThe primary habitat on this section; it may or may not be different than the transect primary habitat.
2nd_habitat_presentThe secondary habitat on this section; it may or may not be different than the transect secondary habitat.
linear_habitatThe linear habitat, using the eBMS term list.
primary_land_management_presentThe primary land management present on the section.
2nd_land_management_presentThe secondary land management present on the section.
notes_on_land_use_and_managementUser comments on the land management.
Transect visits export
The transect visits exports contain the following columns:
external_keyVisit globally unique identifier (GUID). As transect visits correspond to the combination of all section visits for the same day on a specific transect, this GUID is formed of the transect UUID and the visits date, e.g.
ba2234f0-ffbb-4532-bf49-81a009ae031f-2025-05-02.location external keyTransect UUID. Redundant with the preceding GUID but allows easier imports.
location_nameTransect name.
grid refCentroid of the transect (see
centroid_srefin the transect exports).grid_ref_sridAlways 4326.
dateThe visit date, formatted as YYYY-MM-DD.
start_timeThe start hour of the first section walk that day.
end_timeThe end hour of the last section walk that day.
recorder_nameThe observer name, or names if several, each on their own line.
commentUser comments regarding the visit. Because the export aggregates section visits, the comment is the concatenation of section visit comments, each line starting with the section code.
temp_deg_cAverage temperature during the visit. Max 40°C.
%_cloudAverage cloud cover during the visit, as a percentage.
wind_speedAverage wind speed during the visit, as a Beaufort scale level. Max at 6.
any_butterflies_?Boolean: 1 if any butterflies were seen during the visit, 0 otherwise.
num_sections_visitedNumber of sections visited during this transect visit. Mostly for internal use.
Section observations export
The section observations exports contain the following columns:
occurrence external keyOccurrence UUID. This is the French SINP UUID for this occurrence.
sample external keySection visit UUID.
parent_sample_external_keyTransect visit GUID, as used in the transect visits export.
location external keySection UUID.
location_nameSection name.
grid refCentroid of the section (see
centroid_srefin the section exports).grid_ref_sridAlways 4326.
dateVisit date.
reliabilityWhether the abundance count is reliable or not, using the eBMS term list.
taxon_nameTaxon scientific name, without the author part.
abundance_countCount of specimens. Always strictly positive, absences are not reported.
occurrence commentUser comment about the specific occurrence. It always also contains the TaxRef CD_NOM of the taxon so that it can be unambiguously identified.