Efficient data access for data reporting

Context

This topic covers approaches for efficiently accessing GBIF-mediated data to generate data reports. Recent developments at GBIF include snapshots of GBIF data on cloud computing infrastructures and the ability to query GBIF using SQL, which makes it possible to produce data metrics and custom data reports without downloading individual records. In addition, emerging data cube formats will offer an efficient way to analyze biodiversity trends across space and time.

Learning objectives

After completing this module, you should be able to perform the following:

  • Identify the data access landscape (i.e., what tools are available and how they are used)

  • Identify appropriate solutions for given data access use cases

  • Gain insight into coding and rgbif

  • Explore scripting reproducible workflows

Trainers

The following trainers have developed the content for this topic:

Dag Endresen, Node Manager, Norway

Will Morris, Node Manager, Finland

Secretariat consultants: Andrew Rodrigues and John Waller

Preparation

Complete the following activities to prepare for the onsite sessions:

  1. Propose a data use case/data reporting example to share as a subject for group discussion.

  2. Create an account on GBIF.org (if you don’t already have one).

  3. Install software.

Data access landscape

Introduction and data types

This presentation introduces the training topic and provides an overview of the data types found in GBIF.

 

 

Access methods

This presentation provides an overview of the various methods that can be used to access GBIF-mediated data.

 

 

Shared examples from nodes

Please review the data use case examples shared by nodes. You can refer to these for inspiration during the group activity.

 

 

Exploring data access and use cases

For this activity, each group should: (a) explore and discuss possible data access and reporting use cases and (b) select a data access use case to work on during the hands-on section.

Each group should answer/discuss the following:

  1. What examples have Nodes encountered that required them to access or use GBIF-mediated data?

  2. Discuss how the different data uses/data users/report types affect which method you might use or recommend to access GBIF-mediated data.

  3. Pick a use case or data reporting example among those discussed that can be carried forward to the next exercise. As far as possible, pick a use case that is (a) non-trivial and (b) as likely as possible to be unique among the groups.

  4. How would you approach the use case/data reporting example using the web interface, data downloads, the API, the R interface, cloud access, or SQL downloads?

  5. What would be the relative benefits or pitfalls of each approach?

At the conclusion of the activity, each group should select a member of the group to present their selected use case in plenary (max 2 minutes per group).

Case study

Conservation of Nordic Crop Wild Relatives will be presented as an example case study using the GBIF API.

 

 

Hands-on with data access and use cases

Cloud access and SQL

This presentation provides an overview of accessing GBIF-mediated data using cloud services. It also includes an introduction to SQL and how to access GBIF-mediated data using SQL.
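As a rough illustration of the SQL route, the sketch below builds the JSON body of a GBIF SQL download request in Python without sending it. The endpoint URL, the `SQL_TSV_ZIP` format name, the `occurrence` table and the column names are assumptions based on our reading of the GBIF API documentation and should be checked against the current docs. The helper name `build_sql_download_request`, the email address and the country code are placeholders invented for this example.

```python
import json

# Assumed endpoint for occurrence download requests; verify against the GBIF API docs.
DOWNLOAD_URL = "https://api.gbif.org/v1/occurrence/download/request"


def build_sql_download_request(country_code, email):
    """Return the JSON body for a SQL download that counts records per year.

    Hypothetical helper for illustration: it only builds the request body
    and never contacts GBIF.
    """
    # Accept only two-letter codes so arbitrary text cannot end up in the SQL.
    if len(country_code) != 2 or not country_code.isalpha():
        raise ValueError("country_code must be a two-letter ISO code")
    sql = (
        'SELECT "year", COUNT(*) AS occurrence_count '
        "FROM occurrence "
        f"WHERE countryCode = '{country_code.upper()}' "
        'GROUP BY "year"'
    )
    return {
        "sendNotification": True,
        "notificationAddresses": [email],
        "format": "SQL_TSV_ZIP",  # assumed format name for SQL downloads
        "sql": sql,
    }


if __name__ == "__main__":
    # Placeholder country code and email address.
    body = build_sql_download_request("NO", "user@example.org")
    print(json.dumps(body, indent=2))
```

Submitting such a body would mean an authenticated POST to `DOWNLOAD_URL` with your GBIF.org account credentials. The aggregation runs on the GBIF side, so the result is a small summary table rather than a file of individual records.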

 

 

GBIF API and rgbif

This presentation provides an overview of accessing GBIF-mediated data using the GBIF API and using rgbif. Lastly, the Barents IAS portal will be presented as an example case study using rgbif.
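To make the API route more concrete, here is a minimal sketch of an occurrence search query and of reading facet counts from its response. Python is used only to keep the example dependency-free; in the course, rgbif plays this role from R. The helper names `build_search_url` and `facet_counts` are hypothetical, the response shape is an assumption based on our reading of the GBIF API documentation, and the sample response contains made-up placeholder counts, not real GBIF figures.

```python
from urllib.parse import urlencode

# Public occurrence search endpoint of the GBIF API.
SEARCH_URL = "https://api.gbif.org/v1/occurrence/search"


def build_search_url(country, year, limit=0, facet=None):
    """Build an occurrence search URL; limit=0 requests counts without records."""
    params = {"country": country, "year": year, "limit": limit}
    if facet:
        params["facet"] = facet
    return f"{SEARCH_URL}?{urlencode(params)}"


def _normalise(name):
    """Compare facet names loosely, e.g. 'basisOfRecord' vs 'BASIS_OF_RECORD'."""
    return name.replace("_", "").lower()


def facet_counts(response, field):
    """Extract {name: count} for one facet from a parsed search response."""
    for facet in response.get("facets", []):
        if _normalise(facet.get("field", "")) == _normalise(field):
            return {c["name"]: c["count"] for c in facet.get("counts", [])}
    return {}


if __name__ == "__main__":
    print(build_search_url("NO", 2020, facet="basisOfRecord"))

    # Made-up sample in the assumed response shape (placeholder counts).
    sample = {
        "count": 4,
        "facets": [
            {
                "field": "BASIS_OF_RECORD",
                "counts": [
                    {"name": "HUMAN_OBSERVATION", "count": 3},
                    {"name": "PRESERVED_SPECIMEN", "count": 1},
                ],
            }
        ],
    }
    print(facet_counts(sample, "basisOfRecord"))
```

Keeping the URL construction and the response parsing in separate functions makes the query easy to rerun with different countries or years, which is the main benefit of scripting over clicking through the web interface.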

 

 

Hands-on activity

In this activity, (a) individuals will get hands-on exposure to coding and (b) each group will begin work on a reproducible workflow (e.g., an R script) for accessing GBIF-mediated data.

Each group should answer/discuss/complete the following:

  1. How would you approach the use case/data reporting example you discussed earlier, using the rgbif API client?

  2. What benefits could be gained from scripting a solution to the problems discussed previously?

  3. Are there aspects to the use case/data reporting example that can’t be addressed using rgbif or a similar tool?

  4. Work together on one or more scripts to address your use case with rgbif or a similar tool.

At the conclusion of the activity, each group should select a member of the group to present their solutions (max 2 minutes per group).

Action plan

The trainers will conclude the topic and offer an action plan for you to reflect on this topic post-training.