Title: | FinBIF to GBIF |
---|---|
Description: | Tools for publishing FinBIF data to GBIF. |
Authors: | Finnish Museum of Natural History - Luomus [cph], William K. Morris [aut, cre] |
Maintainer: | William K. Morris <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.6.12.9000 |
Built: | 2024-11-07 08:21:42 UTC |
Source: | https://github.com/luomus/finbif2gbif |
Archive occurrence records in a Darwin Core archive.
archive_occurrences( archive, file_name, media_file_name, filter, select = sub("^.*:", "", config::get("fields")), facts = config::get("facts"), combine = config::get("combine"), n = config::get("nmax"), quiet = TRUE )
archive |
Character. Path to the archive. |
file_name |
Character. The name of the file to write to the archive. |
media_file_name |
Character. The name of the media extension file to write to the archive. |
filter |
List of named character vectors. Filters to apply to records. |
select |
Character vector. Variables to return. If not specified, a
default set of commonly used variables will be used. Use |
facts |
List of extra variables to be extracted from record, event and document "facts". |
combine |
List of fields to combine. |
n |
Integer. How many records to download/import. |
quiet |
Logical. Suppress the progress indicator for multipage downloads. |
The status value returned by the zip command, invisibly.
## Not run: archive_occurrences( "dwca.zip", "occurrence.txt", "media.txt", list(collection = "HR.139"), c("occurrenceID", "basisOfRecord") ) ## End(Not run)
Clean occurrence files in an archive.
clean_occurrences(archive, filters)
archive |
Character. Path to the archive. |
filters |
List. |
The status value returned by the zip command, invisibly.
## Not run: clean_occurrences("dwca.zip", list()) ## End(Not run)
Count the number of occurrences.
count_occurrences(x, ...)
x |
Object to count occurrences for. |
... |
Arguments passed to methods. |
Integer.
## Not run: count_occurrences(list(collection = "HR.3991")) ## End(Not run)
Get the file path of an archive for a collection.
get_archive_path(collection_id, dir = "archives/split")
collection_id |
Character. Collection id. |
dir |
Character. Path to the archive directory. |
Character. The file path of the archive.
## Not run: get_archive_path("HR.3991") ## End(Not run)
Get collection IDs of FinBIF collections that are published to GBIF.
get_collection_ids(datasets, collection_ids = config::get("collections"))
datasets |
List. GBIF dataset metadata retrieved using |
collection_ids |
Character. Collection ids to include regardless of sharing status. |
A character vector.
## Not run: get_collection_ids(get_gbif_datasets()) ## End(Not run)
Get FinBIF collection data endpoint needed for GBIF registration.
get_endpoint(collection_id, url_base = Sys.getenv("ENDPOINTS"))
collection_id |
Character. ID string of FinBIF collection. |
url_base |
Character. The base URL for the collection's data endpoint. Defaults to system environment variable, "ENDPOINTS". |
A list.
## Not run: get_endpoint("HR.3991") ## End(Not run)
Get the file name of occurrences in an archive.
get_file_name(filter, select = config::get("fields"), prefix = "occurrence")
filter |
List. |
select |
Character. |
prefix |
Character. |
Character. The file name holding occurrence records.
## Not run: get_file_name(list()) ## End(Not run)
Get metadata for GBIF registered datasets of a given installation.
get_gbif_datasets( url = Sys.getenv("GBIF_API"), installation = Sys.getenv("GBIF_INSTALLATION") )
url |
Character. URL of GBIF API. Defaults to system environment variable, "GBIF_API". |
installation |
Character. ID key of GBIF installation. Defaults to system environment variable, "GBIF_INSTALLATION". |
A list.
## Not run: get_gbif_datasets() ## End(Not run)
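Several functions in this package take their defaults from environment variables such as "GBIF_API" and "GBIF_INSTALLATION". A minimal base R sketch of setting and checking them before a call could look like the following. The URL and key are placeholder values for illustration only, not real endpoints or credentials, and `required`, `values` and `unset` are hypothetical helper names that are not part of the package.

```r
# Placeholder values (assumptions for illustration, not real settings).
Sys.setenv(
  GBIF_API = "https://api.example.org/v1",
  GBIF_INSTALLATION = "00000000-0000-0000-0000-000000000000"
)

# Sys.getenv() returns "" for variables that are not set.
required <- c("GBIF_API", "GBIF_INSTALLATION")
values <- Sys.getenv(required)
unset <- required[!nzchar(values)]

# Fail early rather than sending a request with an empty URL or key.
if (length(unset) > 0L) {
  stop("Unset environment variables: ", paste(unset, collapse = ", "))
}
```

In practice these variables would more likely be set outside the R session, for example in an `.Renviron` file, so that credentials do not appear in scripts.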
Get FinBIF collection metadata needed for GBIF registration.
get_metadata( collection_id, metadata_fields = config::get("metadata"), org = Sys.getenv("GBIF_ORG"), installation = Sys.getenv("GBIF_INSTALLATION") )
collection_id |
Character. ID string of FinBIF collection. |
metadata_fields |
List. Map of GBIF to FinBIF metadata fields to use. |
org |
Character. GBIF organization key. Defaults to system environment variable, "GBIF_ORG". |
installation |
Character. ID key of GBIF installation. Defaults to system environment variable, "GBIF_INSTALLATION". |
A list.
## Not run: get_metadata("HR.3991") ## End(Not run)
Get occurrence records from FinBIF.
get_occurrences(filter, select, facts, combine, n, quiet = TRUE)
filter |
List of named character vectors. Filters to apply to records. |
select |
Character vector. Variables to return. If not specified, a
default set of commonly used variables will be used. Use |
facts |
List of extra variables to be extracted from record, event and document "facts". |
combine |
List of fields to combine. |
n |
Integer. How many records to download/import. |
quiet |
Logical. Suppress the progress indicator for multipage downloads. |
A finbif_occ object.
## Not run: get_occurrences( c(collection = "HR.3991"), c("occurrenceID", "basisOfRecord"), n = 100 ) ## End(Not run)
Check if a FinBIF collection is registered with GBIF.
get_registration(datasets, collection_id, quiet = FALSE)
datasets |
List. GBIF dataset metadata retrieved using |
collection_id |
Character. ID string of FinBIF collection. |
quiet |
Logical. Suppress messages. |
Integer.
## Not run: get_registration(get_gbif_datasets(), "HR.3991") ## End(Not run)
Get subset filters for a collection.
get_subsets( collection_id, filters = config::get("filters"), nmax = config::get("nmax") )
collection_id |
Character. ID string of FinBIF collection. |
filters |
List. |
nmax |
Integer. Maximum allowed size of subset. |
A list.
## Not run: get_subsets("HR.3991") ## End(Not run)
Get the UUID of a registered dataset.
get_uuid(registration)
registration |
Integer. |
Character.
## Not run:
registration <- get_registration(get_gbif_datasets(), "HR.3991")
get_uuid(registration)
## End(Not run)
Initiate GBIF ingestion of FinBIF data.
initiate_gbif_ingestion( uuid, url = Sys.getenv("GBIF_API"), user = Sys.getenv("GBIF_USER"), pass = Sys.getenv("GBIF_PASS") )
uuid |
Integer. GBIF registration id. |
url |
Character. URL of GBIF API. Defaults to system environment variable, "GBIF_API". |
user |
Character. GBIF username. Defaults to system environment variable, "GBIF_USER". |
pass |
Character. GBIF password. Defaults to system environment variable, "GBIF_PASS". |
NULL.
## Not run:
collection <- get_collection_ids()[[1L]]
registration <- get_registration(get_gbif_datasets(), collection)
initiate_gbif_ingestion(registration)
## End(Not run)
Get the last modified date for FinBIF records.
last_mod(x, ...)
x |
Object to get last modified time for. |
... |
Arguments passed to methods. |
A Date object.
## Not run: last_mod(list(collection = "HR.3991")) ## End(Not run)
Count the number of occurrence data subsets that have been archived.
n_archived_subsets(archive)
archive |
Darwin Core archive file. |
Integer.
## Not run: n_archived_subsets("archive.zip") ## End(Not run)
Publish a Darwin Core archive.
publish_archive(staged_archive, dir = "archives")
staged_archive |
Character. Path to the staged archive. |
dir |
Character. Path to the archive directory. |
Character. The file path of the staged archive.
## Not run: publish_archive("stage/archive.zip") ## End(Not run)
Send FinBIF dataset endpoint to GBIF.
send_gbif_dataset_endpoint( endpoint, uuid, url = Sys.getenv("GBIF_API"), user = Sys.getenv("GBIF_USER"), pass = Sys.getenv("GBIF_PASS") )
endpoint |
Character. URL of dataset endpoint generated by
|
uuid |
Character. GBIF dataset identifier. Returned by
|
url |
Character. URL of GBIF API. Defaults to system environment variable, "GBIF_API". |
user |
Character. GBIF username. Defaults to system environment variable, "GBIF_USER". |
pass |
Character. GBIF password. Defaults to system environment variable, "GBIF_PASS". |
If successful, returns NULL invisibly.
## Not run:
m <- get_metadata("HR.3991")
ep <- get_endpoint("HR.3991")
uuid <- send_gbif_dataset_metadata(m)
send_gbif_dataset_endpoint(ep, uuid)
## End(Not run)
Send FinBIF dataset identifier to GBIF.
send_gbif_dataset_id( id, uuid, url = Sys.getenv("GBIF_API"), user = Sys.getenv("GBIF_USER"), pass = Sys.getenv("GBIF_PASS") )
id |
Character. FinBIF collection ID for dataset. |
uuid |
Character. GBIF dataset identifier. Returned by
|
url |
Character. URL of GBIF API. Defaults to system environment variable, "GBIF_API". |
user |
Character. GBIF username. Defaults to system environment variable, "GBIF_USER". |
pass |
Character. GBIF password. Defaults to system environment variable, "GBIF_PASS". |
If successful, returns NULL invisibly.
## Not run:
m <- get_metadata("HR.3991")
uuid <- send_gbif_dataset_metadata(m)
send_gbif_dataset_id("HR.3991", uuid)
## End(Not run)
Send FinBIF dataset metadata to GBIF.
send_gbif_dataset_metadata( metadata, url = Sys.getenv("GBIF_API"), user = Sys.getenv("GBIF_USER"), pass = Sys.getenv("GBIF_PASS") )
metadata |
List. FinBIF dataset metadata generated by |
url |
Character. URL of GBIF API. Defaults to system environment variable, "GBIF_API". |
user |
Character. GBIF username. Defaults to system environment variable, "GBIF_USER". |
pass |
Character. GBIF password. Defaults to system environment variable, "GBIF_PASS". |
A list.
## Not run:
m <- get_metadata("HR.3991")
send_gbif_dataset_metadata(m)
## End(Not run)
Should the collection be skipped?
skip_collection( collection_id, enabled = config::get("enabled"), whitelist = "whitelist.txt" )
collection_id |
Character. Collection id. |
enabled |
Logical. |
whitelist |
Character. Path to white-list file. |
Logical.
## Not run: skip_collection("HR.139") ## End(Not run)
Should updating the collection for GBIF be skipped?
skip_gbif(collection_id, enabled = config::get("gbif"))
collection_id |
Character. Collection id. |
enabled |
Logical. |
Logical.
## Not run: skip_gbif("HR.139") ## End(Not run)
Stage an archive file in the staging directory.
stage_archive(archive, stage = "stage")
archive |
Character. Path to the archive. |
stage |
Character. Path to the staging directory. |
Character. The file path of the staged archive.
## Not run: stage_archive("archive.zip") ## End(Not run)
Unstage an updated archive file.
unstage_archive(staged_archive, dir = "archives")
staged_archive |
Character. Path to the staged archive. |
dir |
Character. Path to the archive directory. |
Character. The file path of the staged archive.
## Not run: unstage_archive("stage/archive.zip") ## End(Not run)
Update FinBIF dataset endpoint for GBIF.
update_gbif_dataset_endpoint( endpoint, uuid, url = Sys.getenv("GBIF_API"), user = Sys.getenv("GBIF_USER"), pass = Sys.getenv("GBIF_PASS") )
endpoint |
Character. URL of dataset endpoint generated by
|
uuid |
Character. GBIF dataset identifier. |
url |
Character. URL of GBIF API. Defaults to system environment variable, "GBIF_API". |
user |
Character. GBIF username. Defaults to system environment variable, "GBIF_USER". |
pass |
Character. GBIF password. Defaults to system environment variable, "GBIF_PASS". |
If successful, returns NULL invisibly.
## Not run:
m <- get_metadata("HR.3991")
ep <- get_endpoint("HR.3991")
uuid <- send_gbif_dataset_metadata(m)
update_gbif_dataset_endpoint(ep, uuid)
## End(Not run)
Update FinBIF dataset metadata at GBIF.
update_gbif_dataset_metadata( metadata, registration, url = Sys.getenv("GBIF_API"), user = Sys.getenv("GBIF_USER"), pass = Sys.getenv("GBIF_PASS") )
metadata |
List. FinBIF dataset metadata generated by |
registration |
Integer. GBIF registration. |
url |
Character. URL of GBIF API. Defaults to system environment variable, "GBIF_API". |
user |
Character. GBIF username. Defaults to system environment variable, "GBIF_USER". |
pass |
Character. GBIF password. Defaults to system environment variable, "GBIF_PASS". |
NULL.
## Not run:
collection <- get_collection_ids()[[1L]]
registration <- get_registration(get_gbif_datasets(), collection)
update_gbif_dataset_metadata(get_metadata(collection), registration)
## End(Not run)
Write an EML metadata file.
write_eml(archive, collection_id, uuid, metadata, eml = config::get("eml"))
archive |
Character. Path to a Darwin Core archive. |
collection_id |
Character. Collection ID. |
uuid |
Character. GBIF ID. |
metadata |
List. |
eml |
List. |
The status value returned by the zip command, invisibly.
## Not run:
registration <- get_registration(get_gbif_datasets(), "HR.3991")
uuid <- get_uuid(registration)
write_eml("dwca.zip", "HR.447", uuid, list())
## End(Not run)
Write a Darwin Core archive metadata file.
write_meta( archive, filters, fields = config::get("fields"), facts = config::get("facts"), combine = config::get("combine"), id = 1 )
archive |
Character. Path to the archive. |
filters |
List. |
fields |
Character vector. The field names of the data files. Field names can optionally be prepended with a namespace (one of "dwc", "dwciri", "dc" or "dcterms") separated from the field by a ":". If no namespace is specified, "dwc" will be assumed. |
facts |
List of extra variables to be extracted from record, event and document "facts". |
combine |
Named list of variables to combine. |
id |
Integer. Indicates which field can be considered the record
identifier. No ID field will be specified if |
The status value returned by the zip command, invisibly.
## Not run: write_meta( "dwca.zip", list(collection = "HR.447"), c("occurrenceID", "basisOfRecord") ) ## End(Not run)
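The `fields` argument accepts names with an optional namespace prefix, with "dwc" assumed when none is given. The following base R sketch illustrates one way such names could be split into namespace and term. It is an illustration under that stated rule, not the package's actual implementation, and the variable names `has_ns`, `namespace` and `term` are hypothetical.

```r
# Example field names: one bare name and two with explicit namespaces.
fields <- c("occurrenceID", "dcterms:license", "dwciri:recordedBy")

# A field has an explicit namespace when it contains a ":".
has_ns <- grepl(":", fields, fixed = TRUE)

# Take the text before ":" as the namespace, defaulting to "dwc".
namespace <- ifelse(has_ns, sub(":.*$", "", fields), "dwc")

# Strip the namespace to get the bare term, using the same pattern
# that appears in the default for `select` in archive_occurrences().
term <- sub("^.*:", "", fields)

print(data.frame(namespace, term))
```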
Write occurrence records to a Darwin Core archive.
write_occurrences( data, archive, file_name = "occurrence.txt", media_file_name = "media.txt" )
data |
A data.frame. Occurrence records. |
archive |
Character. Path to the archive. |
file_name |
Character. The name of the file to write to the archive. |
media_file_name |
Character. The name of the media extension file to write to the archive. |
The status value returned by the zip command, invisibly.
## Not run:
data <- get_occurrences( c(collection = "HR.3991"), c("occurrenceID", "basisOfRecord"), n = 100 )
write_occurrences(data, "dwca.zip")
## End(Not run)