---
title: "Filtering occurrence records"
author: "William K. Morris"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{5. Filtering occurrence records}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

When getting records from FinBIF there are many options for filtering the data before it is downloaded, saving bandwidth and local post-processing time. For the full list of filtering options see `?filters`.

## Location

Records can be filtered by the name of a location.

```r
finbif_occurrence(filter = c(country = "Finland"))
#> Records downloaded: 10
#> Records available: 44691386
#> A data.frame [10 x 12]
#>                                  record_id      scientific_name abundance lat_wgs84 lon_wgs84
#>  1                          …JX.1594385#3 Sciurus vulgaris Li…         1  60.23584  25.05693
#>  2 …KE.176/64895825d5de884fa20e297d#Unit1 Heracleum persicum …        NA  61.08302  22.38983
#>  3                          …JX.1594382#9 Hirundo rustica Lin…        NA  64.12716  23.99111
#>  4                         …JX.1594382#37 Pica pica (Linnaeus…        NA  64.12716  23.99111
#>  5                         …JX.1594382#49 Muscicapa striata (…        NA  64.12716  23.99111
#>  6                         …JX.1594382#39 Larus canus Linnaeu…        NA  64.12716  23.99111
#>  7                          …JX.1594382#5 Emberiza citrinella…        NA  64.12716  23.99111
#>  8                         …JX.1594382#31 Ficedula hypoleuca …        NA  64.12716  23.99111
#>  9                         …JX.1594382#41 Alauda arvensis Lin…        NA  64.12716  23.99111
#> 10                         …JX.1594382#21 Numenius arquata (L…        NA  64.12716  23.99111
#> ...with 0 more record and 7 more variables:
#> date_time, coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```

Or by a set of coordinates.
```r
finbif_occurrence(
  filter = list(coordinates = list(c(60, 68), c(20, 30), "wgs84"))
)
#> Records downloaded: 10
#> Records available: 37318868
#> A data.frame [10 x 12]
#>                                  record_id      scientific_name abundance lat_wgs84 lon_wgs84
#>  1                          …JX.1594385#3 Sciurus vulgaris Li…         1  60.23584  25.05693
#>  2 …KE.176/64895825d5de884fa20e297d#Unit1 Heracleum persicum …        NA  61.08302  22.38983
#>  3                          …JX.1594382#9 Hirundo rustica Lin…        NA  64.12716  23.99111
#>  4                         …JX.1594382#37 Pica pica (Linnaeus…        NA  64.12716  23.99111
#>  5                         …JX.1594382#49 Muscicapa striata (…        NA  64.12716  23.99111
#>  6                         …JX.1594382#39 Larus canus Linnaeu…        NA  64.12716  23.99111
#>  7                          …JX.1594382#5 Emberiza citrinella…        NA  64.12716  23.99111
#>  8                         …JX.1594382#31 Ficedula hypoleuca …        NA  64.12716  23.99111
#>  9                         …JX.1594382#41 Alauda arvensis Lin…        NA  64.12716  23.99111
#> 10                         …JX.1594382#21 Numenius arquata (L…        NA  64.12716  23.99111
#> ...with 0 more record and 7 more variables:
#> date_time, coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```

See `?filters` section "Location" for more details.

## Time

The event or import date of records can be used to filter occurrence data from FinBIF. The date filters can be a single year, month or date,

```r
finbif_occurrence(filter = list(date_range_ym = c("2020-12")))
```
```r
#> Records downloaded: 10
#> Records available: 23847
#> A data.frame [10 x 12]
#>    record_id      scientific_name abundance lat_wgs84 lon_wgs84 date_time
#>  1      …107 Pica pica (Linnaeus…        31   65.0027  25.49381 2020-12-31 10:20:00
#>  2       …45 Larus argentatus Po…         1   65.0027  25.49381 2020-12-31 10:20:00
#>  3      …153 Emberiza citrinella…         2   65.0027  25.49381 2020-12-31 10:20:00
#>  4       …49 Columba livia domes…        33   65.0027  25.49381 2020-12-31 10:20:00
#>  5      …117 Corvus corax Linnae…         1   65.0027  25.49381 2020-12-31 10:20:00
#>  6      …111 Corvus monedula Lin…         7   65.0027  25.49381 2020-12-31 10:20:00
#>  7      …161 Sciurus vulgaris Li…         1   65.0027  25.49381 2020-12-31 10:20:00
#>  8      …123 Passer montanus (Li…        28   65.0027  25.49381 2020-12-31 10:20:00
#>  9      …149 Pyrrhula pyrrhula (…         1   65.0027  25.49381 2020-12-31 10:20:00
#> 10       …77 Turdus pilaris Linn…         1   65.0027  25.49381 2020-12-31 10:20:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```

or, for record events, a range given as a character vector.

```r
finbif_occurrence(
  filter = list(date_range_ymd = c("2019-06-01", "2019-12-31"))
)
```
```r
#> Records downloaded: 10
#> Records available: 911735
#> A data.frame [10 x 12]
#>                      record_id      scientific_name abundance lat_wgs84 lon_wgs84
#>  1 …KE.921/LGE.627772/1470480 Pteromys volans (Li…        NA  61.81362  25.75756
#>  2            …JX.1054648#107 Pica pica (Linnaeus…         3  65.30543  25.70355
#>  3             …JX.1054648#85 Poecile montanus (C…         1  65.30543  25.70355
#>  4            …JX.1054648#103 Garrulus glandarius…         3  65.30543  25.70355
#>  5            …JX.1054648#123 Passer montanus (Li…         3  65.30543  25.70355
#>  6            …JX.1054648#149 Pyrrhula pyrrhula (…         1  65.30543  25.70355
#>  7             …JX.1054648#93 Cyanistes caeruleus…         9  65.30543  25.70355
#>  8             …JX.1054648#95 Parus major Linnaeu…        35  65.30543  25.70355
#>  9            …JX.1054648#137 Carduelis flammea (…         2  65.30543  25.70355
#> 10            …JX.1056695#107 Pica pica (Linnaeus…         6   62.7154   23.0893
#>    date_time
#>  1 2019-12-31 12:00:00
#>  2 2019-12-31 10:20:00
#>  3 2019-12-31 10:20:00
#>  4 2019-12-31 10:20:00
#>  5 2019-12-31 10:20:00
#>  6 2019-12-31 10:20:00
#>  7 2019-12-31 10:20:00
#>  8 2019-12-31 10:20:00
#>  9 2019-12-31 10:20:00
#> 10 2019-12-31 10:15:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```

Records for a specific season or time-span across all years can also be requested.

```r
finbif_occurrence(
  filter = list(
    date_range_md = c(begin = "12-21", end = "12-31"),
    date_range_md = c(begin = "01-01", end = "02-20")
  )
)
```
```r
#> Records downloaded: 10
#> Records available: 1486845
#> A data.frame [10 x 12]
#>      record_id      scientific_name abundance lat_wgs84 lon_wgs84 date_time
#>  1 …433443#318 Accipiter nisus (Li…         1   64.8162  25.32106 2023-02-20 15:00:00
#>  2 …531663#107 Pica pica (Linnaeus…        10   62.9199  27.71032 2023-02-20 07:40:00
#>  3 …530610#107 Pica pica (Linnaeus…        21  65.78623  24.49119 2023-02-20 09:15:00
#>  4 …530449#107 Pica pica (Linnaeus…         4  65.74652  24.62216 2023-02-20 08:20:00
#>  5 …531663#153 Emberiza citrinella…        12   62.9199  27.71032 2023-02-20 07:40:00
#>  6  …531663#49 Columba livia domes…        10   62.9199  27.71032 2023-02-20 07:40:00
#>  7  …530610#49 Columba livia domes…         2  65.78623  24.49119 2023-02-20 09:15:00
#>  8 …530610#117 Corvus corax Linnae…         1  65.78623  24.49119 2023-02-20 09:15:00
#>  9  …531663#61 Dendrocopos major (…         6   62.9199  27.71032 2023-02-20 07:40:00
#> 10 …531663#111 Corvus monedula Lin…         7   62.9199  27.71032 2023-02-20 07:40:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```
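The seasonal request above spans the turn of the year, so it is written as two month-day ranges. As a minimal sketch of that splitting step, the helper below builds the same filter list from a single begin and end date. `season_filter()` is a hypothetical name, not part of finbif, and it assumes zero-padded `"MM-DD"` strings and that repeated `date_range_md` entries are handled as in the example above.

```r
# Hypothetical helper (not part of finbif): turn a month-day season into a
# filter list, splitting it in two when it wraps past 31 December.
# Assumes zero-padded "MM-DD" strings, so plain string comparison orders them.
season_filter <- function(begin, end) {
  if (begin <= end) {
    # The season lies within one calendar year: a single range is enough
    return(list(date_range_md = c(begin = begin, end = end)))
  }
  # The season wraps around the new year: one range up to 31 December and
  # one range from 1 January
  list(
    date_range_md = c(begin = begin, end = "12-31"),
    date_range_md = c(begin = "01-01", end = end)
  )
}

winter <- season_filter("12-21", "02-20")
summer <- season_filter("06-01", "08-31")
str(winter)
```

The resulting `winter` list has the same structure as the filter written out by hand above, so it could be passed on as `finbif_occurrence(filter = winter)`.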

## Data Quality

You can filter occurrence records by indicators of data quality. See `?filters` section "Quality" for details.

```r
strict <- c(
  collection_quality = "professional",
  coordinates_uncertainty_max = 1,
  record_quality = "expert_verified"
)
permissive <- list(
  wild_status = c("wild", "non_wild", "wild_unknown"),
  record_quality = c(
    "expert_verified", "community_verified", "unassessed", "uncertain",
    "erroneous"
  ),
  abundance_min = 0
)
c(
  strict = finbif_occurrence(filter = strict, count_only = TRUE),
  permissive = finbif_occurrence(filter = permissive, count_only = TRUE)
)
#>     strict permissive
#>      52654   51733557
```

## Collection

The FinBIF database consists of a number of constituent collections. You can filter by collection with either the `collection` or `not_collection` filters. Use `finbif_collections()` to see metadata on the FinBIF collections.

```r
finbif_occurrence(
  filter = c(collection = "iNaturalist Suomi Finland"), count_only = TRUE
)
#> [1] 691076
finbif_occurrence(
  filter = c(collection = "Notebook, general observations"), count_only = TRUE
)
#> [1] 2110409
```

## Informal taxonomic groups

You can filter occurrence records based on informal taxonomic groups such as `Birds` or `Mammals`.

```r
finbif_occurrence(filter = list(informal_groups = c("Birds", "Mammals")))
```
```r
#> Records downloaded: 10
#> Records available: 22116048
#> A data.frame [10 x 12]
#>    record_id      scientific_name abundance lat_wgs84 lon_wgs84 date_time
#>  1      …5#3 Sciurus vulgaris Li…         1  60.23584  25.05693 2023-06-14 08:56:00
#>  2      …2#9 Hirundo rustica Lin…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  3     …2#37 Pica pica (Linnaeus…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  4     …2#49 Muscicapa striata (…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  5     …2#39 Larus canus Linnaeu…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  6      …2#5 Emberiza citrinella…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  7     …2#31 Ficedula hypoleuca …        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  8     …2#41 Alauda arvensis Lin…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  9     …2#21 Numenius arquata (L…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 10     …2#29 Dendrocopos major (…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```

See `finbif_informal_groups()` for the full list of groups you can filter by. You can use the same function to see the subgroups that make up a higher level informal group:

```r
finbif_informal_groups("macrofungi")
#> Error in finbif_informal_groups("macrofungi"): Group not found
```

## Regulatory

Many records in the FinBIF database are of taxa that have one or more regulatory statuses. See `finbif_metadata("regulatory_status")` for a list of regulatory statuses and short-codes.

```r
# Search for birds on the EU invasive species list
finbif_occurrence(
  filter = list(informal_groups = "Birds", regulatory_status = "EU_INVSV")
)
```
```r
#> Records downloaded: 10
#> Records available: 471
#> A data.frame [10 x 12]
#>                                  record_id      scientific_name abundance lat_wgs84 lon_wgs84
#>  1                          …JX.1580858#3 Oxyura jamaicensis …         1  60.28687   25.0271
#>  2                          …JX.1580860#3 Oxyura jamaicensis …         1  60.28671  25.02713
#>  3 …KE.176/62b1ad90d5deb0fafdc6212b#Unit1 Oxyura jamaicensis …         7  61.66207  23.57706
#>  4                         …JX.1045316#34 Alopochen aegyptiac…         3  52.16081  4.485534
#>  5                         …JX.138840#123 Alopochen aegyptiac…         4  53.36759  6.191796
#>  6                         …JX.139978#214 Alopochen aegyptiac…         6  53.37574  6.207861
#>  7                          …JX.139710#17 Alopochen aegyptiac…        30   52.3399  5.069133
#>  8                          …JX.139645#57 Alopochen aegyptiac…        36  51.74641  4.535283
#>  9                          …JX.139645#10 Alopochen aegyptiac…         3  51.74641  4.535283
#> 10                          …JX.139442#16 Alopochen aegyptiac…         2  51.90871   4.53258
#> ...with 0 more record and 7 more variables:
#> date_time, coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```

## IUCN red list

Records can be filtered by [IUCN red list](https://punainenkirja.laji.fi/) category. See `finbif_metadata("red_list")` for the IUCN red list categories and their short-codes.

```r
# Search for near threatened mammals
finbif_occurrence(
  filter = list(informal_groups = "Mammals", red_list_status = "NT")
)
```
```r
#> Records downloaded: 10
#> Records available: 42510
#> A data.frame [10 x 12]
#>                                  record_id      scientific_name abundance lat_wgs84 lon_wgs84
#>  1                         …JX.1594024#23 Rangifer tarandus f…        15  63.31266  24.43298
#>  2                       …JX.1588853#1075 Rangifer tarandus f…         1  63.84551   29.8366
#>  3                          …JX.1593780#3 Pusa hispida botnic…         1  65.02313  25.40505
#>  4                   …HR.3211/166639315-U Rangifer tarandus f…        NA      63.7      24.7
#>  5                   …HR.3211/166049302-U Rangifer tarandus f…        NA      64.1      26.5
#>  6                   …HR.3211/165761924-U Rangifer tarandus f…        NA      63.9      24.9
#>  7                        …JX.1589779#105 Rangifer tarandus f…         3   63.7261  23.40827
#>  8 …KE.176/647ad84dd5de884fa20e25e6#Unit1 Rangifer tarandus f…         1  64.12869  24.73877
#>  9                   …HR.3211/165005253-U Pusa hispida botnic…        NA   64.2865  23.87402
#> 10                         …JX.1588052#18 Rangifer tarandus f…         2  64.13286  26.26767
#> ...with 0 more record and 7 more variables:
#> date_time, coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```

## Habitat type

Many taxa are associated with one or more primary or secondary habitat types (e.g., forest) or subtypes (e.g., herb-rich alpine birch forests). Use `finbif_metadata("habitat_type")` to see the habitat types in FinBIF. You can filter occurrence records based on primary (or primary/secondary) habitat type or subtype codes. Note that habitat filters apply to taxa, not to locations: filtering records with `primary_habitat = "M"` returns only records of taxa considered to primarily inhabit forests, yet the locations of those records may encompass habitats other than forests.

```r
head(finbif_metadata("habitat_type"))
#>                code name
#> MKV.habitatMt  Mt   alpine birch forests (excluding herb-rich alpine …
#> MKV.habitatTlk Tlk  alpine calcareous rock outcrops and boulder fields
#> MKV.habitatTlr Tlr  alpine gorges and canyons
#> MKV.habitatT   T    Alpine habitats
#> MKV.habitatTp  Tp   alpine heath scrubs
#> MKV.habitatTk  Tk   alpine heaths
```

```r
# Search records of taxa for which forests are their primary or secondary
# habitat type
finbif_occurrence(filter = c(primary_secondary_habitat = "M"))
```
```r
#> Records downloaded: 10
#> Records available: 26362337
#> A data.frame [10 x 12]
#>    record_id      scientific_name abundance lat_wgs84 lon_wgs84 date_time
#>  1      …5#3 Sciurus vulgaris Li…         1  60.23584  25.05693 2023-06-14 08:56:00
#>  2     …2#37 Pica pica (Linnaeus…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  3     …2#49 Muscicapa striata (…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  4      …2#5 Emberiza citrinella…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  5     …2#31 Ficedula hypoleuca …        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  6     …2#29 Dendrocopos major (…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  7     …2#15 Sylvia borin (Bodda…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  8     …2#11 Anthus trivialis (L…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  9     …2#45 Corvus monedula Lin…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 10      …2#3 Phylloscopus trochi…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```

You can further refine habitat-based searches using a specific habitat type qualifier such as "sun-exposed" or "shady". Use `finbif_metadata("habitat_qualifier")` to see the qualifiers available. To specify qualifiers, use a named list of character vectors, where the names are habitat types or subtypes and the elements of the character vectors are the qualifier codes.

```r
finbif_metadata("habitat_qualifier")[4:6, ]
#>                           code name
#> MKV.habitatSpecificTypeCA CA   calcareous effect
#> MKV.habitatSpecificTypeH  H    esker forests, also semi-open forests
#> MKV.habitatSpecificTypeKE KE   intermediate-basic rock outcrops and boulder fiel…
```

```r
# Search records of taxa for which forests with sun-exposure and broadleaved
# deciduous trees are their primary habitat type
finbif_occurrence(filter = list(primary_habitat = list(M = c("PAK", "J"))))
```
```r
#> Records downloaded: 10
#> Records available: 178
#> A data.frame [10 x 12]
#>      record_id      scientific_name abundance lat_wgs84 lon_wgs84 date_time
#>  1 …502812#393 Pammene fasciana (L…        NA  60.45845  22.17811 2022-08-14 12:00:00
#>  2   …435062#6 Pammene fasciana (L…         1  60.20642  24.66127 2022-08-04
#>  3   …435050#9 Pammene fasciana (L…         1  60.20642  24.66127 2022-07-25
#>  4  …501598#39 Pammene fasciana (L…         1  60.08841  22.48629 2022-07-21 12:00:00
#>  5 …501387#162 Pammene fasciana (L…         1  60.08841  22.48629 2022-07-20 12:00:00
#>  6 …448030#159 Pammene fasciana (L…         1  60.08841  22.48629 2022-07-18 12:00:00
#>  7  …447556#78 Pammene fasciana (L…         1  60.08841  22.48629 2022-07-14 12:00:00
#>  8 …446841#408 Pammene fasciana (L…         1  60.08841  22.48629 2022-07-12 12:00:00
#>  9  …443339#36 Pammene fasciana (L…         1  60.08841  22.48629 2022-07-10 12:00:00
#> 10 …440849#159 Pammene fasciana (L…         2  60.08841  22.48629 2022-07-08 12:00:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```
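The qualifier filter is a nested, named list, which can be awkward to type by hand when several habitat types are involved. The sketch below shows one possible way to assemble it with base R from a small table of habitat and qualifier codes. The `pairs` data frame and the variable names are hypothetical; the codes `M`, `PAK` and `J` are simply those used in the example above.

```r
# Hypothetical table pairing habitat (sub)type codes with qualifier codes
pairs <- data.frame(
  habitat = c("M", "M"),
  qualifier = c("PAK", "J"),
  stringsAsFactors = FALSE
)

# split() groups the qualifier codes by habitat code, giving a named list of
# character vectors: names are habitat codes, elements are qualifier codes
qualifiers <- split(pairs$qualifier, pairs$habitat)

# Wrap the named list under the filter name used in the example above
habitat_filter <- list(primary_habitat = qualifiers)
str(habitat_filter)
```

Here `habitat_filter` has the same shape as `list(primary_habitat = list(M = c("PAK", "J")))`, so it could be passed as the `filter` argument in the same way.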

## Status of taxa in Finland

You can restrict the occurrence records by the status of the taxa in Finland. For example, you can request records for only rare species.

```r
finbif_occurrence(filter = c(finnish_occurrence_status = "rare"))
```
```r
#> Records downloaded: 10
#> Records available: 406005
#> A data.frame [10 x 12]
#>                                  record_id      scientific_name abundance lat_wgs84 lon_wgs84
#>  1                   …HR.3211/167313706-U Pygaera timon (Hübn…        NA   62.1281  27.45272
#>  2                         …JX.1594282#21 Carterocephalus pal…         1  64.65322  24.58941
#>  3                   …HR.3211/167197097-U Carterocephalus pal…        NA  65.07819  25.55236
#>  4                   …HR.3211/167183358-U Glaucopsyche alexis…        NA  60.46226  22.76647
#>  5                          …JX.1594291#3 Glaucopsyche alexis…         1  60.42692  22.20411
#>  6 …KE.176/6488c111d5de884fa20e295f#Unit1 Panemeria tenebrata…         1  61.16924  25.56036
#>  7                          …JX.1593930#3 Hemaris tityus (Lin…         1  60.63969  27.29052
#>  8 …KE.176/64889455d5de884fa20e294f#Unit1 Pseudopanthera macu…         2    62.054    30.352
#>  9                        …JX.1594170#199 Glaucopsyche alexis…         1  61.10098  28.68453
#> 10                          …JX.1594112#3 Hemaris tityus (Lin…         1  61.25511  28.89127
#> ...with 0 more record and 7 more variables:
#> date_time, coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```

Or, by using the negation of occurrence status, you can request records of birds excluding those considered vagrants.

```r
finbif_occurrence(
  filter = list(
    informal_groups = "birds",
    finnish_occurrence_status_neg = sprintf("vagrant_%sregular", c("", "ir"))
  )
)
```
```r
#> Records downloaded: 10
#> Records available: 21725426
#> A data.frame [10 x 12]
#>    record_id      scientific_name abundance lat_wgs84 lon_wgs84 date_time
#>  1        …9 Hirundo rustica Lin…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  2       …37 Pica pica (Linnaeus…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  3       …49 Muscicapa striata (…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  4       …39 Larus canus Linnaeu…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  5        …5 Emberiza citrinella…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  6       …31 Ficedula hypoleuca …        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  7       …41 Alauda arvensis Lin…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  8       …21 Numenius arquata (L…        NA  64.12716  23.99111 2023-06-14 08:48:00
#>  9       …29 Dendrocopos major (…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 10       …15 Sylvia borin (Bodda…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
```
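The `sprintf()` call in the negated filter above is only a compact way of writing two status codes. As a small illustration using base R alone (the variable name `vagrant_codes` is made up for this sketch), the call expands as follows:

```r
# sprintf() is vectorised over its arguments, so the two-element vector
# c("", "ir") yields two strings from the one format
vagrant_codes <- sprintf("vagrant_%sregular", c("", "ir"))

# The result is the same as writing both status codes out in full
stopifnot(
  identical(vagrant_codes, c("vagrant_regular", "vagrant_irregular"))
)
```

Writing the two codes out literally in the filter would be equivalent; the `sprintf()` form just avoids repeating the shared prefix and suffix.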

See `finbif_metadata("finnish_occurrence_status")` for a full list of statuses and their descriptions.