R packages used in Botswana Health Care Data Science Research

Considering the many R packages released each year, this article will look at the packages used in Health Care Data Science Research in Botswana from 2018 to 2022.

Photo by Justice Hubane on Unsplash

This post focuses on what each package is used for, rather than explaining functions and code in depth.

My home country, Botswana, is landlocked in Southern Africa. It is home to dangerous wildlife such as lions and elephants, to swamp life, and to the Basarwa people (affectionately known as “Bushmen”). Little is known about her use of R programming for data science research, particularly in health care.

Health care research in Botswana aims to identify, evaluate and improve general health conditions. The data collected for descriptive analysis from different regions of Botswana help researchers understand the current state of ongoing treatment distribution and health institution management, in order to improve health care services.

The following R packages were used in most health care data science research:
1. forecast
2. oce
3. ggplot2
4. SNPRelate
5. inctools
6. ape
7. adephylo
8. igraph

1. forecast

forecast package developed by Rob Hyndman

The forecast package was developed for automatic time series forecasting. It is part of a forecasting bundle which contains the fma, Mcomp and expsmooth packages developed by Rob Hyndman.

The forecast package contains functions for:

  • Univariate forecasting
  • Automatic forecasting using exponential smoothing
  • ARIMA models
  • Theta method
  • Cubic splines
  • Other common forecasting methods

Four time series showing point forecasts. Image by Rob Hyndman and Yeasmin Khandakar.
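
To make the list above a little more concrete, here is a minimal sketch of three of those methods (exponential smoothing, ARIMA and the Theta method). The monthly case counts are simulated purely for illustration and do not come from any Botswana study; the variable names are my own.

```r
library(forecast)

# Simulated monthly case counts for 2018-2022 (hypothetical data, illustration only)
set.seed(2018)
cases <- ts(rpois(60, lambda = 50), start = c(2018, 1), frequency = 12)

# Automatic exponential smoothing: ets() selects the model form itself
fit_ets <- ets(cases)
fc_ets <- forecast(fit_ets, h = 12)

# Automatic ARIMA: auto.arima() searches over candidate model orders
fit_arima <- auto.arima(cases)
fc_arima <- forecast(fit_arima, h = 12)

# Theta method
fc_theta <- thetaf(cases, h = 12)

# Point forecasts for the next 12 months from the exponential smoothing model
print(fc_ets$mean)
```

Each forecast object also carries prediction intervals, and calling plot() on it draws the series together with its forecast.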

2. oce

The oce package is used for reading data captured by oceanographic instruments. Designed for real-world applications, oce supports a broad range of practical work.

Even though there are no oceans or seas in Botswana, the package makes it easy to handle the details of calculations, discipline-specific file formats, and plots.

Generic functions take care of general operations such as subsetting and plotting data, while specialized functions address more specific tasks such as hydrographic analysis and ADCP coordinate transformations. According to Dan E. Kelley, it’s easy to document work done with oce because its functions automatically update processing logs stored within its data objects.

3. ggplot2

The best known of the packages in the list is ggplot2. It is used for making plots and annotations for data visualisation. The types of plots built using ggplot2 range from dendrograms and network graphs to histograms. Changing fonts, sizes and images in ggplot2 can improve the quality of a graphic and make the data more attractive to read.

Exon definition and coding variant annotation developed using ggplot2. Image by Rethabile et al (2018)
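
As a small illustration of the kind of plot described above, here is a sketch of a histogram with adjusted colours, labels and font size. The patient ages are simulated and the column name is hypothetical; none of it is taken from the cited research.

```r
library(ggplot2)

# Simulated patient ages (hypothetical data, illustration only)
set.seed(42)
patients <- data.frame(age = round(rnorm(200, mean = 35, sd = 12)))

# Histogram with custom bin width, colours, labels and a larger base font
p <- ggplot(patients, aes(x = age)) +
  geom_histogram(binwidth = 5, fill = "steelblue", colour = "white") +
  labs(
    title = "Age distribution of simulated patients",
    x = "Age (years)",
    y = "Number of patients"
  ) +
  theme_minimal(base_size = 14)

print(p)
```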

4. SNPRelate

SNPRelate is used in genomic exploration for Principal Component Analysis (PCA) and relatedness analysis using identity-by-descent measures.

It was developed for multi-core symmetric multiprocessing computer architectures. The SNPRelate package provides computation on Single-Nucleotide Polymorphism (SNP) data in genome-wide association studies.

Unfortunately, like some other packages, SNPRelate is no longer on CRAN. Its documentation can be found in archives via recommended links, and the package is now distributed through Bioconductor.

Principal Component Analysis plot using 1000 Genomes and Southern African populations. Image by Rethabile et al (2018)

5. inctools

inctools graph showing HIV incidence from biomarker data. Image by Grebe et al (2018)

The inctools package is used for estimating incidence from biomarker data in cross-sectional surveys and for calibrating tests for recent infection.

Originally developed to measure HIV incidence in a given population, it offers state-of-the-art functionality to support broad aspects of population-level incidence surveillance. The motivation for the package came from the challenges associated with estimating population-level HIV incidence.

6. ape

The ape package, whose name stands for Analyses of Phylogenetics and Evolution, is used in molecular evolution and phylogenetics. It takes phylogenetic and genealogical trees as input for statistical analyses.

The package has functions for working with phylogenetic trees, as well as for phylogenetic and evolutionary analyses such as population genetic and comparative methods.

ape takes advantage of R's numerous functions for statistics and graphics, and also provides a flexible framework for developing and implementing further statistical methods for the analysis of evolutionary processes.
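
To show what using a tree as input looks like, here is a minimal sketch that reads a tiny tree from a Newick string, plots it, and computes pairwise distances between tips. The four-tip tree and its labels are made up for illustration.

```r
library(ape)

# A made-up four-tip tree in Newick format (illustration only)
tree <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")

# Draw the tree
plot(tree)

# Pairwise distances between tips, measured along the branches
d <- cophenetic(tree)
print(d)
```

In this toy tree, A and B are sister tips joined by two branches of length 1, so their distance is 2, while A and C sit on opposite sides of the root at a distance of 4.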

7. adephylo

The adephylo package is designed for the analysis of comparative evolutionary data. Phylogenetic comparative methods aim to account for, or remove, the effects of phylogenetic signal in the analysis of biological traits.

8. igraph

The igraph package provides tools for building and plotting network graphs. It can handle huge graphs with millions of vertices and edges, and it’s also suitable for grid computing. It contains routines for:

  • Creating, manipulating and visualising networks.
  • Calculating various structural properties.
  • Importing from and exporting to various file formats.

Through high-level languages such as GNU R (GNU’s Not Unix!) and Python, it supports fast development and fast prototyping.
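
The three kinds of routines listed above can be sketched as follows. The edge list is made up for illustration (the letters could stand for communities), and the output file is a temporary file rather than a real dataset.

```r
library(igraph)

# A made-up undirected network (illustration only)
edges <- data.frame(
  from = c("A", "A", "B", "C"),
  to   = c("B", "C", "C", "D")
)

# Create the network from the edge list
g <- graph_from_data_frame(edges, directed = FALSE)

# Structural properties: size of the graph and degree of each vertex
n_vertices <- vcount(g)
n_edges <- ecount(g)
deg <- degree(g)
print(deg)

# Visualise the network
plot(g)

# Export to a file format (GraphML here), written to a temporary file
out_file <- tempfile(fileext = ".graphml")
write_graph(g, file = out_file, format = "graphml")
```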

The extent of HIV lineage spread between and across communities in Botswana. Image by Novitsky et al (2020).

If you like, you can find me on LinkedIn and Mastodon.
