More countries continue to adopt DHIS2 as a national health information system, but the software’s ability to capture many different types of data has often outpaced its native tools for analyzing and visualizing those data. This is especially true for the individual-level health records found in DHIS2 Tracker.
The following tutorial demonstrates how to extract longitudinal, individual-level data from DHIS2 and display it through interactive HTML widgets, providing an overview of trends that can be “drilled down” for further analysis.
1.0.1 Why is R a good fit for DHIS2?
In the past five years, R has become one of the top languages for health analysts and researchers, particularly those with less formal training in programming. Various R packages have been developed to pull online data into an R environment through APIs, interface R with JavaScript data visualization libraries, and roll the results into modern, interactive HTML reports.
So it’s easier than ever to link R with DHIS2 for robust analysis and visualization, yet the available documentation does not address HTML widgets or tracker data.
1.0.2 Who This Tutorial is For
You have some basic grasp of R, Python, STATA, or another scripting language
You want routine high-level reports from DHIS2, particularly from production instances where you lack admin access (e.g. the patient database is hosted by another country), so the reports should be fully reproducible by a colleague
You’re interested in combining event data in tracker programs for basic models requiring longitudinal individual data
You want some interactivity between maps, tables, and charts to improve data exploration
1.0.3 Who This Tutorial is NOT For
You code a lot in Javascript already. You can develop a custom app for your DHIS2 use case, and share it with the community!
You want to dump all data from DHIS2 into another system for advanced analysis. Probably easier to pull from the database tables directly.
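If that full-dump route fits your case, a connection sketch might look like the following. This is purely illustrative: the host, database name, and credentials are placeholders, though DHIS2 does run on PostgreSQL and stores raw aggregate values in a `datavalue` table.

```r
#hypothetical sketch: reading DHIS2's PostgreSQL tables directly with DBI,
#bypassing the API entirely (all connection details are placeholders)
library(DBI)

con <- dbConnect(RPostgres::Postgres(),
                 dbname = "dhis2",     #assumed database name
                 host = "localhost",   #assumed host
                 user = "dhis",        #assumed user
                 password = askpass::askpass())

#the datavalue table holds raw aggregate data values
datavalues <- dbGetQuery(con, "SELECT * FROM datavalue LIMIT 10")
dbDisconnect(con)
```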
You want to parameterize your reports, so data can be real-time and tailored to a distinct user/org unit. You can probably use DHIS2’s native Push Analytics and HTML reports, or you can step up your R skills with a Shiny dashboard.
2 Getting started
First things first: Welcome! This is an RMarkdown document. If you have R and RStudio on your local machine, you can download the code by clicking the button at top right and try it yourself.
We’ll work through how to extract data from the DHIS2 API. To do that, we will set up our R session with the required packages. (This, it turns out, is not so intuitive, although a handy method for reproducible session set-up is found here.) Downloading packages fresh may take some time, especially if you have a slow internet connection…
We also need to set the baseurl and username used to log in to your DHIS2 instance with the httr and keyring packages.
Note: It is best practice not to store the password within the script, as that creates a security vulnerability. The user who “knits” the script into HTML will be prompted for the DHIS2 Play demo password, which is then stored in the local system keyring on the user’s machine.
knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE, paged.print=FALSE)
options(encoding = "UTF-8")
#IMPORTANT MANUAL STEPS BEFORE RUNNING
#credentials and global variables#
#(include https:// and the trailing slash of the DHIS2 URL, in quotes)
baseurl<-"https://play.dhis2.org/2.31.5/"
username<-"admin"
##______Now load packages and set up_____##
##first load required packages
required_packages <- c("ggplot2","stringr","httr","jsonlite","assertthat","readr","tibble","plyr","dplyr","tidyr","dygraphs","xts","lubridate","knitr","askpass","DT","keyring","devtools","crosstalk","plotly","leaflet","parcoords","diffr","listviewer","highcharter","sf")
#hint: use sessionInfo() to find loaded packages after finishing development, then copy them above
#define the function to check that a package is installed
is_installed <- function(mypkg) is.element(mypkg, installed.packages()[,1])
#check if each required package is installed; if it isn't, install it from CRAN, then load it
load_or_install <- function(required_packages) {
  for (package_name in required_packages) {
    if (!is_installed(package_name)) {
      install.packages(package_name, repos = "http://lib.stat.cmu.edu/R/CRAN")
    }
    library(package_name, character.only = TRUE, quietly = TRUE, verbose = FALSE)
  }
}
#run the function
load_or_install(required_packages)
#you'll now be prompted for the password (district)
require(keyring)
keyring::key_set("Password", username=username)
## Please enter password in TK window (Alt+Tab)
This function will log in to the DHIS2 instance. Once we log in, we do not need to log in again for subsequent queries! (Credit for this function to Jason Pickering’s very useful R-for-DHIS2 tutorials.)
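For reference, a minimal login helper in the spirit of that function might look like this. This is a sketch, not Pickering’s exact code, and it assumes the baseurl, username, and keyring entry set up earlier in this document.

```r
#sketch of a DHIS2 login helper: httr keeps the session cookie,
#so later GET calls to the same instance stay authenticated
loginDHIS2 <- function(baseurl, username) {
  url <- paste0(baseurl, "api/me")
  r <- httr::GET(url,
                 httr::authenticate(username,
                                    keyring::key_get("Password", username = username)))
  httr::status_code(r) == 200L  #TRUE means the login succeeded
}
stopifnot(loginDHIS2(baseurl, username))
```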
This is how data from DHIS2 are often shared. Often, it’s the easiest solution for the context…
Source: LSTHM, CStock
But by leveraging the DHIS2 analytics API in RMarkdown, we can download and analyze pivot tables, event reports, metadata, and even raw data, then render the results into PDF, Word, and HTML documents to share. There is no reason to send Excel files or screenshots back and forth with your partners.
The easiest way to construct an analytics API query is with the pivot table or event reports apps. There, “Download” → “Download as HTML” will give you the link (the video below shows how to download a pivot table as XLS; choose “Download as HTML” instead to get the analytics link). You can change the extension to .json, .html+css, or .csv. See the DHIS2 developer guide for more details.
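For example, swapping the .csv extension for .json returns the same report as a JSON object, which jsonlite can parse. The sketch below uses a trimmed-down version of the event query used later in this tutorial; the headers/rows split is how the analytics endpoint structures its JSON responses.

```r
#sketch: an analytics event query with a .json extension,
#parsed with jsonlite instead of read_csv
url_json <- paste0(baseurl,
                   "api/29/analytics/events/query/WSGAb5XwJ3Y.json?",
                   "dimension=pe:LAST_3_MONTHS&stage=edqlbukwRfQ",
                   "&dimension=ou:ImspTQPwCqd&dimension=tYPIZKEVh23")
resp <- jsonlite::fromJSON(httr::content(httr::GET(url_json), "text"))
#JSON responses split the table into headers and rows
df <- as.data.frame(resp$rows, stringsAsFactors = FALSE)
colnames(df) <- resp$headers$column
head(df)
```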
The next query searches for ANC lab events in the last 3 months in the “WHO RMNCH Tracker” program that have hemoglobin values. In a realistic scenario, a low hemoglobin value should rise over time with iron supplementation treatment.
url <- paste0(baseurl,
              "api/29/analytics/events/query/WSGAb5XwJ3Y.csv?", #event query for the "WHO RMNCH Tracker" program
              "dimension=pe:LAST_3_MONTHS", #events in the last 3 months
              "&stage=edqlbukwRfQ&dimension=ou:ImspTQPwCqd", #stage and org unit (root)
              #data dimensions and other display properties
              "&dimension=tYPIZKEVh23&dimension=vANAXwtLwcT&dimension=w75KJ2mc4zz&",
              "displayProperty=NAME&tableLayout=true&columns=pe;ou;tYPIZKEVh23;vANAXwtLwcT;w75KJ2mc4zz&rows=pe;ou;tYPIZKEVh23;vANAXwtLwcT;w75KJ2mc4zz")
#now we use GET to fetch the data, then read the content into a dataframe
output<-httr::GET(url)
report1 <-read_csv(content(output))
#let's take a look at the data
head(report1)
Modern web pages render data more usefully, and usually interactively. We can do that in R simply with the DT package.
4.1 Ugly Example
Here’s the default datatable layout
DT::datatable(report1)
Maybe that was a bit too simple…
4.2 Pretty Example
Now let’s make the presentation of data a bit nicer
#now try a "pretty" data table
#we'll just keep the event date, ou name, and data columns,
#then sort by event date
lengthcols <- function(x) length(colnames(x))
pretty_report1 <- report1 %>%
  select(c(1, 3, 6, 9, (lengthcols(report1) - 2):lengthcols(report1))) %>% #keep only the necessary data columns
  arrange(desc(`Event date`)) %>%
  mutate("Event date" = as.Date(lubridate::ymd_hms(`Event date`, tz = "UTC"))) %>%
  #let's also add the event ID as a hyperlink
  mutate("Event Link" = paste0("<a href=\"", baseurl, "api/events/", Event, "\">", "Event Link", "</a>"))
#the extension will allow us to pretty print, with a hyperlink
#create this as a function so we can come back to it later
makeDT <- function(x){
  DT::datatable(x,
                filter = 'bottom',
                caption = "Event report example",
                extensions = 'Buttons',
                escape = FALSE,
                rownames = FALSE,
                width = "100%",
                class = "compact",
                options = list(
                  pageLength = 5,
                  dom = 'Blfrtip',
                  buttons = list(I("colvis"),
                                 c('copy', 'csv', 'excel', 'pdf'))))
}
makeDT(pretty_report1)
Advantages of DT Format
Sort by column, and start typing in the search bars at the bottom of the columns to filter
Filter by date range
Change the page size
Use a global search across columns
Change what columns are visible
Export the table to other common formats
Click the Event Link to go to an API query for the event
What if we plot the event data over “days enrolled”? Do we detect any trend?
ggplot(report1, aes(x = `WHO RMNCH Days enrolled`, y = `WHOMCH Hemoglobin value`)) +
  geom_point()
Looks like there’s very little pattern between days enrolled and hemoglobin value, at least in the demo data…
5 Merge Events by Enrollments
But many of these events can be grouped by their enrollment. Let’s merge these data with enrollment IDs from the past few months. (The demo server maxes out at 1000 enrollments, so we’ll just get those for now…)
#get a start date four months back, covering the "last 3 months" period
fourmonthsago <- round_date(ymd(Sys.Date()), unit = "month") %m-% months(4)
#We will use the EVENTS resource.
#This allows us to query directly for all events of a certain program and stage,
#but not based on data values.
#For simplicity we'll just get first 1000 records.
url <- paste0(baseurl, "api/29/events.json?fields=enrollment,event,trackedEntityInstance",
              "&program=WSGAb5XwJ3Y&startDate=", fourmonthsago,
              "&stage=edqlbukwRfQ",
              "&pageSize=", 1000)
enrollment_IDs <-fromJSON(content(GET(url),"text"))
#Now we will join the enrollment IDs to the event report
#We will also create a link to the record in the tracker capture app
events_with_enrollments <- report1 %>%
  dplyr::inner_join(enrollment_IDs$events, by = c("Event" = "event")) %>%
  mutate("Event date" = as.Date(lubridate::ymd_hms(`Event date`, tz = "UTC"))) %>%
  #note: baseurl already ends with a slash
  mutate("Record Link" = paste0("<a href=\"", baseurl, "dhis-web-tracker-capture/index.html#/dashboard?",
                                "tei=", trackedEntityInstance,
                                "&program=WSGAb5XwJ3Y&ou=", `Organisation unit`,
                                "\">Record Link</a>")) %>%
  #just select the important variables and arrange
  select(c(1, 3, `Organisation unit name`:`Record Link`)) %>%
  select(-c("trackedEntityInstance", `Organisation unit code`, `Organisation unit`)) %>%
  arrange(desc(`Event date`))
makeDT(events_with_enrollments)
5.1 Detect Patterns 2
Maybe there are temporal trends within enrollments?
#Add a ggplot line chart for hemoglobin over days enrolled
ggplot(events_with_enrollments,
       aes(x = `WHO RMNCH Days enrolled`,
           y = `WHOMCH Hemoglobin value`)) +
  geom_line(aes(group = enrollment))
Negative results here… but more realistic data would show individual trends!
5.2 Longitudinal data: Growth Patterns
In the example below, we download the CDC’s weight-for-age Z-score data for infant girls, and render the data in an interactive growth chart.
Remember, a child register in DHIS2 Tracker could use a similar approach to display the growth of a population cohort over time, where each series represents an individual infant’s weight measurements.
birthdate<-fourmonthsago
cdc_data<-read_csv("https://www.cdc.gov/growthcharts/data/zscore/zwtageinf.csv")
cdc_data <- cdc_data %>%
  #for simplicity, just look at the girls' data
  subset(Sex == 2) %>%
  mutate("weeks_diff" = 4 * as.numeric(Agemos)) %>%
  mutate("date" = birthdate + lubridate::weeks(weeks_diff))
#gather up the data and display as line chart
hc <- cdc_data %>%
  gather(zscore, weight, `-2`:`2`) %>%
  hchart(type = "line", hcaes(x = "Agemos", y = "weight", group = "zscore")) %>%
  hc_title(text = "Zscores")
#we can do the same as a time series chart
xtchart <- as.xts(cdc_data, order.by = as.POSIXct(cdc_data$date)) %>%
  subset(select = `2`:`-2`) %>%
  dygraph(main = "Z-Scores", ylab = "Weight in Kg") %>%
  dyOptions(colors = RColorBrewer::brewer.pal(7, "Dark2")) %>%
  dyHighlight(highlightSeriesBackgroundAlpha = 0.6,
              hideOnMouseOut = FALSE,
              highlightSeriesOpts = list(strokeWidth = 3)) %>%
  dyRangeSelector()
#this method uses highcharter
hc
#this method uses dygraphs
xtchart
6 Interactive Charts in Crosstalk
Now we can add some more interactivity! We will include brushing capability that dynamically links a chart and table. This builds on R’s crosstalk package, which links certain types of HTML widgets together through a common “shared data” object. Importantly, the linking happens directly within the browser, eliminating the need for a Shiny server. Note, however, that only select packages work with crosstalk, and performance slows with large datasets.
#first we create a "shared data" object
sd <- SharedData$new(events_with_enrollments)
#create a plotly scatterplot
p <- plot_ly(sd, x = ~`WHO RMNCH Days enrolled`,
             y = ~`WHOMCH Hemoglobin value`,
             color = ~`Organisation unit name`,
             size = 3,
             showlegend = FALSE) %>%
  add_markers(alpha = 0.5) %>%
  highlight(on = "plotly_selected", off = "plotly_deselect")
#We'll use the custom datatable function from earlier on the shared data
dt<-makeDT(sd)
#crosstalk comes with bootstrap functions to help us arrange the interactive objects
#put the org unit filter and scatterplot on one side, the table on the other
bscols(widths = c(4, NA),
       list(
         filter_select("orgUnit", "Filter by OrgUnit", sd, ~`Organisation unit name`),
         p),
       dt)
Ways to use the interactive filtering:
Click on one or multiple scatter plot dots
Click and drag across an area of dots on scatter plot
Select one or multiple events from the table
Choose a specific org unit
7 Advanced - Linked Map
Now we will link a leaflet map. First, let’s fetch coordinates for the event org units, and display them on a simple map.
library(leaflet)
library(sf)
library(dplyr)
#get OU coordinates and merge with the data
url<-paste0(baseurl,"api/29/organisationUnits.csv?paging=false&fields=name,coordinates&filter=coordinates:!null")
output <-read_csv(content(GET(url)))
#merge the name of the org units with the event data
mapobj <- events_with_enrollments %>%
  dplyr::rename("name" = `Organisation unit name`) %>%
  dplyr::inner_join(output) %>%
  separate(coordinates, c("longitude", "latitude"), sep = ",", extra = "drop")
mapobj$latitude<-parse_number(mapobj$latitude)
mapobj$longitude<-parse_number(mapobj$longitude)
#create map object
leaflet(mapobj) %>% addTiles() %>%
addCircles(lat = ~latitude, lng = ~longitude)
Very cool. Now we can link it with the data table and scatterplot. If you highlight an area on the map or scatterplot, it will highlight the other visual and filter the table! Select any number of table rows to highlight them on the chart and map.
##we need to alter our DT function to hide the lat/long columns
#this function returns the (zero-indexed) positions of the coordinate columns so we can exclude them
hidecols <- function(x, a, b) {
  if (is.SharedData(x) &&
      a %in% colnames(x$origData()) &&
      b %in% colnames(x$origData())) {
    removals <- x$origData()
    c(grep(a, colnames(removals)) - 1,
      grep(b, colnames(removals)) - 1)
  }
}
#remember that DT function earlier?
makeDT <- function(x){
  DT::datatable(x,
                filter = 'bottom',
                caption = "Event report example",
                extensions = 'Buttons',
                escape = FALSE,
                rownames = FALSE,
                width = "100%",
                class = "compact",
                options = list(
                  pageLength = 5,
                  dom = 'Blfrtip',
                  buttons = list(I("colvis"),
                                 c('copy', 'csv', 'excel', 'pdf')),
                  #we just hide these two columns now
                  columnDefs = list(list(visible = FALSE,
                                         targets = hidecols(x, "latitude", "longitude")))))
}
#now we apply the map as a shared data object!
sd1 <- SharedData$new(mapobj)
p <- plot_ly(sd1, x = ~`WHO RMNCH Days enrolled`,
             y = ~`WHOMCH Hemoglobin value`,
             color = ~name,
             size = 3,
             showlegend = FALSE) %>%
  add_markers(alpha = 0.3) %>%
  highlight(on = "plotly_selected", off = "plotly_deselect")
#We'll use the custom datatable function from earlier on the shared data
dt<-makeDT(sd1)
#still trying to figure out how to highlight on the map better...
bscols(widths = c(4, NA),
       list(filter_select("orgUnit", "Filter by OrgUnit", sd1, ~name),
            leaflet(sd1, width = "100%", height = 300) %>%
              addTiles() %>%
              addMarkers(lat = ~latitude, lng = ~longitude)),
       p)
dt
Hmmm, that’s a lot of information, and very tightly packed.
Another great way to display correlations across many data points is a parallel coordinates chart. Using the parcoords package, we can visualize thousands of patient records at once with just a few lines of code.
As an HTML widget, parallel coordinates can be “brushed”: click and drag on one or more axes, and the graphic highlights rows within the selected range. You can also drag the columns into a better order.
In the example below, we log in to an older demo instance (2.29) which has more data for the “Inpatient Morbidity Mortality” program. These data are cross-sectional, so we only pull from one event report. But it would also be possible to join event reports from multiple stages by enrollment, so that each row holds an enrollment’s values from every stage.
url<-paste0(baseurl,"api/29/analytics/events/query/eBAyeGv0exc.csv?dimension=pe:201812;201811&dimension=ou:ImspTQPwCqd&dimension=x7PaHGvgWY2&dimension=vV9UWAZohSf&dimension=GieVkTxp4HH&dimension=qrur9Dvnyt5&dimension=b8hd33dWjR6&dimension=oZg33kd9taw&stage=Zj7UnCAulEk&displayProperty=NAME&tableLayout=true&columns=pe;ou;x7PaHGvgWY2;vV9UWAZohSf;GieVkTxp4HH;qrur9Dvnyt5;b8hd33dWjR6;oZg33kd9taw&rows=pe;ou;x7PaHGvgWY2;vV9UWAZohSf;GieVkTxp4HH;qrur9Dvnyt5;b8hd33dWjR6;oZg33kd9taw")
#now we use GET to fetch the data, then read the content into a dataframe
report2 <-read_csv(content(GET(url)))
colength<-length(report2)
#index the observations by event date, and only display the data columns (the last 6)
r2 <- report2 %>%
  arrange(`Event date`) %>%
  select((colength - 5):colength)
#parallel coordinates with color based on gender
parcoords::parcoords(data = r2,
                     color = list(
                       #discrete or categorical column
                       colorScale = "scaleOrdinal",
                       colorBy = "Gender",
                       colorScheme = "schemeCategory10"),
                     withD3 = TRUE,
                     brushMode = '1D-axes',
                     alphaOnBrushed = 0.2,
                     queue = TRUE,
                     rate = 50,
                     reorderable = TRUE)
It looks like Trainingland has some very fat babies, and some gravity-defying seniors! But with so much data it can be hard to read correlations in this plot. And once we identify these outliers, we will need to investigate their cases…
Let’s filter it down to fewer cases, and link the plot to a data table.
These visuals are better displayed side-by-side in the complementary flexdashboard example.
#rebuild the same data columns as before
r2 <- report2 %>%
  arrange(`Event date`) %>%
  select((colength - 5):colength)
#to improve performance, we'll only keep the first 1000 observations
new_r2<-head(r2, 1000)
sd2 <- crosstalk::SharedData$new(new_r2)
pc <- parcoords(data = sd2,
                brushMode = '1d',
                alphaOnBrushed = 0.2,
                queue = TRUE,
                rate = 50,
                reorderable = TRUE)
bscols(widths = c(12),
       pc,
       makeDT(sd2))
9 R Tools for Metadata Import/Export Support
We can also use R to perform metadata update operations. Some simple tools in R Markdown can help us understand what data are actually being updated.
Here is a basic example: updating your user name!
library(diffr)
url<-paste0(baseurl, "api/me")
me<-fromJSON(content(GET(url),"text"))
#what is this data?
listviewer::jsonedit(me)
#hey, that's not me!
new_me<-me
new_me$name<-"Brian O'Donnell"
#note JSON data is represented as nested lists in R!
write_json(me, "me.json", auto_unbox=TRUE, pretty=TRUE)
write_json(new_me, "new_me.json", auto_unbox=TRUE, pretty=TRUE)
#see the difference
diffr::diffr("me.json", "new_me.json")
url<-paste0(baseurl, "api/users/",me$id,".json")
r<-httr::PUT(url=url, body=upload_file("new_me.json"), content_type_json())
#how did the import work out?
listviewer::jsonedit(content(r))
10 Summary
This demo showed use cases where data can be meaningfully extracted from the DHIS2 API into R, providing examples of how you might automate existing reporting systems, or improve the analysis/visualization of existing Tracker data.
Note there are other ways you can use DHIS2 + RMarkdown to do some things DHIS2 cannot currently do, like:
Analyze user access and audit logs
Visualize relationship networks between tracker instances
Organize favorites by number of views
Check metadata like indicators or data elements for inconsistencies
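As a taste of the first idea, user activity can be pulled from the api/users endpoint. This is a sketch, assuming the standard userCredentials fields and an authenticated session from earlier in this document.

```r
#sketch: list users by their last login date
url <- paste0(baseurl,
              "api/users.json?fields=name,userCredentials[username,lastLogin]",
              "&paging=false")
users <- jsonlite::fromJSON(httr::content(httr::GET(url), "text"))$users
logins <- data.frame(name = users$name,
                     username = users$userCredentials$username,
                     lastLogin = users$userCredentials$lastLogin,
                     stringsAsFactors = FALSE)
#most recently active users first
head(logins[order(logins$lastLogin, decreasing = TRUE), ])
```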
But no one language or workflow can solve all possible data extraction and munging problems. And eventually, the methods described above will all need to be reproduced and updated with something even better…