Msc Thesis Pt1: Describing NHCI Dropouts

Working document

Author
Affiliation

Brian ODonnell

Charité - Universitätsmedizin Berlin

This document shows work for describing NHCI patient population by attendance status.

Part two in a separate document, will work with the resulting dataset to predict patient dropouts using the tidymodels workflow

For a full description of the final dataset, see section Section 7

Goals Summary

Code
 #| label: load-packages
 #| include: false
 #| echo: false

library(tidyverse)
library(table1)
library(slider)
library(skimr)
library(patchwork)
library(summarytools)
library(kableExtra)
library(effectsize)
library(RColorBrewer)
library(DT)
library(here)

palette <- brewer.pal(n = 8, name = "Dark2")

mytheme<-
  theme_minimal()+
  theme(
  panel.background = element_rect(fill = "white", color = NA),
  plot.background = element_rect(fill = "white", color = NA),
  legend.background = element_rect(fill = "white", color = NA),
  text = element_text(family="sans"),
  strip.background = element_rect(
    color = "grey70", 
    linewidth = 1))

options(ggplot2.discrete.colour = palette, ggplot2.discrete.fill = palette)

theme_set(mytheme)

my_ggsave<-function(x, ...){
  
  ggsave(paste0(x,".png"), plot=last_plot(),
            path=here::here("figs"), 
            dpi=300,
            width=ifelse(is.na(...),7, ...))
  
    ggsave(paste0(x,".tiff"), plot=last_plot(), 
            path=here::here("figs"), 
            dpi=300,
            width=ifelse(is.na(...),7, ...))
}

DHIS2 Data Import and Cleaning

DHIS2 Events (Patient visits) with dates between Jan 1 2021 and June 30 2024 were exported.

Patient IDs and other PII were removed. Age was calculated as years between DOB and Registration Date. Only those with with hypertension and consent to record data checked are included. Events that are likely test data or created outside Kano and Ogun are removed.

Code
dhis<-readr::read_csv("output_file.csv") %>% 
  janitor::clean_names()


dhis_clean<-dhis %>%
  mutate(across(contains("_date"), dmy)) %>%  # convert to date everything thats a date
  mutate(across(ends_with("updated_on"), dmy)) %>%  # convert to date everything thats a date
  mutate(across(contains("mmhg"),as.double)) %>% # BP shouldnt be character
  select(!c(program_stage, geometry, longitude, latitude)) %>% #drop unused columns
  arrange(reg_id,event_date) %>% #now make a running tally for each patient visits
  group_by(reg_id) %>%
  mutate("one"=1, "appt_tally"=cumsum(one)) %>% 
  select(-one) %>% 
  ungroup() %>% 
  rename_all(~stringr::str_replace(.,"^htn_","")) %>% # clean names
  # now extract the OU hierarchy
  separate(organisation_unit_name_hierarchy, 
           into=c("Nat","State","LGA","Ward","HC","FACILITY"), sep="/") %>% # clinic hierarchy
  mutate(age=as.integer(newage)) %>% # convert age to number
  filter(!is.na(age) & age>17 & age < 130) %>% # and remove non adults
  filter(does_patient_have_hypertension=="Yes") %>% # remove those without hypertension confirmed
  filter(consent_to_record_data==1 ) %>% # and those without consent to record data
  filter(event_status %in% c("COMPLETED","ACTIVE") ) %>% #remove scheduled events 
  filter(!str_detect(State,"ad ")) %>%   #and any outside K/O
  filter(!str_detect(State,"ab ")) %>%   #and any outside K/O
  filter(!str_detect(created_by,"Test")) # remove test data

Data Quality Check

Skim of data

Looking at all columns in a skim.

Code
dhis_clean %>% skimr::skim()
Data summary
Name Piped data
Number of rows 165208
Number of columns 61
_______________________
Column type frequency:
character 27
Date 5
logical 15
numeric 14
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
stored_by 49438 0.70 7 17 0 107 0
created_by 0 1.00 18 46 0 106 0
last_updated_by 0 1.00 18 46 0 107 0
organisation_unit_name 0 1.00 19 46 0 104 0
Nat 0 1.00 11 11 0 1 0
State 0 1.00 15 15 0 2 0
LGA 0 1.00 11 43 0 42 0
Ward 0 1.00 9 27 0 100 0
HC 0 1.00 11 47 0 105 0
FACILITY 156493 0.05 23 37 0 10 0
program_status 0 1.00 6 9 0 3 0
event_status 0 1.00 6 9 0 2 0
organisation_unit 0 1.00 11 11 0 105 0
med_losartan 106932 0.35 13 14 0 2 0
hypertension_treated_in_the_past 1165 0.99 10 23 0 2 0
med_hydrochlorothiazide 163799 0.01 24 26 0 2 0
other_hypertension_medication 162573 0.02 1 64 0 126 0
patient_followup_status 12318 0.93 6 20 0 3 0
ncd_patient_status 0 1.00 4 15 0 5 0
days_of_medication 137932 0.17 7 7 0 3 0
does_patient_have_diabetes 10994 0.93 2 3 0 2 0
sex 43 1.00 4 6 0 2 0
relationship_to_patient 34371 0.79 3 8 0 9 0
newage 0 1.00 2 3 0 96 0
med_amlodipine 4308 0.97 10 18 0 3 0
med_telmisartan 164758 0.00 17 17 0 2 0
does_patient_have_hypertension 0 1.00 3 3 0 1 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
event_date 0 1 2021-01-01 2024-07-01 2023-01-09 1278
last_updated_on 0 1 2021-01-01 2024-07-21 2023-06-09 1276
scheduled_date 0 1 1974-06-12 2201-08-29 2023-03-13 1372
registration_date 0 1 1972-01-07 2024-07-17 2022-04-25 1324
incident_date 0 1 2003-12-08 2024-07-17 2022-06-02 1304

Variable type: logical

skim_variable n_missing complete_rate mean count
organisation_unit_code 165208 0 NaN :
blood_sugar_reading_rbs_mgdl 165208 0 NaN :
blood_sugar_unit 165208 0 NaN :
history_of_heart_attack 165208 0 NaN :
history_of_stroke 165208 0 NaN :
blood_sugar_reading_fbs_mmoll 165208 0 NaN :
blood_sugar_reading_ppbs_mmoll 165208 0 NaN :
blood_sugar_reading_ppbs_mgdl 165208 0 NaN :
med_glibenclamide 165208 0 NaN :
other_diabetes_medication 165207 0 0 FAL: 1
blood_sugar_reading_rbs_mmoll 165208 0 NaN :
type_of_diabetes_measure 165208 0 NaN :
blood_sugar_reading_hba1c 165208 0 NaN :
blood_sugar_reading_fbs_mgdl 165208 0 NaN :
med_metformin 165208 0 NaN :

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
height_cm 157928 0.04 86.58 79.29 0.45 1.6 148.0 160.0 758 ▇▆▁▁▁
bp_diastole_mmhg 1300 0.99 85.28 239.52 1.00 76.0 82.0 90.0 90102 ▇▁▁▁▁
weight_kg 157250 0.05 73.61 71.50 28.00 61.0 70.0 82.0 6298 ▇▁▁▁▁
bmi 157923 0.04 136982.95 280270.71 0.00 27.0 38.3 262222.2 4296296 ▇▁▁▁▁
bmi_measurement 156527 0.05 1.00 0.00 1.00 1.0 1.0 1.0 1 ▁▁▇▁▁
consent_to_send_sms_reminders 138779 0.16 1.00 0.00 1.00 1.0 1.0 1.0 1 ▁▁▇▁▁
blood_sugar_measurement 164697 0.00 1.00 0.00 1.00 1.0 1.0 1.0 1 ▁▁▇▁▁
bp_systole_mmhg 700 1.00 151.44 1149.85 1.00 128.0 140.0 152.0 207120 ▇▁▁▁▁
consent_to_record_data 0 1.00 1.00 0.00 1.00 1.0 1.0 1.0 1 ▁▁▇▁▁
reg_id 0 1.00 16122.46 9360.88 0.00 7913.0 16113.0 24221.0 32332 ▇▇▇▇▇
has_supporter 0 1.00 0.80 0.40 0.00 1.0 1.0 1.0 1 ▂▁▁▁▇
has_phone 0 1.00 0.81 0.39 0.00 1.0 1.0 1.0 1 ▂▁▁▁▇
appt_tally 0 1.00 6.26 5.80 1.00 2.0 4.0 9.0 45 ▇▂▁▁▁
age 0 1.00 54.11 14.13 18.00 45.0 54.0 64.0 124 ▂▇▃▁▁

Clean Registration Dates

Looking at registration dates, seems some registrations are after first visit (“back data entry”).

4% of patients have later registration dates than dates of first visits.

But, registrations should always be before or on first visit date. Registration = entry to program.

This snippet overwrites registration dates to the date of first visit. The operation potentially biases Age variable, which is based on days between date of birth and registration.

Question: How exactly are registration dates used???

Code
# create a new registration date with rules
dhis_clean_newReg<-dhis_clean %>% 
  mutate(reg_date_new=case_when(
    appt_tally==1 & registration_date<=event_date ~ registration_date,
    appt_tally==1 & registration_date>event_date ~ event_date,
    appt_tally>1 ~ NA,
    TRUE ~ NA)) %>% 
  mutate(reg_date_new=if_else(as.integer(as.Date(ymd("2020-01-01"))-registration_date) > 0, 
                              event_date, reg_date_new)) %>% 
  group_by(reg_id) %>%
  fill(reg_date_new, .direction="down") %>% 
  ungroup()

dhis_clean_newReg %>% 
  select(reg_id, registration_date, reg_date_new, appt_tally, event_date) %>% 
  mutate(reg_date_post_ev1=if_else(reg_date_new!=registration_date & appt_tally==1, 1, 0)) %>% 
  filter(appt_tally==1) %>% 
  count(reg_date_post_ev1) %>% 
  mutate(percent=n/sum(n))
# A tibble: 2 × 3
  reg_date_post_ev1     n percent
              <dbl> <int>   <dbl>
1                 0 29554  0.962 
2                 1  1157  0.0377
Code
# Now rewrite the clean dhis dataset. 
# Remove unneeded columns and create days since entry
dhis_clean<-dhis_clean_newReg %>% 
  mutate(registration_date=reg_date_new) %>% 
    mutate(days_since_entry=as.double(event_date-registration_date)) %>%  #days since entry
  select(-reg_date_new) %>% 
  janitor::remove_empty(which = "cols") # remove all unnecessary columns
Code
# dhis_clean<-dhis_clean %>% filter(registration_date >= ymd("2021-01-01"))

Notice the dates and years of events, looks correct.

Code
dhis_clean %>%  
  select(event_date, reg_id) %>%
  distinct() %>% 
  mutate(year=lubridate::year(as.Date(event_date))) %>%
  count(year) %>%   kable() %>% kableExtra::kable_styling()
year n
2021 24643
2022 55985
2023 57631
2024 25452

Zero ending values

And, 30% of all “Age at Registration” end in 0. (36% of SBPs do as well).

HCWs were using the DOB attribute as An ESTIMATE of Age, instead of as a true DOB. A feature of DHIS2 enables this estimation.

But between the incorrect registration date and using Age/DoB as an estimate, the Age Variable is imprecise and has unknown measurement error.

Code
dhis_clean %>%
  select(newage, reg_id) %>%
    distinct() %>% 
  mutate(age=as.integer(newage)) %>% 
  select(age) %>% 
  filter(age>0) %>% 
## !!!! 30% of all Ages end in 0!!!! 
## Measurement error by entering age instead of DOB
  # mutate(age_zero=if_else(age %% 10==0, 1, 0)) %>% 
  # count(age_zero) %>% 
  # mutate(per=n/sum(n)) 
  ggplot() +
  geom_histogram(aes(x=age)) +
  labs(title="Histogram of Age at Registration for All Visits")

Code
dhis_clean %>%
  mutate(age_zero=if_else(bp_systole_mmhg %% 10==0, 1, 0)) %>%
  filter(99 < bp_systole_mmhg & bp_systole_mmhg<200) %>%
  # count(age_zero) %>%
  # mutate(per=n/sum(n))
  ggplot() +
  geom_histogram(aes(x=bp_systole_mmhg)) +
  labs(title="Histogram of SBP All Visits",
       subtitle = "SBPs less than 200. Spikes are values ending in zero") 

Duplicate events?

It looks like a number of visits occur on the same day for many patients.

Code
## Found duplicate events!!!
tots_d <- dhis_clean %>% 
  mutate(reg_visit_id=paste0(reg_id,"_",appt_tally)) %>% 
  group_by(reg_id) %>% 
  mutate(prev_event_date = lag(event_date, n=1, order_by=event_date)) %>% 
  mutate(lead_event_date = lead(event_date, n=1, order_by=event_date)) %>% 
  ungroup()


### highlight them as dupes
dupes <- tots_d %>% 
  # select(reg_id, event_date, prev_event_date, organisation_unit_name) %>% 
  # ungroup() %>% 
  mutate(dupe=if_else(event_date==prev_event_date | event_date==lead_event_date, 1, 0)) %>%
  group_by(reg_id) %>% 
  # fill(dupe, .direction="down") %>%
  filter(dupe==1) %>% arrange(reg_id,event_date)



## TRUE dupes also have identical dates except "last updated" date
dupes_true <- dupes %>% 
  ungroup() %>% 
  select(!c(lead_event_date, prev_event_date, last_updated_by)) %>% 
  ## this only keeps rows that are identical except for the appointment tally and visit id 
  janitor::get_dupes(!c(appt_tally, reg_visit_id))

print(paste0("total patients ", length(unique(tots_d$reg_id))))
[1] "total patients 30714"
Code
print(paste0("patient records with dupe events ", length(unique(dupes$reg_id))))
[1] "patient records with dupe events 1122"
Code
print(paste0("duped events ", length(dupes$reg_id)))
[1] "duped events 2856"
Code
print(paste0("patient records with true dupe events on same day ", length(unique(dupes_true$reg_id))))
[1] "patient records with true dupe events on same day 151"
Code
# length(unique(dhis_subset$reg_id))


# Only the first of all "true dupe" events should still be kept!
dupes_true_extras<-dupes_true %>% 
      group_by(reg_id) %>% 
      mutate("one"=1, "dupe_visit_id"=cumsum(one)) %>% 
      select(-one) %>% 
      select(reg_id,reg_visit_id, dupe_visit_id) %>% 
      filter(dupe_visit_id>1)

## The new dataset with totals is called tots and has removed the "true dupe" visits
tots <-tots_d %>% 
      anti_join(dupes_true_extras, by="reg_visit_id")

# (1122-188)/length(unique(dhis_clean$reg_id))

1122 of 30714 patient records reported at least one pair of visits on the same day ( 2856 events total). Of these, 151 patients had “true duplicates” where data reported within each visit was identical.

NEXT STEP for dupes would be to pivot longer, then find those variables that are NOT identical within Dupes

This table shows us the FIELDS which do not match within a pair of duplicated events for the same registration.

Code
d_long <-dupes %>% 
  group_by(reg_id) %>% 
  arrange(event_date, last_updated_on) %>% 
  mutate("one"=1, "dupe_id"=cumsum(one)) %>% 
  select(-one) %>% 
  ungroup() 

dupe_clash <-d_long %>% 
    filter(dupe_id<3) %>% # just those regs with exactly 2 dupe events
    mutate(across(everything(), as.character)) %>% # no dates or appt counts
    select(!ends_with("_date"), -appt_tally, -reg_visit_id) %>% 
    pivot_longer(!c(reg_id, dupe_id), names_to="var",values_to="value") %>% 
    pivot_wider(names_from=dupe_id,
                names_prefix = "dupe_",
                values_from=c(value)) %>% 
    # filter(!is.na(dupe_1)) %>%
    mutate(match=case_when(
      is.na(dupe_1) & is.na(dupe_2) ~ TRUE,
      dupe_1==dupe_2 ~ TRUE,
      TRUE ~ FALSE)) %>%
    filter(match==FALSE) %>% 
    arrange(reg_id)

dupe_clash %>% 
  count(var, sort=T)
# A tibble: 24 × 2
   var                    n
   <chr>              <int>
 1 last_updated_on      516
 2 bp_diastole_mmhg     401
 3 bp_systole_mmhg      395
 4 med_losartan         152
 5 med_amlodipine       151
 6 event_status          89
 7 days_of_medication    76
 8 created_by            66
 9 stored_by             65
10 bmi_measurement       37
# ℹ 14 more rows

Last Updated date could be helpful for deduping, same as stored_by, created_by, and event status (COMPLETED locks the values, so if there is an error there, another one could be made on same place).

But if those are not obvious for which one should be used, then we would use the non null BP.

Two visits on same day with non null BP values will be difficult to merge automatically.

THE RULES FOR MERGING TWO PATIENT VISITS ON SAME DATE

  1. If one of the events does not have a BP reading, all values in the event with BP reading take priority (if any duped event has all BP NA, its removed from dataset).

  2. ACTIVE status events are generally opened after COMPLETED events to make a change. If its the only ACTIVE event that date, use this event. Priority = 3 (max)

  3. If other clashes, count the number of nulls for medication. If one of the events has least med nulls, then use that event. Priority = 2

  4. If still cannot reconcile, use the first event by date last updated (or, if last updated on same day, random). Priority = 1

Code
# this chunk will apply the priority rules to flag priority events
d_long_priorities <- d_long %>% 
    filter(!is.na(bp_diastole_mmhg) & !is.na(bp_systole_mmhg)) %>% 
    group_by(reg_id, event_date, event_status) %>% 
    mutate(dupes_evs_with_this_status=n()) %>% ungroup() %>% 
    mutate(dupe_meds_null_count = rowSums(is.na(across(contains("med"))))) %>% 
    group_by(reg_id, event_date) %>% 
    mutate(dupes_evs_mostMeds=min(dupe_meds_null_count)) %>% 
    mutate(dupes_evs_withMedCount=n()) %>%
      mutate(dupe_id_priority =
      case_when(
        dupes_evs_with_this_status==1 & event_status=="ACTIVE" ~ 3,
        dupes_evs_mostMeds==dupe_meds_null_count & dupes_evs_withMedCount==1 ~ 2,
        dupe_id==min(dupe_id) ~ 1,
          TRUE ~ 0
        ) ) %>% ungroup()

# WE WANT TO KEEP ALL THESE AND REMOVE OTHER DUPES
d_long_keepers <-d_long_priorities %>% 
       group_by(reg_id, event_date) %>% 
       filter(dupe_id_priority==max(dupe_id_priority)) %>% ungroup()

d_long_remove <- d_long %>% 
  anti_join(d_long_keepers, by="reg_visit_id")

# d_long_remove %>% colnames()

d_long_keepers %>% 
  janitor::tabyl(dupe_id_priority) %>% 
  kable() %>% kableExtra::kable_styling()
dupe_id_priority n percent
1 1195 0.8871566
2 104 0.0772086
3 48 0.0356347

We remove 1509 events from the main dataset and then recreate the appointment tally and visit IDs.

Code
tots<-tots %>% 
  anti_join(d_long_remove, by="reg_visit_id") %>% 
# REDO THE APPT TALLY AND REG_VISIT_ID now that the dupes have been removed
  arrange(reg_id,event_date) %>% #make a running tally for each patient visits
  group_by(reg_id) %>%
  mutate("one"=1, "appt_tally"=cumsum(one)) %>% 
  select(-one) %>% ungroup() %>% 
  mutate(reg_visit_id=paste0(reg_id,"_",appt_tally)) %>% 
  group_by(reg_id) %>% 
  add_count(name="total_patient_visits") %>% # counter for total patients visits
  mutate(prev_event_date = lag(event_date, n=1, order_by=event_date)) %>% 
  mutate(lead_event_date = lead(event_date, n=1, order_by=event_date)) %>% 
  ungroup() #and need to add new lead lag dates

Description

Table 1 Zero Followup Visits After Baseline

Process it for the Table1 all registrations up to April 1 2024.

Thus all patients have at least 90 days for a follow up Visit after Baseline Visit.

“>=1 Visits Post Baseline” here means at least 1 visit occurs later. The second visit can be within or after 90 days from baseline.

Code
dhis_subset <- tots %>% 
    filter(registration_date >= dmy("01-01-2021")) %>% 
    filter(registration_date <= dmy("01-04-2024"))

dhis_table <- dhis_subset %>%  
    mutate(patient_returned=if_else(total_patient_visits>1, 
                                    ">=1 Visits Post Baseline",
                                    "0 Visits Post Baseline"
                                    )) %>% 
    mutate(bp_systole_mmhg=if_else(bp_systole_mmhg>400, 400, bp_systole_mmhg)) %>% 
    mutate(sbp_level =
      case_when(
        bp_systole_mmhg >= 400 |  bp_systole_mmhg < 50 ~ "error (>300 or < 50)",
        bp_systole_mmhg >= 180 & bp_systole_mmhg < 400  ~ "emergency [180-300)",
        bp_systole_mmhg >= 140 & bp_systole_mmhg < 180  ~ "very high [140-180)",
        bp_systole_mmhg >= 120 & bp_systole_mmhg < 140 ~ "high or elevated (120-140)",
        is.na(bp_systole_mmhg) ~ NA,
        TRUE ~ "normal")
    ) %>% 
  mutate(sbp_level=fct_relevel(sbp_level, 
                              c("normal","high or elevated (120-140)",
                               "very high [140-180)","emergency [180-300)", 
                               "error (>300 or < 50)"))) %>% 
  mutate(has_diabetes=as.factor(does_patient_have_diabetes)) %>%
  mutate(has_phone=as.factor(has_phone)) %>%
  mutate(has_supporter=as.factor(has_supporter)) %>%
  mutate(age_group = as.factor(cut(age,
                         breaks = c(0, 30, 45, 60, 75, 110, 100000),
                         include.lowest = T,
                         labels = c("Other", "30-44","45-59", 
                                    "60-74", "75-110", "Other")))) %>%
  mutate(age_group=fct_shift(age_group, n= 1)) %>%
  select(patient_returned, State, sex, age, age_group, appt_tally,
         "sbp_baseline"=sbp_level, bp_systole_mmhg,
         "previous_htn_treated"=hypertension_treated_in_the_past,
         "amlodipine_baseline"=med_amlodipine,
         has_phone, has_supporter, has_diabetes,days_of_medication,
         total_patient_visits) %>% 
  filter(appt_tally==1)


## Write it to RDS for the manuscript
write_rds(dhis_table, file=here::here("data","dhis_table.rds"))


tab1<-table1::table1(~ sex + State + age + age_group + previous_htn_treated + 
                 sbp_baseline + bp_systole_mmhg + amlodipine_baseline + has_phone + 
                 has_supporter + has_diabetes +days_of_medication + total_patient_visits
                 | factor(patient_returned),
               data=dhis_table)

# t1flex(tab1)
tab1
>=1 Visits Post Baseline
(N=21183)
0 Visits Post Baseline
(N=7044)
Overall
(N=28227)
sex
Female 16005 (75.6%) 5032 (71.4%) 21037 (74.5%)
Male 5171 (24.4%) 2008 (28.5%) 7179 (25.4%)
Missing 7 (0.0%) 4 (0.1%) 11 (0.0%)
State
kn Kano State 13042 (61.6%) 2910 (41.3%) 15952 (56.5%)
og Ogun State 8141 (38.4%) 4134 (58.7%) 12275 (43.5%)
age
Mean (SD) 53.7 (14.6) 53.9 (15.6) 53.8 (14.8)
Median [Min, Max] 53.0 [18.0, 124] 53.0 [18.0, 122] 53.0 [18.0, 124]
age_group
30-44 5111 (24.1%) 1682 (23.9%) 6793 (24.1%)
45-59 8738 (41.3%) 2926 (41.5%) 11664 (41.3%)
60-74 4755 (22.4%) 1494 (21.2%) 6249 (22.1%)
75-110 1253 (5.9%) 451 (6.4%) 1704 (6.0%)
Other 1326 (6.3%) 491 (7.0%) 1817 (6.4%)
previous_htn_treated
First time 17400 (82.1%) 5632 (80.0%) 23032 (81.6%)
Was treated in the past 3632 (17.1%) 1323 (18.8%) 4955 (17.6%)
Missing 151 (0.7%) 89 (1.3%) 240 (0.9%)
sbp_baseline
normal 182 (0.9%) 43 (0.6%) 225 (0.8%)
high or elevated (120-140) 1293 (6.1%) 302 (4.3%) 1595 (5.7%)
very high [140-180) 17547 (82.8%) 5701 (80.9%) 23248 (82.4%)
emergency [180-300) 2082 (9.8%) 951 (13.5%) 3033 (10.7%)
error (>300 or < 50) 13 (0.1%) 5 (0.1%) 18 (0.1%)
Missing 66 (0.3%) 42 (0.6%) 108 (0.4%)
bp_systole_mmhg
Mean (SD) 158 (17.9) 161 (18.8) 159 (18.1)
Median [Min, Max] 158 [14.0, 400] 160 [14.0, 400] 159 [14.0, 400]
Missing 66 (0.3%) 42 (0.6%) 108 (0.4%)
amlodipine_baseline
Amlodipine 10mg OD 1802 (8.5%) 802 (11.4%) 2604 (9.2%)
Amlodipine 5mg OD 19175 (90.5%) 6158 (87.4%) 25333 (89.7%)
Other drug 3 (0.0%) 0 (0%) 3 (0.0%)
Missing 203 (1.0%) 84 (1.2%) 287 (1.0%)
has_phone
0 3994 (18.9%) 1359 (19.3%) 5353 (19.0%)
1 17189 (81.1%) 5685 (80.7%) 22874 (81.0%)
has_supporter
0 4217 (19.9%) 1593 (22.6%) 5810 (20.6%)
1 16966 (80.1%) 5451 (77.4%) 22417 (79.4%)
has_diabetes
No 19050 (89.9%) 6253 (88.8%) 25303 (89.6%)
Yes 461 (2.2%) 142 (2.0%) 603 (2.1%)
Missing 1672 (7.9%) 649 (9.2%) 2321 (8.2%)
days_of_medication
30 days 2395 (11.3%) 1021 (14.5%) 3416 (12.1%)
60 days 27 (0.1%) 14 (0.2%) 41 (0.1%)
90 days 0 (0%) 1 (0.0%) 1 (0.0%)
Missing 18761 (88.6%) 6008 (85.3%) 24769 (87.7%)
total_patient_visits
Mean (SD) 6.87 (5.72) 1.00 (0) 5.40 (5.57)
Median [Min, Max] 5.00 [2.00, 45.0] 1.00 [1.00, 1.00] 3.00 [1.00, 45.0]

Count of patients returned after first visit.

Code
dhis_table %>% 
  count(patient_returned) %>% 
  mutate(pt=n/sum(n))
# A tibble: 2 × 3
  patient_returned             n    pt
  <chr>                    <int> <dbl>
1 0 Visits Post Baseline    7044 0.250
2 >=1 Visits Post Baseline 21183 0.750

Visits per patient

Explore the data. Start by counting the number of visits per patient.

This shows patients registered before a given date.

Code
tots_small <-tots %>% 
  filter(registration_date >= ymd("2021-01-01")) %>%
  select(reg_id, total_patient_visits) %>% 
  unique()


seq_visits<-seq(ymd('2021-01-01'),ymd('2024-01-01'), by = '6 months')


seq_func<-function(vec){
  tots %>% 
  ungroup() %>% 
  filter(registration_date <= ymd(vec)) %>%
  select(reg_id, total_patient_visits) %>% 
  unique() %>% 
  count(total_patient_visits) %>% 
  ggplot() +
  geom_col(aes(y=n, x=total_patient_visits)) +
    labs(title="Patient visits by cohort",
      subtitle=str_wrap(paste0("Registered Before ", 
                               as.character(vec)),20))
}

patchwork::wrap_plots(lapply(seq_visits, seq_func))                  

Code
df<-data.frame(x=seq(1:40))
df$y<-8100*df$x^-1



tots_small %>%
  count(total_patient_visits) %>% 
  left_join(df, by=c("total_patient_visits"="x")) %>% 
  ggplot(aes(x=total_patient_visits)) +
  geom_col(aes(y=n), alpha=0.6) +
  geom_line(aes(y=y), color="blue", linetype=3, linewidth=0.8) +
  scale_x_continuous(limits=c(0,35)) +
  labs(title="Patients by total visit count",
       subtitle="NHCI Registrations Jan 1 2021 - Jul 1 2024",
       x="Visit count",
       y="Total patients") +
  annotate("rect", xmin=17, xmax=32, ymin=2000, ymax=4000, color="blue",fill="white", linetype=2) +
  annotate("text", x=25, y=3500, label="Decay Function: y ~ a * x^ -1", size=5, color="blue") +
  annotate("text", x=25, y=2500, label="where a = Count with 1 Visit", size=5, color="blue") +
  expand_limits(x = 0, y = 0) +
    coord_cartesian(expand = FALSE, clip = "off")

Code
my_ggsave("fig1_visitCounts",12)
Code
count(tots_small, total_patient_visits) %>% 
  ungroup() %>% 
  mutate(percent=round(100*(n/sum(n)),4)) %>% DT::datatable(rownames = FALSE)
Code
tots_small %>% 
      mutate(count_level =
        case_when(
          total_patient_visits==1 ~ "01 visit",
          total_patient_visits>=2 & total_patient_visits < 4  ~ "02-03 visits",
          total_patient_visits>=4 & total_patient_visits < 11  ~ "04-10 visits",
          total_patient_visits>=4 & total_patient_visits < 11  ~ "04-10 visits",
          total_patient_visits>=11 & total_patient_visits < 46  ~ "11-45 visits",
          TRUE ~ NA
        )
      ) %>% 
    count(count_level) %>% 
    mutate(percent=100*(n/sum(n))) %>% kable() %>% kableExtra::kable_styling()
count_level n percent
01 visit 8105 27.31992
02-03 visits 8022 27.04015
04-10 visits 9346 31.50302
11-45 visits 4194 14.13692
Code
#72816/126422
## top 20% of attendees by visit count, attended 58% of all visits.

Investigate NCD Patient Status variable (patient follow up tracker..)

Code
# tots %>% 
#   distinct(reg_id, ncd_patient_status) %>% 
#   count(ncd_patient_status)

tots %>% 
  count(ncd_patient_status) %>% 
  mutate(pct=round(n/sum(n),4)*100) %>% kable() %>% kableExtra::kable_styling()
ncd_patient_status n pct
Active record 162166 99.07
Died 457 0.28
Inactive record 433 0.26
Transferred 569 0.35
Unresponsive 69 0.04
Code
# length(unique(tots$reg_id))

# 1100/30704

Narrow in on registrations after jan 1 2023, can see a dropoff of attendance by sex.

Code
tots %>% 
  ungroup %>% 
  arrange(desc(registration_date)) %>% 
  filter(registration_date>=ymd("2023-01-01") &
         registration_date<ymd("2024-01-01") ) %>% 
  filter(!is.na(sex)) %>% 
    # mutate(n_group =
    #   case_when(
    #     total_patient_visits==1 ~ "01 (Entry)",
    #     total_patient_visits==2 ~ "02",
    #     total_patient_visits >1 & total_patient_visits<=5 ~ "03-06",
    #     total_patient_visits >5 & total_patient_visits<=11 ~ "07-11",
    #     TRUE ~ "12+"
    #   )
    # ) %>% 
  ungroup() %>% 
  filter(total_patient_visits==appt_tally & appt_tally<25) %>% 
  group_by(sex, appt_tally) %>% 
  summarise(count_by_ngroup=n()) %>%  
  mutate(percent= prop.table(count_by_ngroup)*100) %>% 
  mutate(cumulative_percent=cumsum(percent)) %>% 
  ungroup() %>% 
  as_tibble() %>% 
  ggplot()+
  geom_line(aes(x=appt_tally,y=percent, color=sex)) +
  # geom_col(aes(x=appt_tally,y=percent, fill=sex), position="dodge") +
  # facet_grid(~sex, scales="fixed") +
  labs(title="Patients by Sex and Total Number of Visits",
       subtitle="Registrations in 2023 counting All Visits to April 1, 2024",
       x="Total Number of Patient Visits in Time Period")+
  # theme_classic()+
  theme(legend.position = "bottom") +
  scale_x_continuous(breaks = seq(1, 22, by = 1))

Code
tots %>% count(LGA) %>% nrow()
[1] 42

Monthly patient load per clinic

Code
dhis_pts_per_month<-tots %>% 
  mutate(appt_month=lubridate::month(event_date)) %>% # transform to month
  group_by(appt_month, organisation_unit) %>% 
  summarize(patients=n_distinct(reg_id),
            appts=n()) %>% 
  ungroup() %>% 
  mutate(appt_patient_ratio=appts/patients)

dhis_pts_per_month %>% 
  summarytools::descr()
Descriptive Statistics  
dhis_pts_per_month  
N: 1242  

                    appt_month   appt_patient_ratio     appts   patients
----------------- ------------ -------------------- --------- ----------
             Mean         6.49                 1.21    131.80     103.45
          Std.Dev         3.45                 0.16    183.67     136.95
              Min         1.00                 1.00      1.00       1.00
               Q1         3.00                 1.10     41.00      35.00
           Median         6.00                 1.18     74.00      61.00
               Q3         9.00                 1.29    131.00     108.00
              Max        12.00                 2.08   1287.00     990.00
              MAD         4.45                 0.13     57.82      45.96
              IQR         5.75                 0.19     90.00      72.75
               CV         0.53                 0.13      1.39       1.32
         Skewness         0.00                 1.40      3.33       3.46
      SE.Skewness         0.07                 0.07      0.07       0.07
         Kurtosis        -1.22                 3.24     12.12      13.64
          N.Valid      1242.00              1242.00   1242.00    1242.00
        Pct.Valid       100.00               100.00    100.00     100.00
Code
# %>% summarize(patient_avg=mean(patients),
#             appt_avg=mean(appts))

dhis_pts_per_month %>%
  ggplot()+
  geom_histogram(aes(x=appts),bins=30) +
  labs(title="Total appointments per clinic per month")

Days & months enrolled per patient

Days enrolled per patient = Days since entry at latest event + 90

Code
dpv<-tots %>% 
  select(reg_id, appt_tally, total_patient_visits, registration_date, 
         event_date, days_since_entry) %>% 
  filter(appt_tally==total_patient_visits & days_since_entry>=0) %>% 
  mutate(days_tracked=days_since_entry+90) %>% 
  mutate(patient_days_per_visit=days_tracked/total_patient_visits)

summary(dpv$patient_days_per_visit)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
   5.294   58.000   90.000   94.394   90.600 1370.000 
Code
# tots %>% 
#   filter(total_patient_visits>40) %>% View()

dpv %>% 
  mutate(days_enrolled=if_else(total_patient_visits==appt_tally, 
                               event_date-registration_date, NA)) %>% 
  filter(!is.na(days_enrolled) & total_patient_visits>1) %>%
  # filter(patient_days_per_visit>=0 & patient_days_per_visit< 91) %>% 
  mutate(months_per_visit=patient_days_per_visit/30) %>% 
  ggplot() +
  geom_histogram(aes(x=months_per_visit)) +
  labs(title="Months Per Visit, By Patient",
       subtitle="Total number of months enrolled divided by number of visits")

Cumulative density of days since Last visit

Code
tots %>% 
  filter(appt_tally!=1) %>% 
  mutate(days_since_last=as.integer(event_date-prev_event_date)) %>% 
  filter(days_since_last<50) %>%
  # count(days_since_last) %>% 
  ggplot(aes(x=days_since_last)) + 
  geom_step(aes(y=after_stat(y)),stat="ecdf") +
  labs(title="Cumulative Density of Events by days since latest visit",
       subtitle = "All visits occuring less than 50 days from last")

Code
# 
#   ggplot() +
#   geom_histogram(aes(x=days_since_last))
Code
tots %>% 
    filter(appt_tally>1) %>%
      mutate(days_since_last=event_date-prev_event_date) %>% 
  # filter(days_since_last>7) %>% 
      mutate(days_since_level =
        case_when(
          days_since_last ==0 ~ "_Same Day",
          days_since_last>=1 & days_since_last < 8  ~ "01-07 Days",
          days_since_last>=8 & days_since_last < 21  ~ "08-20 Days",
          days_since_last>=21 & days_since_last < 27  ~ "21-26 Days",
          days_since_last>=27 & days_since_last < 37  ~ "27-36 Days",
          days_since_last>=37 & days_since_last < 90  ~ "37-89 Days",
          days_since_last>=90  ~ "90+ Days",
          TRUE ~ "_Same Day"
        )
      ) %>% 
    count(days_since_level) %>% 
    mutate(percent=100*(n/sum(n))) %>% 
    ggplot(aes(x=days_since_level,y=percent))+
    geom_col() +
    
    # geom_text(aes(label = round(percent,2)), color="white") +
    labs(title="Follow Up Visits: Days After Previous Visit",
         subtitle="Visits 2021-2024, Excluding Duplicate Visits",
         x="",y="Percent of Follow Up visits") +coord_flip()

Define the outcome event: Attrition

  • No return visit after entry visit

  • No return within 1-90 days of a visit (Loose Window)

  • No return within 1-37 days of any visit (Strict Window)

  • No return within 365 days of latest visit (Original Window)

Patient status

Here we check what patient statuses follow the latest visit.

Code
tots %>% 
  filter(appt_tally==total_patient_visits) %>%
  count(patient_followup_status, ncd_patient_status)
# A tibble: 19 × 3
   patient_followup_status ncd_patient_status     n
   <chr>                   <chr>              <int>
 1 Called                  Active record        143
 2 Called                  Died                   2
 3 Called                  Inactive record        9
 4 Called                  Transferred            4
 5 Overdue Pending Call    Active record      24293
 6 Overdue Pending Call    Died                  72
 7 Overdue Pending Call    Inactive record      139
 8 Overdue Pending Call    Transferred          116
 9 Overdue Pending Call    Unresponsive          20
10 Upcoming visit          Active record       2535
11 Upcoming visit          Died                   4
12 Upcoming visit          Inactive record        2
13 Upcoming visit          Transferred            5
14 Upcoming visit          Unresponsive           2
15 <NA>                    Active record       3323
16 <NA>                    Died                  12
17 <NA>                    Inactive record       11
18 <NA>                    Transferred           17
19 <NA>                    Unresponsive           3

How to interpret CANCELLED or COMPLETED registrations remains a question.

Scheduled and Actual Visits

Days between actual visit date and scheduled date

Code
tots %>% 
  ungroup() %>%
  filter(appt_tally!=1) %>%
  mutate(days_from_sched=as.numeric(event_date-scheduled_date)) %>% 
    mutate(days_from_sched_range =
      case_when(
        days_from_sched < 0 ~ "Earlier_than_scheduled",
        days_from_sched == 0 ~ "0_Same_Day",
        days_from_sched >= 1 & days_from_sched  <= 7 ~ "1.0_Day_to_1_Week",
        days_from_sched >= 8 & days_from_sched  <= 28 ~ "1.1_to_4_Weeks",
        days_from_sched >= 29 & days_from_sched  <= 8*12 ~ "4.1_to_12_Weeks",
        TRUE ~ "Other"
      )
    ) %>% 
  # count(days_from_sched_range) %>% 
  janitor::tabyl(days_from_sched_range) %>% 
  kable() %>% kableExtra::kable_styling()
days_from_sched_range n percent
0_Same_Day 45094 0.3390985
1.0_Day_to_1_Week 12627 0.0949527
1.1_to_4_Weeks 11587 0.0871321
4.1_to_12_Weeks 11844 0.0890647
Earlier_than_scheduled 43740 0.3289167
Other 8090 0.0608353

Most visits are on the same day as they were scheduled (spontaneous visits?) or earlier.

Code
tots %>% 
  ungroup() %>%
  filter(appt_tally!=1) %>%
  mutate(days_from_latest_schedDate=as.numeric(scheduled_date-prev_event_date)) %>% 
  mutate(days_from_latest_eventDate=as.integer(event_date-prev_event_date)) %>% 
  filter(days_from_latest_eventDate<60 & days_from_latest_eventDate>0) %>%
    filter(days_from_latest_schedDate<60& days_from_latest_schedDate>0) %>% 
  select(reg_visit_id, starts_with("days_from")) %>% 
  pivot_longer(cols=starts_with("days_from"), names_to="property", values_to="value") %>% 
ggplot(aes(x=value, fill=property)) +
  # facet_wrap(~property) +
  geom_histogram(position="dodge",alpha=0.4, bins=50) +
  labs(title="Histogram of Follow Up visits",
  subtitle=str_wrap("by days between scheduled visit date and previous visit date, plus days between current visit date and previous visit date", 90),
       caption="Removing values < 0 or > 60") +
  theme(legend.position = "bottom")

Code
tots %>% 
  ungroup() %>%
  filter(appt_tally!=1) %>%
  mutate(days_from_latest_schedDate=as.numeric(scheduled_date-prev_event_date)) %>% 
  mutate(days_from_latest_eventDate=as.integer(event_date-prev_event_date)) %>% 
  # filter(days_from_latest_eventDate<90 & days_from_latest_eventDate>0) %>%
    filter(days_from_latest_schedDate>0) %>%
  ggplot() + 
  geom_step(aes(x=days_from_latest_eventDate,y=after_stat(y), color="Event Date"),
            stat="ecdf", linewidth=1, alpha=0.6) +
  geom_step(aes(x=days_from_latest_schedDate,y=after_stat(y), color="Scheduled Date"),
            stat="ecdf", linetype=3, linewidth=1) +
  labs(title="Followup visit after previous: by event date and scheduled date",
       subtitle="Cumulative incidence of followup visits by days from previous",
       x="Days since previous visit",
       y="") +
      scale_color_manual(name = "Visit feature",
                         values = c("Event Date" = "black", #match the name of the element to the name in the aes
                                    "Scheduled Date" = "blue")) +
  coord_cartesian(xlim = c(0, 110), ylim=c(0,1)) +
  scale_x_continuous(breaks = seq(0, 105, by = 15)) +
  theme(legend.position ="bottom")

Code
my_ggsave("fig2_incidence_after_previous",10)

53% of followup visits are scheduled to occur 27-36 days after the previous visit, with 38% on precisely 28 days later (the system default scheduled date). 44% of followup visits actually occur within that window.

NEXT STEPS:

  • Create the windows for timeliness for each subsequent visit

  • Per visit-opportunity analysis of timeliness

Timely Windows

Code
tots_timely<-tots %>% 
  ungroup() %>% 
  filter(appt_tally==1 | event_date-prev_event_date>=7) %>% # remove visits within 7 days of last
  mutate(nextdate_37=event_date+37, # create 37 and 90 day windows
         nextdate_90=event_date+90,
         nextdate_37_kept=if_else(lead_event_date<=nextdate_37, 1, 0),
         nextdate_90_kept=if_else(lead_event_date<=nextdate_90, 1, 0)) %>% 
  mutate(across(contains("kept"), ~replace(., is.na(.), 0))) %>% 
  select(sex, State,appt_tally, total_patient_visits, ncd_patient_status,
         contains("reg"),contains("event_date"), contains("nextdate"), everything())

Adding month and year variables to the expected window for subsequent visits, and checking whether patient attended their next visit within that window.

Code
tots_timely<-tots_timely %>% 
  mutate(exp_year=lubridate::year(nextdate_90)) %>% 
  mutate(exp_month=lubridate::month(nextdate_90)) %>% 
  mutate(exp_year_month=make_date(year=exp_year, month=exp_month, day="01")) %>% 
  mutate(days_since_last=as.integer(event_date-prev_event_date))

Outcome Events and Censoring Events

This next part is crucial.

The FIRST time a patient exceeds 90 days before the next visit, they are LTFU at that 90 day mark.

This is the OUTCOME EVENT. Subsequent visits for that patient would be discarded.

IFF a patient is marked TRANSFERRED or DEATH, then they are CENSORED after last visit. IFF the last expected visit is after July 1, then they are CENSORED after last visit.

Code
tots_timely<-tots_timely %>% 
  arrange(reg_id, appt_tally) %>% 
  mutate(nextdate_90_dropout=if_else(nextdate_90_kept==0, 1, NA)) %>% 
  group_by(reg_id) %>% 
  fill(nextdate_90_dropout, .direction="down") %>% 
  mutate(nextdate_90_dropout=replace_na(nextdate_90_dropout, 0)) %>% 
  mutate(visits_since_dropout=cumsum(nextdate_90_dropout)) %>% # visit before dropout==1
  ungroup() %>% 
  mutate(censor =
    case_when(
      appt_tally==total_patient_visits & ncd_patient_status %in% c("Transferred","Died") ~ 1,
      nextdate_90 > ymd("2024-07-01") ~ 1,
      TRUE ~ 0
    )
  )

tots_timely %>% 
  select(reg_id, event_date, days_since_last, appt_tally, visits_since_dropout,
         contains("nextdate_90"), censor) %>% tail(15)
# A tibble: 15 × 9
   reg_id event_date days_since_last appt_tally visits_since_dropout nextdate_90
    <dbl> <date>               <int>      <dbl>                <dbl> <date>     
 1  32329 2022-10-12              21          6                    0 2023-01-10 
 2  32329 2022-11-17              36          7                    0 2023-02-15 
 3  32329 2023-01-15              59          8                    0 2023-04-15 
 4  32329 2023-03-20              64          9                    1 2023-06-18 
 5  32329 2023-08-23             156         10                    2 2023-11-21 
 6  32329 2024-02-19             180         11                    3 2024-05-19 
 7  32331 2023-02-09              NA          1                    0 2023-05-10 
 8  32331 2023-03-13              32          2                    0 2023-06-11 
 9  32331 2023-05-31              79          3                    0 2023-08-29 
10  32331 2023-07-17              47          4                    0 2023-10-15 
11  32331 2023-08-21              35          5                    0 2023-11-19 
12  32331 2023-09-25              35          6                    0 2023-12-24 
13  32331 2023-12-18              84          7                    1 2024-03-17 
14  32332 2022-03-31              NA          1                    1 2022-06-29 
15  32332 2022-09-29             182          2                    2 2022-12-28 
# ℹ 3 more variables: nextdate_90_kept <dbl>, nextdate_90_dropout <dbl>,
#   censor <dbl>
Code
tots_timely %>% 
  count(censor) %>% kable() %>% kableExtra::kable_styling()
censor n
0 150019
1 12319
Code
# 12292/(149856+12292)

# tots_timely %>% 
#   filter(visits_since_dropout < 2) %>%
#   distinct(reg_id)

We can now calculate the RATES of attrition.

Attrition Rate charts

This removes visits that are censored or after dropout.

Code
tots_timely_rates_prep<-tots_timely %>% 
   filter(censor==0) %>% 
   filter(!is.na(sex)) %>% 
   filter(visits_since_dropout<2) 
#dropout occurs only once.... and if censored, not included in denom


find_dropout_rates<-function(data, ...){
  data %>% 
   group_by(exp_year_month, ...) %>% 
   count(nextdate_90_kept) %>% 
   mutate(rate= n/sum(n)) %>% 
   ungroup()
}

tots_timely_rates_all<-tots_timely_rates_prep %>% 
  find_dropout_rates()

tots_timely_rates_all %>% 
  group_by(nextdate_90_kept) %>% 
  summarise(avg_monthly_rate=mean(rate)) %>% kable() %>% kableExtra::kable_styling()
nextdate_90_kept avg_monthly_rate
0 0.2619861
1 0.7380139

The average monthly INCIDENCE RATE of attrition events (dropouts) is 26%

Now further breakdown of dropout incidence by categories

Code
tots_timely_rates_ss<-tots_timely_rates_prep %>% 
    find_dropout_rates(sex, State)

tots_timely_rates_appt_tally<-tots_timely_rates_prep %>% 
  filter(appt_tally<6) %>% 
  mutate(appt_tally=as.factor(appt_tally)) %>% 
  find_dropout_rates(appt_tally)

tots_timely_rates_state_appt_tally<-tots_timely_rates_prep %>% 
  filter(appt_tally<5) %>% 
  mutate(appt_tally=as.factor(appt_tally)) %>% 
  find_dropout_rates(appt_tally, State)

tots_timely_rates_hc <-tots_timely_rates_prep %>% 
    add_count(exp_year_month, LGA, HC) %>% 
    filter(n>9) %>% 
    find_dropout_rates(State, LGA, HC)

tots_timely_rates_lga <-tots_timely_rates_prep %>% 
    add_count(exp_year_month, LGA) %>% 
    filter(n>9) %>% 
    find_dropout_rates(State, LGA)
Code
ggcoeff<-10000
ggbarscol<-RColorBrewer::brewer.pal(4,"Dark2")[3]
gglinecol<-RColorBrewer::brewer.pal(4,"Dark2")[1]

attr_90day_withRegs<-tots_timely_rates_prep %>% 
  filter(registration_date<=ymd("2024-06-30")) %>% 
  # filter(registration_date>=ymd("2021-04-01")) %>%
  # filter(event_date>=ymd("2021-04-01")) %>%
  filter(exp_year_month<=ymd("2024-06-30")) %>% 
  mutate(reg_year=lubridate::year(registration_date)) %>% 
  mutate(reg_month=lubridate::month(registration_date)) %>% 
  mutate(reg_year_month=make_date(year=exp_year, 
                                      month=exp_month, day="01")) %>% 
  add_count(reg_year_month,name="new_registrations") %>% 
  find_dropout_rates(reg_year_month, new_registrations) %>% 
  filter(nextdate_90_kept==0) %>% 
  ggplot() +
  geom_col(aes(x=reg_year_month, y=new_registrations/ggcoeff),
           fill=ggbarscol, alpha=0.5) +
  geom_line(aes(x=exp_year_month, y=rate), col=gglinecol,stat="identity") +
  labs(title="90-day attrition rate with monthly registrations",
       subtitle="Attrition rate is by month at end of 90-day return window") +
  labs(x="") +
  # theme_classic()+
  theme(legend.position = "bottom",
        axis.title.y.left = element_text(color=gglinecol),
        axis.title.y.right = element_text(color=ggbarscol)) +
  scale_y_continuous(
    name="Attrition Rate",
    labels=scales::label_percent(),
    sec.axis = sec_axis(~.*ggcoeff, name="Monthly Registrations")) +
 scale_x_date(breaks = scales::breaks_pretty(12), labels = scales::label_date_short())


attr_90day_withRegs

Remember responses from RTSL here:

  • When were lists of LTFU for outreach generated? We generated the first list of LTFU for facilities to track patients in April 2021 using the 2021 Q1 NHCI Excel calculator data.

  • When was the dashboard deployed in both states? March 2022

  • When were legacy data imported? February 2022

  • When was the DHIS2-only entry pilot started, and was it expanded or ended? This commenced in August 2023 and ended in December 2023. The pilot just concluded and a report is being written by the consultants. This report would determine how and when we would expand this across the entire program.

Code
attr_90day_withRegs +
  annotate("rect", xmin = as.Date("2021-04-01"), 
                    xmax = as.Date("2021-09-01"), 
                    ymin = 0.42, ymax = 0.47, alpha = .2) +
  annotate("rect", xmin = as.Date("2021-12-21"), 
                    xmax = as.Date("2022-05-01"), 
                    ymin = 0.42, ymax = 0.47, alpha = .2) +
  annotate("rect", xmin = as.Date("2022-06-01"), 
                    xmax = as.Date("2023-04-01"), 
                    ymin = 0.42, ymax = 0.47, alpha = .2) +
  annotate("rect", xmin = as.Date("2023-07-15"), 
                    xmax = as.Date("2024-02-01"), 
                    ymin = 0.42, ymax = 0.47, alpha = .2) +
#   geom_text(data = data.frame(x = as.Date("2021-06-01"),
# y = 0.45, label = "1st LTFU List"), mapping = aes(x = x, y = y, label = label),
# size = 2.17, hjust = 0.45, inherit.aes = FALSE)
  annotate("text", x = as.Date("2021-06-15"), y = 0.44,
                  label = "1st LTFU Lists", size=3.3) +
    annotate("text", x = as.Date("2022-02-25"), y = 0.44,
                  label = "Legacy data", size=3.3) +
    annotate("text", x = as.Date("2022-11-01"), y = 0.44,
                  label = "Growth and Scale", size=3.5) +
    annotate("text", x = as.Date("2023-10-15"), y = 0.44,
                  label = "Naira crisis", size=3.3)

Code
my_ggsave("fig3_attritionRegs_full", 12)

Seasonal trends? Seems to be a dip in spring/summer

Code
tots_timely_rates_prep %>%
  find_dropout_rates(exp_month, exp_year) %>% 
  filter(nextdate_90_kept==0) %>% 
    mutate(exp_month=as.integer(exp_month)) %>% 
    mutate(exp_year=as.factor(exp_year)) %>% 
  ggplot()+
  geom_line(aes(x=exp_month, y=rate, 
                color=exp_year, group=exp_year, linetype=exp_year)) +
  labs(title="90-day attrition rate by year") +
  # scale_x_continuous(labels=scales::label_number(accuracy=1)) +
  scale_y_continuous(labels=scales::label_percent()) +
  scale_x_continuous(labels = scales::label_number(accuracy = 1))+
  labs(x="End month of 90-day attrition window",y="Attrition Rate") +
  # theme_classic()+
  theme(legend.position = "bottom") 

By sex and state

Code
tots_timely_rates_ss %>% 
  filter(nextdate_90_kept==0) %>%
  ggplot()+
  geom_line(aes(x=exp_year_month, y=rate, color=sex)) +
  labs(title="90-day attrition rate by month") +
  scale_x_date() +
  facet_grid(~State) +
    scale_y_continuous(labels=scales::label_percent()) +
  labs(x="Year&Month of Expected Visit",y="Attrition Rate") +
  # theme_classic()+
  theme(legend.position = "bottom")

Code
# ggsave("attr_90day.jpeg", bg = "transparent", width = 20, height = 10, units = 'cm')

By state and appt visit

Code
tots_timely_rates_state_appt_tally %>% 
  filter(nextdate_90_kept==0) %>%
  ggplot()+
  geom_line(aes(x=exp_year_month, y=rate, color=appt_tally)) +
  labs(title="90-day attrition rate by month") +
  scale_x_date() +
  facet_grid(~State) +
  labs(x="Year&Month of Expected Visit",
       y="Attrition Rate", color="Latest Visit Count") +
  # theme_classic()+
    scale_y_continuous(labels=scales::label_percent()) +
  theme(legend.position = "bottom") +
  scale_y_continuous(limits = c(0, 0.75))

Code
# ggsave("attr_90day.jpeg", bg = "transparent", width = 20, height = 10, units = 'cm')

Attrition by latest visit number of patient

Code
tots_timely_rates_appt_tally %>% 
  filter(nextdate_90_kept==0) %>%
  ggplot()+
  geom_line(aes(x=exp_year_month, y=rate, group=appt_tally, color=appt_tally)) +
  scale_y_continuous(labels=scales::label_percent()) +
  labs(title="90-day attrition rate by month", y= "Attr Rate")

Code
tots_timely_rates_lga %>% 
  filter(nextdate_90_kept==0) %>%
  mutate(LGA=str_remove_all(LGA, " Local Government Area")) %>% 
  mutate(LGA=fct_reorder(LGA, rate, .fun = mean)) %>% 
  ggplot(aes(x = LGA, y = rate, color=State)) +
  geom_boxplot() +
  labs(title="90-day monthly attrition rates by LGA", 
       y= "Attrition Rate",
       caption="Removed all healthCentre-months pairs with less than 10 expected visits") +
    facet_wrap(~State, ncol=1, scales="free_y") +
    scale_y_continuous(labels=scales::label_percent()) +
  coord_flip() +
  # theme_classic() 
  theme(legend.position = "null")

Monthly HC attrition rates

Code
tots_timely_rates_hc %>% 
  filter(nextdate_90_kept==0) %>%
  mutate(LGA=str_remove_all(LGA, " Local Government Area")) %>% 
  mutate(LGA=fct_reorder(LGA, rate, .fun = mean)) %>% 
  ggplot(aes(x = LGA, y = rate, color=State)) +
  geom_boxplot() +
  labs(title="90-day monthly attrition rates at Health Centre Level",
       subtitle="Grouped by LGA and State of Health Centre",
       caption="Removed all healthCentre-months pairs with less than 10 expected visits",
       y= "Attrition Rate") +
    scale_y_continuous(labels=scales::label_percent()) +
    facet_wrap(~State, ncol=1, scales="free_y") +
  coord_flip() +
  # theme_classic() +
  theme(legend.position = "null")

Mean and SD

Code
tots_timely_rates_hc %>% 
  filter(nextdate_90_kept==0) %>%
  group_by(State) %>% 
  summarise(mean=mean(rate),
         sd=sd(rate)) %>% 
  kable() %>% kableExtra::kable_styling()
State mean sd
kn Kano State 0.3293533 0.2270595
og Ogun State 0.3257438 0.1767288

Repeat Dropout Patients?

Just to test, what’s the OR for experiencing a dropout for patients who have already experienced one, compared to those who have not?

Code
tots_timely %>% 
  select(reg_id, appt_tally, visits_since_dropout, nextdate_90_kept) %>% 
  mutate(nextdate_90_notMet=if_else(nextdate_90_kept==1,0,1)) %>% 
  mutate(post_dropout=if_else(visits_since_dropout>1, 1, 0)) %>% 
  janitor::tabyl(post_dropout,nextdate_90_notMet) %>% 
  janitor::adorn_title()
              nextdate_90_notMet      
 post_dropout                  0     1
            0              80519 30573
            1              29441 21805
Code
(21858*79526) / (30560*29274)
[1] 1.943051

The odds of dropout at each appointment are 1.94 fold for patients who have previously been delayed more than 90 days and later returned, compared to those who never experienced a first dropout event.

Proper Treatment Protocol?

Care should follow national guideline

Create a visit variable for each step in the care protocol, if the patient met those criteria and whether the correct meds were offered at the visit.

This could be added to the prediction model.

Code
# Should follow guideline here https://dhis2.org/nigeria-hypertension-control/

tots_timely_quality<-tots_timely %>% 
  filter(registration_date>=ymd("2021-01-01")) %>% 
    mutate(p_step1_need =
      case_when(
        appt_tally==1  & 
        ((bp_systole_mmhg >= 140 & bp_systole_mmhg < 160) | 
        (bp_diastole_mmhg >=90 & bp_diastole_mmhg < 100))
        ~ 1,
        TRUE ~ 0
      )) %>% 
    mutate(p_step2_need =
      case_when(
        appt_tally==2 & days_since_last >= 14 &
        (bp_systole_mmhg>= 140 | 
         bp_diastole_mmhg >=90)
        ~ 1,
        (appt_tally==1  & 
        ((bp_systole_mmhg >= 160 & bp_systole_mmhg < 180) | 
        (bp_diastole_mmhg >= 100 & bp_diastole_mmhg < 110)))
        ~ 1,
        TRUE ~ 0)
      ) %>% 
    mutate(p_step3_need =
      case_when(
        appt_tally==3 & days_since_last >= 14 &
        (bp_systole_mmhg>= 140 | 
         bp_diastole_mmhg >=90)
        ~ 1,
        (appt_tally==1  & 
        ((bp_systole_mmhg >= 180) | 
        (bp_diastole_mmhg >= 110)))
        ~ 1,
        TRUE ~ 0)
      ) %>% 
      mutate(p_step4_need =
      case_when(
        appt_tally==4 & days_since_last >= 14 &
        (bp_systole_mmhg>= 140 | 
         bp_diastole_mmhg >=90)
        ~ 1,
        TRUE ~ 0)
      ) %>% 
  # escalation criteria for first visit
  mutate(p_step2_need=if_else(p_step3_need==1, 0, p_step2_need)) %>% 
  mutate(p_step1_need=if_else(p_step2_need==1, 0, p_step1_need)) %>% 
  # now do the appropriate treatment for each step
    mutate(p_step1_met =
      case_when(
        p_step1_need==1  & med_amlodipine== "Amlodipine 5mg OD"
        ~ 1,
        TRUE ~ 0)
    ) %>% 
    mutate(p_step2_met =
      case_when(
        p_step2_need==1  & 
        med_amlodipine== "Amlodipine 5mg OD" &
        (med_losartan=="Losartan 50mg" | (sex=="Female" & age<45))
        ~ 1,
        TRUE ~ 0)
    ) %>% 
    mutate(p_step3_met =
      case_when(
        p_step3_need==1  & 
        med_amlodipine== "Amlodipine 10mg OD" &
        (med_losartan=="Losartan 100mg" | (sex=="Female" & age<45))
        ~ 1,
        TRUE ~ 0)
    ) %>% 
    mutate(p_step4_met =
      case_when(
        p_step4_need==1  & 
        med_amlodipine== "Amlodipine 10mg OD" & 
        med_hydrochlorothiazide=="Hydrochlorathiazide 25mg" &
        (med_losartan=="Losartan 100mg" | (sex=="Female" & age<45))
        ~ 1,
        TRUE ~ 0)
    )

r2<-function(x){round(x,3)}

tots_timely_quality %>% 
  filter(appt_tally<5) %>%
  group_by(nextdate_90_kept) %>%
  summarise(
    p_step1_rate= sum(p_step1_met)/sum(p_step1_need),
    p_step2_rate= sum(p_step2_met)/sum(p_step2_need),
    p_step3_rate= sum(p_step3_met)/sum(p_step3_need),
    p_step4_rate= sum(p_step4_met)/sum(p_step4_need)
  ) %>% 
  mutate(across(where(is.double), r2)) %>% 
  kable() %>% kableExtra::kable_styling()
nextdate_90_kept p_step1_rate p_step2_rate p_step3_rate p_step4_rate
0 0.941 0.618 0.170 0.004
1 0.957 0.615 0.136 0.002

While we’re add it, add the change of sBP and DBP since baseline.

This would be a quality indicator too.

Code
tots_timely_quality <- tots_timely_quality %>%
  mutate(bp_systole_mmhg=if_else(bp_systole_mmhg>300, 300, bp_systole_mmhg)) %>% 
  mutate(bp_diastole_mmhg=if_else(bp_diastole_mmhg>200, 200, bp_diastole_mmhg)) %>% 
  group_by(reg_id) %>% 
  mutate(sbp_baseline=if_else(appt_tally==1, bp_systole_mmhg, NA)) %>% 
  fill(sbp_baseline, .direction = "down") %>%
  mutate(dbp_baseline=if_else(appt_tally==1, bp_diastole_mmhg, NA)) %>% 
  fill(dbp_baseline, .direction = "down") %>%
  ungroup() %>% 
  mutate(sbp_change_from_baseline=bp_systole_mmhg-sbp_baseline,
         dbp_change_from_baseline=bp_diastole_mmhg-dbp_baseline)          

Weather data

Now add in the daily weather data from open-meteo.com API from 1/1/2021 to 1/7/2024.

This pulled data on max temp, max apparent temp (with humidity), rain total, and rain hours for the centroid coordinates of each LGA.

For details see companion script.

Code
# import weather data and clean the lga names

weather_lga_centroids<-read_rds(here::here("data", "weather_lga_centroids.rds")) %>% 
  ungroup() %>% 
  mutate(lganame=str_squish(lganame)) %>% 
  mutate(lganame=str_remove_all(lganame, "\\/Ota"), # fix this one name
         lganame=str_replace_all(lganame, "Nassarawa","Nasarawa"), # fix this name
         lganame=str_replace_all(lganame, "Dambatta","Danbatta"), # fix this name
         lganame=str_replace_all(lganame, "Shagamu","Sagamu")) %>% # fix this name
  rename_with(~ paste0("cli_", .), max_temp_c:last_col()) %>% 
  mutate(lga_37date=str_c(lganame,date,sep="_")) %>% 
  arrange(lga_37date) %>% 
  select(lga_37date,  starts_with("cli_"))


# clean lga names in main dataset and merge with weather
tots_timely_qual_climate<-tots_timely_quality %>% 
    mutate(LGA=str_remove_all(LGA, " Local Government Area ")) %>% 
    mutate(LGA=str_sub(LGA, start=4, end=50)) %>% 
    mutate(LGA=str_squish(LGA)) %>% 
    mutate(lga_37date=str_c(LGA,as.character(nextdate_37),sep="_")) %>% 
    arrange(lga_37date) %>% 
    left_join(weather_lga_centroids, by="lga_37date")

Market data

Now add in the weekly market data from FEWSNET API for 1/1/2021 to 1/7/2024

This is official FAO data on food and staple good prices at local markets, and is used to assess food security. The prices are weekly, but have been reformed as a 10 day rolling average.

Included here are weekly subnational price data from Kano State and Lagos State for nine staples, in Naira:

  • Diesel

  • Gasoline

  • Millet

  • Rice (Milled and Broken)

  • Maize Grain (Yellow and White)

  • Sorghum (Brown and White)

  • Yams

According to an FAO food systems profile on Nigeria (pg 5), these are all staple foods in Nigeria.

The price of oil, both globally and domestically in Nigeria, affects the entire economy, including the price of transportation, staple foods, healthcare, and in general the cost of living. Theses prices could therefore affect ability/willingness to access HTN services.

Note data were not available for Ogun State, but Lagos borders Ogun to the South, and has market data more consistent with national rates than Oyo to the north.

For details see companion script.

Code
fews<-read_rds(here::here("data", "fews_ng_prices_2021_2024.rds")) %>% 
    ungroup() %>%
    filter(date>ymd("2021-02-01")) %>% 
    rename_with(~ paste0("mkt_", .), 3:last_col()) %>% 
    mutate(admin_1=if_else(admin_1=="Lagos", "Ogun","Kano")) %>% 
      arrange(admin_1, date) %>% 
    mutate(state_37date=str_c(admin_1,date,sep="_")) %>% 
    select(state_37date, starts_with("mkt")) %>% 
    arrange(state_37date)


tots_timely_qual_cli_mkt<-tots_timely_qual_climate %>%
    # clean up state name to merge with market prices
    arrange(State, nextdate_37) %>% 
    mutate(State=str_remove_all(State, " State"),
           State=str_sub(State, 5, 50),
           State=str_squish(State)) %>% 
    mutate(state_37date=str_c(State,nextdate_37,sep="_")) %>% 
    arrange(state_37date) %>% 
    left_join(fews, by="state_37date") %>% 
    select(reg_id, reg_visit_id, registration_date, #reorder some vars 
           event_date, appt_tally, everything())

Are weather and market prices 28-37 days after latest visit different for those who attend their next one 0-37 days after that visit, and those who fail to attend?

Ttest and effect sizes might give a general idea…

Code
pop_params_dropouts_37days_prep<-tots_timely_qual_cli_mkt %>% 
   filter(censor==0) %>% 
   filter(visits_since_dropout<1) %>% 
  # removed censored obs or those that already "droppedout"
  mutate(nextdate_37_kept=as.factor(nextdate_37_kept)) %>% 
  select(State, nextdate_37_kept, ends_with("pre10day_mean")) %>% 
  rowid_to_column() %>% 
  pivot_longer(cols=ends_with("pre10day_mean"),
               names_to="var",
               values_to="value")

# boxplot
pop_params_dropouts_37days_prep %>% 
  ggplot() +
    aes(x = nextdate_37_kept, 
        y = value) +
    geom_boxplot() +
    facet_wrap(~var, scales = "free_y") +
    
    labs(title="Next visit date kept?",
         subtitle="by price or weather parameter")

Code
# pvalues
pop_params_pvalues<-pop_params_dropouts_37days_prep %>% 
    group_by(var) %>%
    reframe(t_test = list(t.test(value ~ nextdate_37_kept, data = cur_data()))) %>%
    mutate(tidy_result = map(t_test, broom::tidy)) %>%
    unnest(tidy_result)

# effect sizes
pop_params_effect_sizes <- pop_params_dropouts_37days_prep %>%
  group_by(var) %>%
  reframe(cohens_d = cohens_d(value ~ nextdate_37_kept, data = cur_data())) %>% 
  unnest("cohens_d")

pop_params_pvalues %>% select(var, estimate, p.value)
# A tibble: 15 × 3
   var                                        estimate  p.value
   <chr>                                         <dbl>    <dbl>
 1 cli_max_heat_index_c_pre10day_mean      -0.102      7.57e- 6
 2 cli_max_temp_c_pre10day_mean            -0.422      2.80e-61
 3 cli_rain_hours_pre10day_mean             0.693      3.85e-77
 4 cli_rain_mm_pre10day_mean                0.635      4.67e-72
 5 mkt_diesel_pre10day_mean                25.6        1.28e-31
 6 mkt_gasoline_pre10day_mean              -1.18       4.26e- 1
 7 mkt_maize_grain_white_pre10day_mean     11.8        2.09e-27
 8 mkt_maize_grain_yellow_pre10day_mean    13.2        9.55e-32
 9 mkt_millet_pearl_pre10day_mean          16.2        5.99e-44
10 mkt_rice_5_percent_broken_pre10day_mean 32.5        7.39e-35
11 mkt_rice_milled_pre10day_mean           30.8        1.17e-48
12 mkt_sorghum_brown_pre10day_mean         18.1        5.87e-39
13 mkt_sorghum_white_pre10day_mean         19.0        3.87e-50
14 mkt_usd_exchange_rate_pre10day_mean     -0.00000936 3.40e- 2
15 mkt_yams_pre10day_mean                   9.80       3.21e-17
Code
pop_params_effect_sizes
# A tibble: 15 × 5
   var                                     Cohens_d    CI  CI_low  CI_high
   <chr>                                      <dbl> <dbl>   <dbl>    <dbl>
 1 cli_max_heat_index_c_pre10day_mean      -0.0353   0.95 -0.0510 -0.0195 
 2 cli_max_temp_c_pre10day_mean            -0.131    0.95 -0.147  -0.115  
 3 cli_rain_hours_pre10day_mean             0.153    0.95  0.137   0.168  
 4 cli_rain_mm_pre10day_mean                0.148    0.95  0.133   0.164  
 5 mkt_diesel_pre10day_mean                 0.0908   0.95  0.0750  0.107  
 6 mkt_gasoline_pre10day_mean              -0.00636  0.95 -0.0221  0.00939
 7 mkt_maize_grain_white_pre10day_mean      0.0893   0.95  0.0735  0.105  
 8 mkt_maize_grain_yellow_pre10day_mean     0.0961   0.95  0.0804  0.112  
 9 mkt_millet_pearl_pre10day_mean           0.115    0.95  0.0995  0.131  
10 mkt_rice_5_percent_broken_pre10day_mean  0.102    0.95  0.0858  0.117  
11 mkt_rice_milled_pre10day_mean            0.120    0.95  0.104   0.136  
12 mkt_sorghum_brown_pre10day_mean          0.109    0.95  0.0934  0.125  
13 mkt_sorghum_white_pre10day_mean          0.124    0.95  0.109   0.140  
14 mkt_usd_exchange_rate_pre10day_mean     -0.0170   0.95 -0.0328 -0.00128
15 mkt_yams_pre10day_mean                   0.0673   0.95  0.0515  0.0830 
Code
# use these for the model
pop_params_top<-pop_params_effect_sizes %>% filter(abs(Cohens_d)>0.11) %>% pluck("var")
print(paste0("Consider these vars for model: ", str_c(pop_params_top, collapse =', ')))
[1] "Consider these vars for model: cli_max_temp_c_pre10day_mean, cli_rain_hours_pre10day_mean, cli_rain_mm_pre10day_mean, mkt_millet_pearl_pre10day_mean, mkt_rice_milled_pre10day_mean, mkt_sorghum_white_pre10day_mean"
Code
readr::write_lines(pop_params_top, "cli_mkt_params_for_model.txt")

A t test is performed across all these pop level parameters, to test if between the means of the parameters differ where the next visit is a success or failure (success defined as Next Visit =< 37 days from current visit).

All price and weather parameters have a small pvalue, except the price of gasoline.

However, the effect sizes (cohens d) are small for all parameters. Max temp, rain total, rain hours, millet, rice, and sorghum price, price have the highest effect sizes (>0.11). Max temp has an inverse association than expected (more attendance on comparatively hot days).

For greater effect size, may need to respecify. For example, instead of using 10 day averages, use precise weather variables on the date 28 days after latest appointment, with an outcome of visiting on that exact day.

Here is an example: for a given 10 day window, is the average price of milled rice at state level related to the rate of patients returning for subsequent visits within that 10 day window?

Code
tots_timely_rates_prep_detailed<-tots_timely_qual_cli_mkt %>% 
   filter(censor==0) %>% 
   filter(!is.na(sex)) %>% 
   filter(visits_since_dropout<2) 
#dropout occurs only once.... and if censored, not included in denom


find_dropout_rates_detailed <-function(data, ...){
  data %>% 
  mutate(nextdate_37_kept=factor(nextdate_37_kept, levels=c(0,1))) %>% 
  group_by(nextdate_37, ...) %>% 
  count(nextdate_37_kept) %>% 
  mutate(rate=n/sum(n)) %>% 
  ungroup() %>% 
  filter(nextdate_37_kept==1) 
}

tots_timely_rates_prep_detailed %>% 
  find_dropout_rates_detailed(State, mkt_rice_milled_pre10day_mean) %>% 
  filter(n>9) %>% 
  ggplot(aes(x=mkt_rice_milled_pre10day_mean, y=rate)) + 
  geom_jitter( alpha=0.5)+
  geom_smooth() +
  expand_limits(y = c(0, 1)) +
  labs(title="State Level 37-day Timely Return Rates, By market prices for milled rice",
       subtitle="excluding date ranges where the State expected less than 10 patients")

Review and Save data

In a final review, check distributions of all the vars.

Code
dfs<-tots_timely_qual_cli_mkt %>% 
  summarytools::dfSummary(plain.ascii = FALSE,stile="grid",
                          max.string.width=15)

dfs$Variable <- gsub("_", " ", dfs$Variable)

print(dfs, method="render", varnumbers   = FALSE, 
                valid.col    = FALSE,max.tbl.width = 40,
      caption="Removing _underscores_ from varnames for space")

Data Frame Summary

tots_timely_qual_cli_mkt

Dimensions: 153413 x 107
Duplicates: 0
Variable Stats / Values Freqs (% of Valid) Graph Missing
reg id [numeric]
Mean (sd) : 16102.7 (9361.3)
min ≤ med ≤ max:
0 ≤ 16067 ≤ 32332
IQR (CV) : 16258 (0.6)
29667 distinct values 0 (0.0%)
reg visit id [character]
1. 10209_1
2. 10224_3
3. 10528_4
4. 10916_4
5. 10976_5
6. 11008_6
7. 11129_19
8. 11245_10
9. 11257_5
10. 11371_24
[ 153227 others ]
2 ( 0.0% )
2 ( 0.0% )
2 ( 0.0% )
2 ( 0.0% )
2 ( 0.0% )
2 ( 0.0% )
2 ( 0.0% )
2 ( 0.0% )
2 ( 0.0% )
2 ( 0.0% )
153393 ( 100.0% )
0 (0.0%)
registration date [Date]
min : 2021-01-01
med : 2022-05-10
max : 2024-07-01
range : 3y 6m 0d
1246 distinct values 0 (0.0%)
event date [Date]
min : 2021-01-01
med : 2023-01-23
max : 2024-07-01
range : 3y 6m 0d
1278 distinct values 0 (0.0%)
appt tally [numeric]
Mean (sd) : 6 (5.5)
min ≤ med ≤ max:
1 ≤ 4 ≤ 45
IQR (CV) : 6 (0.9)
45 distinct values 0 (0.0%)
sex [character]
1. Female
2. Male
117013 ( 76.3% )
36357 ( 23.7% )
43 (0.0%)
State [character]
1. Kano
2. Ogun
97605 ( 63.6% )
55808 ( 36.4% )
0 (0.0%)
total patient visits [integer]
Mean (sd) : 11 (7.9)
min ≤ med ≤ max:
1 ≤ 9 ≤ 45
IQR (CV) : 11 (0.7)
42 distinct values 0 (0.0%)
ncd patient status [character]
1. Active record
2. Died
3. Inactive record
4. Transferred
5. Unresponsive
151920 ( 99.0% )
452 ( 0.3% )
424 ( 0.3% )
548 ( 0.4% )
69 ( 0.0% )
0 (0.0%)
prev event date [Date]
min : 2021-01-01
med : 2022-12-22
max : 2024-06-24
range : 3y 5m 23d
1264 distinct values 29690 (19.4%)
lead event date [Date]
min : 2021-01-04
med : 2023-03-09
max : 2024-07-01
range : 3y 5m 27d
1259 distinct values 29543 (19.3%)
nextdate 37 [Date]
min : 2021-02-07
med : 2023-03-01
max : 2024-08-07
range : 3y 6m 0d
1278 distinct values 0 (0.0%)
nextdate 90 [Date]
min : 2021-04-01
med : 2023-04-23
max : 2024-09-29
range : 3y 5m 28d
1278 distinct values 0 (0.0%)
nextdate 37 kept [numeric]
Min : 0
Mean : 0.4
Max : 1
0 : 85159 ( 55.5% )
1 : 68254 ( 44.5% )
0 (0.0%)
nextdate 90 kept [numeric]
Min : 0
Mean : 0.7
Max : 1
0 : 50260 ( 32.8% )
1 : 103153 ( 67.2% )
0 (0.0%)
stored by [character]
1. Jaen_PHC
2. Gwagwarwa_PHC
3. Hotoro_NorthPHC
4. Kabuga_PHC
5. Okeilewo_FHC
6. Dawakindakata_P
7. Gano_PHC
8. Dala_PHC
9. Sheka_PHC
10. akumbabarns
[ 97 others ]
7268 ( 6.9% )
7157 ( 6.8% )
6519 ( 6.2% )
4464 ( 4.2% )
4031 ( 3.8% )
3825 ( 3.6% )
3721 ( 3.5% )
3651 ( 3.4% )
3176 ( 3.0% )
2498 ( 2.4% )
59616 ( 56.3% )
47487 (31.0%)
created by [character]
1. Akumba, Barnaba
2. DAKATA, NASSARA
3. Recorder, GWAGW
4. Recorder, JAEN
5. Recorder, HOTOR
6. Okeilewo, FHC (
7. Recorder, KABUG
8. Recorder, GANO
9. Recorder, SHEKA
10. Recorder, DALA
[ 96 others ]
15970 ( 10.4% )
8352 ( 5.4% )
8036 ( 5.2% )
7569 ( 4.9% )
6092 ( 4.0% )
5142 ( 3.4% )
4556 ( 3.0% )
3477 ( 2.3% )
3082 ( 2.0% )
2926 ( 1.9% )
88211 ( 57.5% )
0 (0.0%)
last updated by [character]
1. Recorder, JAEN
2. Recorder, HOTOR
3. Recorder, GWAGW
4. DAKATA, NASSARA
5. Recorder, KABUG
6. Okeilewo, FHC (
7. Recorder, GANO
8. Recorder, DALA
9. Recorder, SHEKA
10. Olose, PHC (Olo
[ 97 others ]
9574 ( 6.2% )
9085 ( 5.9% )
9063 ( 5.9% )
8352 ( 5.4% )
5544 ( 3.6% )
5496 ( 3.6% )
4377 ( 2.9% )
4030 ( 2.6% )
3429 ( 2.2% )
3209 ( 2.1% )
91254 ( 59.5% )
0 (0.0%)
last updated on [Date]
min : 2021-01-02
med : 2023-06-10
max : 2024-07-21
range : 3y 6m 19d
1265 distinct values 0 (0.0%)
scheduled date [Date]
min : 1974-06-12
med : 2023-03-31
max : 2201-08-29
range : 227y 2m 17d
1331 distinct values 0 (0.0%)
incident date [Date]
min : 2003-12-08
med : 2022-06-14
max : 2024-07-17
range : 20y 7m 9d
1233 distinct values 0 (0.0%)
organisation unit name [character]
1. kn Ja'En Primar
2. kn Hotoro Arewa
3. kn Gwagwarwa Pr
4. kn Dawakin Daka
5. kn Kabuga Prima
6. og Oke Ilewo Fa
7. kn Gano Primary
8. kn Dala Primary
9. kn Shekar Barde
10. og Olose Primar
[ 94 others ]
9816 ( 6.4% )
9691 ( 6.3% )
9293 ( 6.1% )
8352 ( 5.4% )
5736 ( 3.7% )
5519 ( 3.6% )
4539 ( 3.0% )
4193 ( 2.7% )
3748 ( 2.4% )
3271 ( 2.1% )
89255 ( 58.2% )
0 (0.0%)
Nat [character] 1. fg Nigeria 
·
153413 ( 100.0% )
0 (0.0%)
LGA [character]
1. Nasarawa
2. Gwale
3. Dala
4. Abeokuta South
5. Kano Municipal
6. Kumbotso
7. Ifo
8. Abeokuta North
9. Dawakin Kudu
10. Tsanyawa
[ 32 others ]
27813 ( 18.1% )
15552 ( 10.1% )
9870 ( 6.4% )
8028 ( 5.2% )
5888 ( 3.8% )
5852 ( 3.8% )
5312 ( 3.5% )
4978 ( 3.2% )
4927 ( 3.2% )
4809 ( 3.1% )
60384 ( 39.4% )
0 (0.0%)
Ward [character]
1.  
·
kn Dorayi Ward
2.  
·
kn Hotoro Nort
3.  
·
kn Gawuna Ward
4.  
·
kn Dakata Ward
5.  
·
kn Kabuga Ward
6.  
·
og Oke-Ilewo W
7.  
·
kn Gano Ward 
·
8.  
·
kn Kabuwaya Wa
9.  
·
kn Dan Maliki
10. Ota Local Gover
[ 90 others ]
9816 ( 6.4% )
9691 ( 6.3% )
9293 ( 6.1% )
8352 ( 5.4% )
5736 ( 3.7% )
5519 ( 3.6% )
4539 ( 3.0% )
4193 ( 2.7% )
3749 ( 2.4% )
3576 ( 2.3% )
88949 ( 58.0% )
0 (0.0%)
HC [character]
1.  
·
kn Ja'En Prima
2.  
·
kn Hotoro Arew
3.  
·
kn Gwagwarwa P
4.  
·
kn Dawakin Dak
5.  
·
kn Kabuga Prim
6.  
·
og Oke Ilewo F
7.  
·
kn Gano Primar
8.  
·
kn Dala Primar
9.  
·
kn Shekar Bard
10.  
·
og Olose Prima
[ 95 others ]
9816 ( 6.4% )
9691 ( 6.3% )
9293 ( 6.1% )
8352 ( 5.4% )
5736 ( 3.7% )
5519 ( 3.6% )
4539 ( 3.0% )
4193 ( 2.7% )
3748 ( 2.4% )
3271 ( 2.1% )
89255 ( 58.2% )
0 (0.0%)
FACILITY [character]
1.  
·
og Agbara Prim
2.  
·
og Ajuwon Prim
3.  
·
og Alagbon Hea
4.  
·
og Igbogila Pr
5.  
·
og Keesi Prima
6.  
·
og Mobalufon M
7.  
·
og Oke-Oyinbo
8.  
·
og Otta Primar
9.  
·
og Owode Healt
10.  
·
og Sango Prima
213 ( 2.5% )
398 ( 4.6% )
651 ( 7.6% )
1019 ( 11.8% )
1209 ( 14.1% )
657 ( 7.6% )
898 ( 10.4% )
1812 ( 21.1% )
193 ( 2.2% )
1551 ( 18.0% )
144812 (94.4%)
program status [character]
1. ACTIVE
2. CANCELLED
3. COMPLETED
152913 ( 99.7% )
450 ( 0.3% )
50 ( 0.0% )
0 (0.0%)
event status [character]
1. ACTIVE
2. COMPLETED
3659 ( 2.4% )
149754 ( 97.6% )
0 (0.0%)
organisation unit [character]
1. ehJsDey0nF2
2. DxfVRX9j8np
3. xUORgwU22YU
4. jpPaQTbjIlU
5. aoOPscxu7Tc
6. S84LuOwGZ6R
7. bYamdI7FHE5
8. JJL8Ukh1ZlZ
9. uRzXjHHigm9
10. PKf7VHpxmBG
[ 95 others ]
9816 ( 6.4% )
9691 ( 6.3% )
9293 ( 6.1% )
8352 ( 5.4% )
5736 ( 3.7% )
5519 ( 3.6% )
4539 ( 3.0% )
4193 ( 2.7% )
3748 ( 2.4% )
3271 ( 2.1% )
89255 ( 58.2% )
0 (0.0%)
height cm [numeric]
Mean (sd) : 86.4 (79.3)
min ≤ med ≤ max:
0.4 ≤ 148 ≤ 758
IQR (CV) : 158.4 (0.9)
156 distinct values 146189 (95.3%)
med losartan [character]
1. Losartan 100mg
2. Losartan 50mg
8819 ( 15.9% )
46480 ( 84.1% )
98114 (64.0%)
hypertension treated in the past [character]
1. First time
2. Was treated in
127501 ( 83.7% )
24776 ( 16.3% )
1136 (0.7%)
bp diastole mmhg [numeric]
Mean (sd) : 84 (12.9)
min ≤ med ≤ max:
1 ≤ 82 ≤ 200
IQR (CV) : 14 (0.2)
181 distinct values 1066 (0.7%)
med hydrochlorothiazide [character]
1. Hydrochlorathia
2. Hydrochlorathia
156 ( 11.5% )
1201 ( 88.5% )
152056 (99.1%)
other hypertension medication [character]
1. N
2. none
3. Lisinopril
4. nifedipine Vaso
5. no
6. non
7. elanapril
8. other drug
9. nil
10. vasoprin
[ 111 others ]
2144 ( 81.8% )
177 ( 6.8% )
47 ( 1.8% )
18 ( 0.7% )
17 ( 0.6% )
16 ( 0.6% )
12 ( 0.5% )
11 ( 0.4% )
10 ( 0.4% )
8 ( 0.3% )
160 ( 6.1% )
150793 (98.3%)
weight kg [numeric]
Mean (sd) : 73.6 (71.8)
min ≤ med ≤ max:
28 ≤ 70 ≤ 6298
IQR (CV) : 21 (1)
116 distinct values 145524 (94.9%)
patient followup status [character]
1. Called
2. Overdue Pending
3. Upcoming visit
766 ( 0.5% )
113357 ( 79.7% )
28175 ( 19.8% )
11115 (7.2%)
bmi [numeric]
Mean (sd) : 137463.3 (281054.3)
min ≤ med ≤ max:
0 ≤ 38.4 ≤ 4296296
IQR (CV) : 262195.2 (2)
1478 distinct values 146184 (95.3%)
bmi measurement [numeric] 1 distinct value
1 : 8591 ( 100.0% )
144822 (94.4%)
days of medication [character]
1. 30 days
2. 60 days
3. 90 days
25742 ( 97.0% )
659 ( 2.5% )
139 ( 0.5% )
126873 (82.7%)
consent to send sms reminders [numeric] 1 distinct value
1 : 25896 ( 100.0% )
127517 (83.1%)
does patient have diabetes [character]
1. No
2. Yes
139830 ( 97.8% )
3078 ( 2.2% )
10505 (6.8%)
blood sugar measurement [numeric] 1 distinct value
1 : 501 ( 100.0% )
152912 (99.7%)
relationship to patient [character]
1. Daughter
2. Father
3. Friend
4. Husband
5. Mother
6. Neighbor
7. Other
8. Son
9. Wife
21134 ( 17.2% )
5239 ( 4.3% )
1136 ( 0.9% )
27941 ( 22.8% )
642 ( 0.5% )
2547 ( 2.1% )
10205 ( 8.3% )
44818 ( 36.6% )
8902 ( 7.3% )
30849 (20.1%)
bp systole mmhg [numeric]
Mean (sd) : 141.2 (19.7)
min ≤ med ≤ max:
1 ≤ 140 ≤ 300
IQR (CV) : 23 (0.1)
224 distinct values 530 (0.3%)
other diabetes medication [logical]
All NA's
153413 (100.0%)
newage [character]
1. 50
2. 60
3. 70
4. 40
5. 55
6. 45
7. 65
8. 80
9. 52
10. 35
[ 86 others ]
12712 ( 8.3% )
11770 ( 7.7% )
8481 ( 5.5% )
7095 ( 4.6% )
6477 ( 4.2% )
6370 ( 4.2% )
5382 ( 3.5% )
3593 ( 2.3% )
3453 ( 2.3% )
3342 ( 2.2% )
84738 ( 55.2% )
0 (0.0%)
med amlodipine [character]
1. Amlodipine 10mg
2. Amlodipine 5mg
3. Other drug
13092 ( 8.8% )
136353 ( 91.2% )
15 ( 0.0% )
3953 (2.6%)
med telmisartan [character]
1. Telmisartan(40m
2. Telmisartan(80m
310 ( 72.9% )
115 ( 27.1% )
152988 (99.7%)
does patient have hypertension [character] 1. Yes
153413 ( 100.0% )
0 (0.0%)
consent to record data [numeric] 1 distinct value
1 : 153413 ( 100.0% )
0 (0.0%)
has supporter [numeric]
Min : 0
Mean : 0.8
Max : 1
0 : 29413 ( 19.2% )
1 : 124000 ( 80.8% )
0 (0.0%)
has phone [numeric]
Min : 0
Mean : 0.8
Max : 1
0 : 28289 ( 18.4% )
1 : 125124 ( 81.6% )
0 (0.0%)
age [integer]
Mean (sd) : 54 (14.1)
min ≤ med ≤ max:
18 ≤ 53 ≤ 124
IQR (CV) : 18 (0.3)
96 distinct values 0 (0.0%)
days since entry [numeric]
Mean (sd) : 245.1 (248.8)
min ≤ med ≤ max:
0 ≤ 170 ≤ 1272
IQR (CV) : 353 (1)
1255 distinct values 0 (0.0%)
exp year [numeric]
Mean (sd) : 2022.8 (0.9)
min ≤ med ≤ max:
2021 ≤ 2023 ≤ 2024
IQR (CV) : 1 (0)
2021 : 12579 ( 8.2% )
2022 : 43654 ( 28.5% )
2023 : 61089 ( 39.8% )
2024 : 36091 ( 23.5% )
0 (0.0%)
exp month [numeric]
Mean (sd) : 6.6 (3.3)
min ≤ med ≤ max:
1 ≤ 7 ≤ 12
IQR (CV) : 5 (0.5)
12 distinct values 0 (0.0%)
exp year month [Date]
min : 2021-04-01
med : 2023-04-01
max : 2024-09-01
range : 3y 5m 0d
42 distinct values 0 (0.0%)
days since last [integer]
Mean (sd) : 66.9 (87.4)
min ≤ med ≤ max:
7 ≤ 35 ≤ 1209
IQR (CV) : 36 (1.3)
892 distinct values 29690 (19.4%)
nextdate 90 dropout [numeric]
Min : 0
Mean : 0.5
Max : 1
0 : 76188 ( 49.7% )
1 : 77225 ( 50.3% )
0 (0.0%)
visits since dropout [numeric]
Mean (sd) : 1.8 (3.2)
min ≤ med ≤ max:
0 ≤ 1 ≤ 35
IQR (CV) : 2 (1.8)
36 distinct values 0 (0.0%)
censor [numeric]
Min : 0
Mean : 0.1
Max : 1
0 : 141106 ( 92.0% )
1 : 12307 ( 8.0% )
0 (0.0%)
p step1 need [numeric]
Min : 0
Mean : 0.1
Max : 1
0 : 140951 ( 91.9% )
1 : 12462 ( 8.1% )
0 (0.0%)
p step2 need [numeric]
Min : 0
Mean : 0.2
Max : 1
0 : 127335 ( 83.0% )
1 : 26078 ( 17.0% )
0 (0.0%)
p step3 need [numeric]
Min : 0
Mean : 0.1
Max : 1
0 : 139241 ( 90.8% )
1 : 14172 ( 9.2% )
0 (0.0%)
p step4 need [numeric]
Min : 0
Mean : 0
Max : 1
0 : 146882 ( 95.7% )
1 : 6531 ( 4.3% )
0 (0.0%)
p step1 met [numeric]
Min : 0
Mean : 0.1
Max : 1
0 : 141580 ( 92.3% )
1 : 11833 ( 7.7% )
0 (0.0%)
p step2 met [numeric]
Min : 0
Mean : 0.1
Max : 1
0 : 137348 ( 89.5% )
1 : 16065 ( 10.5% )
0 (0.0%)
p step3 met [numeric]
Min : 0
Mean : 0
Max : 1
0 : 151287 ( 98.6% )
1 : 2126 ( 1.4% )
0 (0.0%)
p step4 met [numeric]
Min : 0
Mean : 0
Max : 1
0 : 153395 ( 100.0% )
1 : 18 ( 0.0% )
0 (0.0%)
sbp baseline [numeric]
Mean (sd) : 157.6 (17.2)
min ≤ med ≤ max:
14 ≤ 158 ≤ 300
IQR (CV) : 23 (0.1)
165 distinct values 514 (0.3%)
dbp baseline [numeric]
Mean (sd) : 92.4 (13.3)
min ≤ med ≤ max:
1 ≤ 91 ≤ 200
IQR (CV) : 15 (0.1)
153 distinct values 720 (0.5%)
sbp change from baseline [numeric]
Mean (sd) : -16.4 (20.6)
min ≤ med ≤ max:
-195 ≤ -14 ≤ 178
IQR (CV) : 30 (-1.3)
302 distinct values 917 (0.6%)
dbp change from baseline [numeric]
Mean (sd) : -8.4 (14.1)
min ≤ med ≤ max:
-143 ≤ -6 ≤ 130
IQR (CV) : 17 (-1.7)
241 distinct values 1592 (1.0%)
lga 37date [character]
1. Nasarawa_2024-0
2. Nasarawa_2023-0
3. Nasarawa_2023-0
4. Nasarawa_2023-0
5. Nasarawa_2022-1
6. Nasarawa_2024-0
7. Nasarawa_2023-0
8. Nasarawa_2024-0
9. Nasarawa_2023-0
10. Nasarawa_2022-0
[ 27800 others ]
140 ( 0.1% )
129 ( 0.1% )
96 ( 0.1% )
93 ( 0.1% )
83 ( 0.1% )
83 ( 0.1% )
82 ( 0.1% )
80 ( 0.1% )
79 ( 0.1% )
77 ( 0.1% )
152471 ( 99.4% )
0 (0.0%)
cli max temp c [numeric]
Mean (sd) : 30.4 (3.4)
min ≤ med ≤ max:
18.5 ≤ 29.6 ≤ 41.6
IQR (CV) : 4.6 (0.1)
217 distinct values 4659 (3.0%)
cli max heat index c [numeric]
Mean (sd) : 33.7 (3.3)
min ≤ med ≤ max:
19 ≤ 33.7 ≤ 44
IQR (CV) : 4.6 (0.1)
232 distinct values 4659 (3.0%)
cli rain mm [numeric]
Mean (sd) : 4.1 (7.4)
min ≤ med ≤ max:
0 ≤ 0.6 ≤ 167.6
IQR (CV) : 5.4 (1.8)
523 distinct values 4659 (3.0%)
cli rain hours [numeric]
Mean (sd) : 4.9 (6)
min ≤ med ≤ max:
0 ≤ 2 ≤ 24
IQR (CV) : 8 (1.2)
25 distinct values 4659 (3.0%)
cli max temp c pre10day mean [numeric]
Mean (sd) : 30.4 (3.2)
min ≤ med ≤ max:
21 ≤ 29.6 ≤ 40
IQR (CV) : 4.4 (0.1)
1582 distinct values 4659 (3.0%)
cli max heat index c pre10day mean [numeric]
Mean (sd) : 33.7 (3)
min ≤ med ≤ max:
22.5 ≤ 33.7 ≤ 42.4
IQR (CV) : 3.8 (0.1)
1687 distinct values 4659 (3.0%)
cli rain mm pre10day mean [numeric]
Mean (sd) : 4.1 (4.4)
min ≤ med ≤ max:
0 ≤ 2.9 ≤ 42.9
IQR (CV) : 6.3 (1.1)
2607 distinct values 4659 (3.0%)
cli rain hours pre10day mean [numeric]
Mean (sd) : 4.8 (4.6)
min ≤ med ≤ max:
0 ≤ 3.7 ≤ 20.6
IQR (CV) : 7 (1)
202 distinct values 4659 (3.0%)
state 37date [character]
1. Kano_2023-06-21
2. Kano_2023-07-06
3. Kano_2023-06-28
4. Kano_2023-01-18
5. Kano_2022-07-27
6. Kano_2023-01-11
7. Kano_2023-02-03
8. Kano_2023-08-09
9. Kano_2023-07-26
10. Kano_2022-10-19
[ 2533 others ]
326 ( 0.2% )
321 ( 0.2% )
303 ( 0.2% )
289 ( 0.2% )
253 ( 0.2% )
236 ( 0.2% )
236 ( 0.2% )
236 ( 0.2% )
234 ( 0.2% )
231 ( 0.2% )
150748 ( 98.3% )
0 (0.0%)
mkt diesel [numeric]
Mean (sd) : 752.7 (287.3)
min ≤ med ≤ max:
188 ≤ 744 ≤ 1470
IQR (CV) : 218 (0.4)
169 distinct values 4659 (3.0%)
mkt gasoline [numeric]
Mean (sd) : 369.5 (219.1)
min ≤ med ≤ max:
161 ≤ 274 ≤ 1070
IQR (CV) : 405.2 (0.6)
123 distinct values 4659 (3.0%)
mkt maize grain white [numeric]
Mean (sd) : 346.2 (167.4)
min ≤ med ≤ max:
151.5 ≤ 282 ≤ 814
IQR (CV) : 194.4 (0.5)
175 distinct values 4659 (3.0%)
mkt maize grain yellow [numeric]
Mean (sd) : 371.3 (169.7)
min ≤ med ≤ max:
159.2 ≤ 322.3 ≤ 851.4
IQR (CV) : 211.3 (0.5)
189 distinct values 4659 (3.0%)
mkt millet pearl [numeric]
Mean (sd) : 370 (167.6)
min ≤ med ≤ max:
160.3 ≤ 316.1 ≤ 847.9
IQR (CV) : 146.8 (0.5)
210 distinct values 4659 (3.0%)
mkt rice 5 percent broken [numeric]
Mean (sd) : 895.6 (395.7)
min ≤ med ≤ max:
475.8 ≤ 724.1 ≤ 2160.5
IQR (CV) : 466.1 (0.4)
201 distinct values 4659 (3.0%)
mkt rice milled [numeric]
Mean (sd) : 719.5 (320)
min ≤ med ≤ max:
278 ≤ 607.1 ≤ 1641.3
IQR (CV) : 398.9 (0.4)
179 distinct values 4659 (3.0%)
mkt sorghum brown [numeric]
Mean (sd) : 378.2 (193.1)
min ≤ med ≤ max:
155.9 ≤ 319.6 ≤ 991.5
IQR (CV) : 223.4 (0.5)
183 distinct values 4659 (3.0%)
mkt sorghum white [numeric]
Mean (sd) : 365.7 (178.4)
min ≤ med ≤ max:
154.7 ≤ 332.4 ≤ 852.3
IQR (CV) : 191 (0.5)
159 distinct values 4659 (3.0%)
mkt yams [numeric]
Mean (sd) : 338 (184.8)
min ≤ med ≤ max:
136.9 ≤ 267.8 ≤ 836.7
IQR (CV) : 204.9 (0.5)
234 distinct values 4659 (3.0%)
mkt usd exchange rate [numeric]
Mean (sd) : 0 (0)
min ≤ med ≤ max:
0 ≤ 0 ≤ 0
IQR (CV) : 0 (0.3)
36 distinct values 4659 (3.0%)
mkt diesel pre10day mean [numeric]
Mean (sd) : 749.2 (287)
min ≤ med ≤ max:
188 ≤ 744 ≤ 1458
IQR (CV) : 210 (0.4)
1254 distinct values 4659 (3.0%)
mkt gasoline pre10day mean [numeric]
Mean (sd) : 366.2 (216.7)
min ≤ med ≤ max:
160.8 ≤ 274 ≤ 1070
IQR (CV) : 400.8 (0.6)
929 distinct values 4659 (3.0%)
mkt maize grain white pre10day mean [numeric]
Mean (sd) : 344.3 (165.2)
min ≤ med ≤ max:
153.8 ≤ 281.5 ≤ 814
IQR (CV) : 189.2 (0.5)
1534 distinct values 4659 (3.0%)
mkt maize grain yellow pre10day mean [numeric]
Mean (sd) : 369.4 (168)
min ≤ med ≤ max:
162.1 ≤ 322.3 ≤ 851.4
IQR (CV) : 208.2 (0.5)
1672 distinct values 4659 (3.0%)
mkt millet pearl pre10day mean [numeric]
Mean (sd) : 368.5 (166.4)
min ≤ med ≤ max:
161.1 ≤ 316.1 ≤ 846.8
IQR (CV) : 150 (0.5)
1766 distinct values 4659 (3.0%)
mkt rice 5 percent broken pre10day mean [numeric]
Mean (sd) : 891.6 (392.4)
min ≤ med ≤ max:
479 ≤ 727.5 ≤ 2157.1
IQR (CV) : 468.8 (0.4)
1713 distinct values 4659 (3.0%)
mkt rice milled pre10day mean [numeric]
Mean (sd) : 714.5 (313.8)
min ≤ med ≤ max:
281.6 ≤ 610.5 ≤ 1636.4
IQR (CV) : 408.9 (0.4)
1578 distinct values 4659 (3.0%)
mkt sorghum brown pre10day mean [numeric]
Mean (sd) : 376.2 (191.5)
min ≤ med ≤ max:
159.1 ≤ 319.6 ≤ 989
IQR (CV) : 221.4 (0.5)
1599 distinct values 4659 (3.0%)
mkt sorghum white pre10day mean [numeric]
Mean (sd) : 363.9 (177.2)
min ≤ med ≤ max:
154.7 ≤ 332 ≤ 852.3
IQR (CV) : 193.6 (0.5)
1411 distinct values 4659 (3.0%)
mkt yams pre10day mean [numeric]
Mean (sd) : 335.4 (182)
min ≤ med ≤ max:
139.1 ≤ 269.2 ≤ 836.7
IQR (CV) : 211.6 (0.5)
1921 distinct values 4659 (3.0%)
mkt usd exchange rate pre10day mean [numeric]
Mean (sd) : 0 (0)
min ≤ med ≤ max:
0 ≤ 0 ≤ 0
IQR (CV) : 0 (0.3)
334 distinct values 4659 (3.0%)

Generated by summarytools 1.0.1 (R version 4.3.3)
2024-08-27

Code
# dhis_subset %>% skimr::skim()

# dhis_clean %>% 
#   select(newage, reg_id) %>% 
#   distinct() %>% 
#   mutate(newage=as.numeric(newage)) %>% 
#   skimr::skim()
Code
readr::write_rds(tots_timely_qual_cli_mkt, 
                 here::here("data","tots_timely_qual_cli_mkt.rds"))

sessionInfo()
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=en_US.UTF-8  LC_CTYPE=en_US.UTF-8    LC_MONETARY=en_US.UTF-8
[4] LC_NUMERIC=C            LC_TIME=en_US.UTF-8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] here_1.0.1         DT_0.33            RColorBrewer_1.1-3 effectsize_0.8.7  
 [5] kableExtra_1.4.0   summarytools_1.0.1 patchwork_1.2.0    skimr_2.1.5       
 [9] slider_0.3.1       table1_1.4.3       lubridate_1.9.3    forcats_1.0.0     
[13] stringr_1.5.1      dplyr_1.1.4        purrr_1.0.2        readr_2.1.5       
[17] tidyr_1.3.1        tibble_3.2.1       ggplot2_3.5.1      tidyverse_2.0.0   

loaded via a namespace (and not attached):
 [1] tcltk_4.3.3       sandwich_3.1-0    rlang_1.1.4       magrittr_2.0.3   
 [5] multcomp_1.4-25   snakecase_0.11.1  matrixStats_1.2.0 compiler_4.3.3   
 [9] mgcv_1.9-1        systemfonts_1.1.0 vctrs_0.6.5       reshape2_1.4.4   
[13] pkgconfig_2.0.3   crayon_1.5.2      fastmap_1.2.0     backports_1.4.1  
[17] magick_2.8.3      labeling_0.4.3    pander_0.6.5      utf8_1.2.4       
[21] rmarkdown_2.26    tzdb_0.4.0        ragg_1.3.0        bit_4.0.5        
[25] xfun_0.43         cachem_1.1.0      jsonlite_1.8.8    highr_0.10       
[29] pryr_0.1.6        broom_1.0.5       parallel_4.3.3    R6_2.5.1         
[33] bslib_0.7.0       stringi_1.8.4     jquerylib_0.1.4   estimability_1.5 
[37] Rcpp_1.0.12       knitr_1.46        warp_0.2.1        zoo_1.8-12       
[41] base64enc_0.1-3   parameters_0.21.6 Matrix_1.6-5      splines_4.3.3    
[45] timechange_0.3.0  tidyselect_1.2.1  rstudioapi_0.16.0 yaml_2.3.8       
[49] codetools_0.2-19  lattice_0.22-6    plyr_1.8.9        withr_3.0.0      
[53] bayestestR_0.13.2 coda_0.19-4.1     evaluate_0.23     survival_3.5-8   
[57] xml2_1.3.6        pillar_1.9.0      checkmate_2.3.1   insight_0.19.10  
[61] generics_0.1.3    vroom_1.6.5       rprojroot_2.0.4   hms_1.1.3        
[65] munsell_0.5.1     scales_1.3.0      xtable_1.8-4      glue_1.7.0       
[69] janitor_2.2.0     emmeans_1.10.1    tools_4.3.3       mvtnorm_1.2-4    
[73] rapportools_1.1   grid_4.3.3        crosstalk_1.2.1   datawizard_0.10.0
[77] colorspace_2.1-0  nlme_3.1-164      repr_1.1.7        Formula_1.2-5    
[81] cli_3.6.2         textshaping_0.3.7 fansi_1.0.6       viridisLite_0.4.2
[85] svglite_2.1.3     gtable_0.3.5      sass_0.4.9        digest_0.6.35    
[89] TH.data_1.1-2     htmlwidgets_1.6.4 farver_2.1.2      htmltools_0.5.8.1
[93] lifecycle_1.0.4   bit64_4.0.5       MASS_7.3-60.0.1