Msc Thesis Pt1: Describing NHCI Dropouts

Working document

Author

Affiliation

Brian ODonnell

Charité - Universitätsmedizin Berlin

This document shows work for describing NHCI patient population by attendance status.

Part two in a separate document, will work with the resulting dataset to predict patient dropouts using the tidymodels workflow

For a full description of the final dataset, see section Section 7

Goals Summary

Import of hypertension data from DHIS2 Line List
Cleaning and Data quality review
Table1 for never returners (dropout after baseline)
Descriptive statistics and explanation
Add in Market Prices and Weather data as plausible predictors of dropout, with ttests

Code

 #| label: load-packages
 #| include: false
 #| echo: false

library(tidyverse)
library(table1)
library(slider)
library(skimr)
library(patchwork)
library(summarytools)
library(kableExtra)
library(effectsize)
library(RColorBrewer)
library(DT)
library(here)

palette <- brewer.pal(n = 8, name = "Dark2")

mytheme<-
  theme_minimal()+
  theme(
  panel.background = element_rect(fill = "white", color = NA),
  plot.background = element_rect(fill = "white", color = NA),
  legend.background = element_rect(fill = "white", color = NA),
  text = element_text(family="sans"),
  strip.background = element_rect(
    color = "grey70", 
    linewidth = 1))

options(ggplot2.discrete.colour = palette, ggplot2.discrete.fill = palette)

theme_set(mytheme)

my_ggsave<-function(x, ...){
  
  ggsave(paste0(x,".png"), plot=last_plot(),
            path=here::here("figs"), 
            dpi=300,
            width=ifelse(is.na(...),7, ...))
  
    ggsave(paste0(x,".tiff"), plot=last_plot(), 
            path=here::here("figs"), 
            dpi=300,
            width=ifelse(is.na(...),7, ...))
}

DHIS2 Data Import and Cleaning

DHIS2 Events (Patient visits) with dates between Jan 1 2021 and June 30 2024 were exported.

Patient IDs and other PII were removed. Age was calculated as years between DOB and Registration Date. Only those with with hypertension and consent to record data checked are included. Events that are likely test data or created outside Kano and Ogun are removed.

Code

dhis<-readr::read_csv("output_file.csv") %>% 
  janitor::clean_names()


dhis_clean<-dhis %>%
  mutate(across(contains("_date"), dmy)) %>%  # convert to date everything thats a date
  mutate(across(ends_with("updated_on"), dmy)) %>%  # convert to date everything thats a date
  mutate(across(contains("mmhg"),as.double)) %>% # BP shouldnt be character
  select(!c(program_stage, geometry, longitude, latitude)) %>% #drop unused columns
  arrange(reg_id,event_date) %>% #now make a running tally for each patient visits
  group_by(reg_id) %>%
  mutate("one"=1, "appt_tally"=cumsum(one)) %>% 
  select(-one) %>% 
  ungroup() %>% 
  rename_all(~stringr::str_replace(.,"^htn_","")) %>% # clean names
  # now extract the OU hierarchy
  separate(organisation_unit_name_hierarchy, 
           into=c("Nat","State","LGA","Ward","HC","FACILITY"), sep="/") %>% # clinic hierarchy
  mutate(age=as.integer(newage)) %>% # convert age to number
  filter(!is.na(age) & age>17 & age < 130) %>% # and remove non adults
  filter(does_patient_have_hypertension=="Yes") %>% # remove those without hypertension confirmed
  filter(consent_to_record_data==1 ) %>% # and those without consent to record data
  filter(event_status %in% c("COMPLETED","ACTIVE") ) %>% #remove scheduled events 
  filter(!str_detect(State,"ad ")) %>%   #and any outside K/O
  filter(!str_detect(State,"ab ")) %>%   #and any outside K/O
  filter(!str_detect(created_by,"Test")) # remove test data

Data Quality Check

Skim of data

Looking at all columns in a skim.

Code

dhis_clean %>% skimr::skim()

Data summary
Name	Piped data
Number of rows	165208
Number of columns	61
_______________________
Column type frequency:
character	27
Date	5
logical	15
numeric	14
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
stored_by	49438	0.70	7	17	107
created_by	0	1.00	18	46	106
last_updated_by	0	1.00	18	46	107
organisation_unit_name	0	1.00	19	46	104
Nat	0	1.00	11	11	1
State	0	1.00	15	15	2
LGA	0	1.00	11	43	42
Ward	0	1.00	9	27	100
HC	0	1.00	11	47	105
FACILITY	156493	0.05	23	37	10
program_status	0	1.00	6	9	3
event_status	0	1.00	6	9	2
organisation_unit	0	1.00	11	11	105
med_losartan	106932	0.35	13	14	2
hypertension_treated_in_the_past	1165	0.99	10	23	2
med_hydrochlorothiazide	163799	0.01	24	26	2
other_hypertension_medication	162573	0.02	1	64	126
patient_followup_status	12318	0.93	6	20	3
ncd_patient_status	0	1.00	4	15	5
days_of_medication	137932	0.17	7	7	3
does_patient_have_diabetes	10994	0.93	2	3	2
sex	43	1.00	4	6	2
relationship_to_patient	34371	0.79	3	8	9
newage	0	1.00	2	3	96
med_amlodipine	4308	0.97	10	18	3
med_telmisartan	164758	0.00	17	17	2
does_patient_have_hypertension	0	1.00	3	3	1

Variable type: Date

skim_variable	complete_rate	min	max	median	n_unique
event_date	1	2021-01-01	2024-07-01	2023-01-09	1278
last_updated_on	1	2021-01-01	2024-07-21	2023-06-09	1276
scheduled_date	1	1974-06-12	2201-08-29	2023-03-13	1372
registration_date	1	1972-01-07	2024-07-17	2022-04-25	1324
incident_date	1	2003-12-08	2024-07-17	2022-06-02	1304

Variable type: logical

skim_variable	n_missing	mean	count
organisation_unit_code	165208	NaN	:
blood_sugar_reading_rbs_mgdl	165208	NaN	:
blood_sugar_unit	165208	NaN	:
history_of_heart_attack	165208	NaN	:
history_of_stroke	165208	NaN	:
blood_sugar_reading_fbs_mmoll	165208	NaN	:
blood_sugar_reading_ppbs_mmoll	165208	NaN	:
blood_sugar_reading_ppbs_mgdl	165208	NaN	:
med_glibenclamide	165208	NaN	:
other_diabetes_medication	165207	0	FAL: 1
blood_sugar_reading_rbs_mmoll	165208	NaN	:
type_of_diabetes_measure	165208	NaN	:
blood_sugar_reading_hba1c	165208	NaN	:
blood_sugar_reading_fbs_mgdl	165208	NaN	:
med_metformin	165208	NaN	:

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
height_cm	157928	0.04	86.58	79.29	0.45	1.6	148.0	160.0	758	▇▆▁▁▁
bp_diastole_mmhg	1300	0.99	85.28	239.52	1.00	76.0	82.0	90.0	90102	▇▁▁▁▁
weight_kg	157250	0.05	73.61	71.50	28.00	61.0	70.0	82.0	6298	▇▁▁▁▁
bmi	157923	0.04	136982.95	280270.71	0.00	27.0	38.3	262222.2	4296296	▇▁▁▁▁
bmi_measurement	156527	0.05	1.00	0.00	1.00	1.0	1.0	1.0	1	▁▁▇▁▁
consent_to_send_sms_reminders	138779	0.16	1.00	0.00	1.00	1.0	1.0	1.0	1	▁▁▇▁▁
blood_sugar_measurement	164697	0.00	1.00	0.00	1.00	1.0	1.0	1.0	1	▁▁▇▁▁
bp_systole_mmhg	700	1.00	151.44	1149.85	1.00	128.0	140.0	152.0	207120	▇▁▁▁▁
consent_to_record_data	0	1.00	1.00	0.00	1.00	1.0	1.0	1.0	1	▁▁▇▁▁
reg_id	0	1.00	16122.46	9360.88	0.00	7913.0	16113.0	24221.0	32332	▇▇▇▇▇
has_supporter	0	1.00	0.80	0.40	0.00	1.0	1.0	1.0	1	▂▁▁▁▇
has_phone	0	1.00	0.81	0.39	0.00	1.0	1.0	1.0	1	▂▁▁▁▇
appt_tally	0	1.00	6.26	5.80	1.00	2.0	4.0	9.0	45	▇▂▁▁▁
age	0	1.00	54.11	14.13	18.00	45.0	54.0	64.0	124	▂▇▃▁▁

Clean Registration Dates

Looking at registration dates, seems some registrations are after first visit (“back data entry”).

4% of patients have later registration dates than dates of first visits.

But, registrations should always be before or on first visit date. Registration = entry to program.

This snippet overwrites registration dates to the date of first visit. The operation potentially biases Age variable, which is based on days between date of birth and registration.

Question: How exactly are registration dates used???

Code

# create a new registration date with rules
dhis_clean_newReg<-dhis_clean %>% 
  mutate(reg_date_new=case_when(
    appt_tally==1 & registration_date<=event_date ~ registration_date,
    appt_tally==1 & registration_date>event_date ~ event_date,
    appt_tally>1 ~ NA,
    TRUE ~ NA)) %>% 
  mutate(reg_date_new=if_else(as.integer(as.Date(ymd("2020-01-01"))-registration_date) > 0, 
                              event_date, reg_date_new)) %>% 
  group_by(reg_id) %>%
  fill(reg_date_new, .direction="down") %>% 
  ungroup()

dhis_clean_newReg %>% 
  select(reg_id, registration_date, reg_date_new, appt_tally, event_date) %>% 
  mutate(reg_date_post_ev1=if_else(reg_date_new!=registration_date & appt_tally==1, 1, 0)) %>% 
  filter(appt_tally==1) %>% 
  count(reg_date_post_ev1) %>% 
  mutate(percent=n/sum(n))

# A tibble: 2 × 3
  reg_date_post_ev1     n percent
              <dbl> <int>   <dbl>
1                 0 29554  0.962 
2                 1  1157  0.0377

Code

# Now rewrite the clean dhis dataset. 
# Remove unneeded columns and create days since entry
dhis_clean<-dhis_clean_newReg %>% 
  mutate(registration_date=reg_date_new) %>% 
    mutate(days_since_entry=as.double(event_date-registration_date)) %>%  #days since entry
  select(-reg_date_new) %>% 
  janitor::remove_empty(which = "cols") # remove all unnecessary columns

Code

# dhis_clean<-dhis_clean %>% filter(registration_date >= ymd("2021-01-01"))

Notice the dates and years of events, looks correct.

Code

dhis_clean %>%  
  select(event_date, reg_id) %>%
  distinct() %>% 
  mutate(year=lubridate::year(as.Date(event_date))) %>%
  count(year) %>%   kable() %>% kableExtra::kable_styling()

year	n
2021	24643
2022	55985
2023	57631
2024	25452

Zero ending values

And, 30% of all “Age at Registration” end in 0. (36% of SBPs do as well).

HCWs were using the DOB attribute as An ESTIMATE of Age, instead of as a true DOB. A feature of DHIS2 enables this estimation.

But between the incorrect registration date and using Age/DoB as an estimate, the Age Variable is imprecise and has unknown measurement error.

Code

dhis_clean %>%
  select(newage, reg_id) %>%
    distinct() %>% 
  mutate(age=as.integer(newage)) %>% 
  select(age) %>% 
  filter(age>0) %>% 
## !!!! 30% of all Ages end in 0!!!! 
## Measurement error by entering age instead of DOB
  # mutate(age_zero=if_else(age %% 10==0, 1, 0)) %>% 
  # count(age_zero) %>% 
  # mutate(per=n/sum(n)) 
  ggplot() +
  geom_histogram(aes(x=age)) +
  labs(title="Histogram of Age at Registration for All Visits")

Code

dhis_clean %>%
  mutate(age_zero=if_else(bp_systole_mmhg %% 10==0, 1, 0)) %>%
  filter(99 < bp_systole_mmhg & bp_systole_mmhg<200) %>%
  # count(age_zero) %>%
  # mutate(per=n/sum(n))
  ggplot() +
  geom_histogram(aes(x=bp_systole_mmhg)) +
  labs(title="Histogram of SBP All Visits",
       subtitle = "SBPs less than 200. Spikes are values ending in zero")

Duplicate events?

It looks like a number of visits occur on the same day for many patients.

Code

## Found duplicate events!!!
tots_d <- dhis_clean %>% 
  mutate(reg_visit_id=paste0(reg_id,"_",appt_tally)) %>% 
  group_by(reg_id) %>% 
  mutate(prev_event_date = lag(event_date, n=1, order_by=event_date)) %>% 
  mutate(lead_event_date = lead(event_date, n=1, order_by=event_date)) %>% 
  ungroup()


### highlight them as dupes
dupes <- tots_d %>% 
  # select(reg_id, event_date, prev_event_date, organisation_unit_name) %>% 
  # ungroup() %>% 
  mutate(dupe=if_else(event_date==prev_event_date | event_date==lead_event_date, 1, 0)) %>%
  group_by(reg_id) %>% 
  # fill(dupe, .direction="down") %>%
  filter(dupe==1) %>% arrange(reg_id,event_date)



## TRUE dupes also have identical dates except "last updated" date
dupes_true <- dupes %>% 
  ungroup() %>% 
  select(!c(lead_event_date, prev_event_date, last_updated_by)) %>% 
  ## this only keeps rows that are identical except for the appointment tally and visit id 
  janitor::get_dupes(!c(appt_tally, reg_visit_id))

print(paste0("total patients ", length(unique(tots_d$reg_id))))

[1] "total patients 30714"

Code

print(paste0("patient records with dupe events ", length(unique(dupes$reg_id))))

[1] "patient records with dupe events 1122"

Code

print(paste0("duped events ", length(dupes$reg_id)))

[1] "duped events 2856"

Code

print(paste0("patient records with true dupe events on same day ", length(unique(dupes_true$reg_id))))

[1] "patient records with true dupe events on same day 151"

Code

# length(unique(dhis_subset$reg_id))


# Only the first of all "true dupe" events should still be kept!
dupes_true_extras<-dupes_true %>% 
      group_by(reg_id) %>% 
      mutate("one"=1, "dupe_visit_id"=cumsum(one)) %>% 
      select(-one) %>% 
      select(reg_id,reg_visit_id, dupe_visit_id) %>% 
      filter(dupe_visit_id>1)

## The new dataset with totals is called tots and has removed the "true dupe" visits
tots <-tots_d %>% 
      anti_join(dupes_true_extras, by="reg_visit_id")

# (1122-188)/length(unique(dhis_clean$reg_id))

1122 of 30714 patient records reported at least one pair of visits on the same day ( 2856 events total). Of these, 151 patients had “true duplicates” where data reported within each visit was identical.

NEXT STEP for dupes would be to pivot longer, then find those variables that are NOT identical within Dupes

This table shows us the FIELDS which do not match within a pair of duplicated events for the same registration.

Code

d_long <-dupes %>% 
  group_by(reg_id) %>% 
  arrange(event_date, last_updated_on) %>% 
  mutate("one"=1, "dupe_id"=cumsum(one)) %>% 
  select(-one) %>% 
  ungroup() 

dupe_clash <-d_long %>% 
    filter(dupe_id<3) %>% # just those regs with exactly 2 dupe events
    mutate(across(everything(), as.character)) %>% # no dates or appt counts
    select(!ends_with("_date"), -appt_tally, -reg_visit_id) %>% 
    pivot_longer(!c(reg_id, dupe_id), names_to="var",values_to="value") %>% 
    pivot_wider(names_from=dupe_id,
                names_prefix = "dupe_",
                values_from=c(value)) %>% 
    # filter(!is.na(dupe_1)) %>%
    mutate(match=case_when(
      is.na(dupe_1) & is.na(dupe_2) ~ TRUE,
      dupe_1==dupe_2 ~ TRUE,
      TRUE ~ FALSE)) %>%
    filter(match==FALSE) %>% 
    arrange(reg_id)

dupe_clash %>% 
  count(var, sort=T)

# A tibble: 24 × 2
   var                    n
   <chr>              <int>
 1 last_updated_on      516
 2 bp_diastole_mmhg     401
 3 bp_systole_mmhg      395
 4 med_losartan         152
 5 med_amlodipine       151
 6 event_status          89
 7 days_of_medication    76
 8 created_by            66
 9 stored_by             65
10 bmi_measurement       37
# ℹ 14 more rows

Last Updated date could be helpful for deduping, same as stored_by, created_by, and event status (COMPLETED locks the values, so if there is an error there, another one could be made on same place).

But if those are not obvious for which one should be used, then we would use the non null BP.

Two visits on same day with non null BP values will be difficult to merge automatically.

THE RULES FOR MERGING TWO PATIENT VISITS ON SAME DATE

If one of the events does not have a BP reading, all values in the event with BP reading take priority (if any duped event has all BP NA, its removed from dataset).
ACTIVE status events are generally opened after COMPLETED events to make a change. If its the only ACTIVE event that date, use this event. Priority = 3 (max)
If other clashes, count the number of nulls for medication. If one of the events has least med nulls, then use that event. Priority = 2
If still cannot reconcile, use the first event by date last updated (or, if last updated on same day, random). Priority = 1

Code

# this chunk will apply the priority rules to flag priority events
d_long_priorities <- d_long %>% 
    filter(!is.na(bp_diastole_mmhg) & !is.na(bp_systole_mmhg)) %>% 
    group_by(reg_id, event_date, event_status) %>% 
    mutate(dupes_evs_with_this_status=n()) %>% ungroup() %>% 
    mutate(dupe_meds_null_count = rowSums(is.na(across(contains("med"))))) %>% 
    group_by(reg_id, event_date) %>% 
    mutate(dupes_evs_mostMeds=min(dupe_meds_null_count)) %>% 
    mutate(dupes_evs_withMedCount=n()) %>%
      mutate(dupe_id_priority =
      case_when(
        dupes_evs_with_this_status==1 & event_status=="ACTIVE" ~ 3,
        dupes_evs_mostMeds==dupe_meds_null_count & dupes_evs_withMedCount==1 ~ 2,
        dupe_id==min(dupe_id) ~ 1,
          TRUE ~ 0
        ) ) %>% ungroup()

# WE WANT TO KEEP ALL THESE AND REMOVE OTHER DUPES
d_long_keepers <-d_long_priorities %>% 
       group_by(reg_id, event_date) %>% 
       filter(dupe_id_priority==max(dupe_id_priority)) %>% ungroup()

d_long_remove <- d_long %>% 
  anti_join(d_long_keepers, by="reg_visit_id")

# d_long_remove %>% colnames()

d_long_keepers %>% 
  janitor::tabyl(dupe_id_priority) %>% 
  kable() %>% kableExtra::kable_styling()

dupe_id_priority	n	percent
1	1195	0.8871566
2	104	0.0772086
3	48	0.0356347

We remove 1509 events from the main dataset and then recreate the appointment tally and visit IDs.

Code

tots<-tots %>% 
  anti_join(d_long_remove, by="reg_visit_id") %>% 
# REDO THE APPT TALLY AND REG_VISIT_ID now that the dupes have been removed
  arrange(reg_id,event_date) %>% #make a running tally for each patient visits
  group_by(reg_id) %>%
  mutate("one"=1, "appt_tally"=cumsum(one)) %>% 
  select(-one) %>% ungroup() %>% 
  mutate(reg_visit_id=paste0(reg_id,"_",appt_tally)) %>% 
  group_by(reg_id) %>% 
  add_count(name="total_patient_visits") %>% # counter for total patients visits
  mutate(prev_event_date = lag(event_date, n=1, order_by=event_date)) %>% 
  mutate(lead_event_date = lead(event_date, n=1, order_by=event_date)) %>% 
  ungroup() #and need to add new lead lag dates

Description

Table 1 Zero Followup Visits After Baseline

Process it for the Table1 all registrations up to April 1 2024.

Thus all patients have at least 90 days for a follow up Visit after Baseline Visit.

“>=1 Visits Post Baseline” here means at least 1 visit occurs later. The second visit can be within or after 90 days from baseline.

Code

dhis_subset <- tots %>% 
    filter(registration_date >= dmy("01-01-2021")) %>% 
    filter(registration_date <= dmy("01-04-2024"))

dhis_table <- dhis_subset %>%  
    mutate(patient_returned=if_else(total_patient_visits>1, 
                                    ">=1 Visits Post Baseline",
                                    "0 Visits Post Baseline"
                                    )) %>% 
    mutate(bp_systole_mmhg=if_else(bp_systole_mmhg>400, 400, bp_systole_mmhg)) %>% 
    mutate(sbp_level =
      case_when(
        bp_systole_mmhg >= 400 |  bp_systole_mmhg < 50 ~ "error (>300 or < 50)",
        bp_systole_mmhg >= 180 & bp_systole_mmhg < 400  ~ "emergency [180-300)",
        bp_systole_mmhg >= 140 & bp_systole_mmhg < 180  ~ "very high [140-180)",
        bp_systole_mmhg >= 120 & bp_systole_mmhg < 140 ~ "high or elevated (120-140)",
        is.na(bp_systole_mmhg) ~ NA,
        TRUE ~ "normal")
    ) %>% 
  mutate(sbp_level=fct_relevel(sbp_level, 
                              c("normal","high or elevated (120-140)",
                               "very high [140-180)","emergency [180-300)", 
                               "error (>300 or < 50)"))) %>% 
  mutate(has_diabetes=as.factor(does_patient_have_diabetes)) %>%
  mutate(has_phone=as.factor(has_phone)) %>%
  mutate(has_supporter=as.factor(has_supporter)) %>%
  mutate(age_group = as.factor(cut(age,
                         breaks = c(0, 30, 45, 60, 75, 110, 100000),
                         include.lowest = T,
                         labels = c("Other", "30-44","45-59", 
                                    "60-74", "75-110", "Other")))) %>%
  mutate(age_group=fct_shift(age_group, n= 1)) %>%
  select(patient_returned, State, sex, age, age_group, appt_tally,
         "sbp_baseline"=sbp_level, bp_systole_mmhg,
         "previous_htn_treated"=hypertension_treated_in_the_past,
         "amlodipine_baseline"=med_amlodipine,
         has_phone, has_supporter, has_diabetes,days_of_medication,
         total_patient_visits) %>% 
  filter(appt_tally==1)


## Write it to RDS for the manuscript
write_rds(dhis_table, file=here::here("data","dhis_table.rds"))


tab1<-table1::table1(~ sex + State + age + age_group + previous_htn_treated + 
                 sbp_baseline + bp_systole_mmhg + amlodipine_baseline + has_phone + 
                 has_supporter + has_diabetes +days_of_medication + total_patient_visits
                 | factor(patient_returned),
               data=dhis_table)

# t1flex(tab1)
tab1

	>=1 Visits Post Baseline (N=21183)	0 Visits Post Baseline (N=7044)	Overall (N=28227)
sex
Female	16005 (75.6%)	5032 (71.4%)	21037 (74.5%)
Male	5171 (24.4%)	2008 (28.5%)	7179 (25.4%)
Missing	7 (0.0%)	4 (0.1%)	11 (0.0%)
State
kn Kano State	13042 (61.6%)	2910 (41.3%)	15952 (56.5%)
og Ogun State	8141 (38.4%)	4134 (58.7%)	12275 (43.5%)
age
Mean (SD)	53.7 (14.6)	53.9 (15.6)	53.8 (14.8)
Median [Min, Max]	53.0 [18.0, 124]	53.0 [18.0, 122]	53.0 [18.0, 124]
age_group
30-44	5111 (24.1%)	1682 (23.9%)	6793 (24.1%)
45-59	8738 (41.3%)	2926 (41.5%)	11664 (41.3%)
60-74	4755 (22.4%)	1494 (21.2%)	6249 (22.1%)
75-110	1253 (5.9%)	451 (6.4%)	1704 (6.0%)
Other	1326 (6.3%)	491 (7.0%)	1817 (6.4%)
previous_htn_treated
First time	17400 (82.1%)	5632 (80.0%)	23032 (81.6%)
Was treated in the past	3632 (17.1%)	1323 (18.8%)	4955 (17.6%)
Missing	151 (0.7%)	89 (1.3%)	240 (0.9%)
sbp_baseline
normal	182 (0.9%)	43 (0.6%)	225 (0.8%)
high or elevated (120-140)	1293 (6.1%)	302 (4.3%)	1595 (5.7%)
very high [140-180)	17547 (82.8%)	5701 (80.9%)	23248 (82.4%)
emergency [180-300)	2082 (9.8%)	951 (13.5%)	3033 (10.7%)
error (>300 or < 50)	13 (0.1%)	5 (0.1%)	18 (0.1%)
Missing	66 (0.3%)	42 (0.6%)	108 (0.4%)
bp_systole_mmhg
Mean (SD)	158 (17.9)	161 (18.8)	159 (18.1)
Median [Min, Max]	158 [14.0, 400]	160 [14.0, 400]	159 [14.0, 400]
Missing	66 (0.3%)	42 (0.6%)	108 (0.4%)
amlodipine_baseline
Amlodipine 10mg OD	1802 (8.5%)	802 (11.4%)	2604 (9.2%)
Amlodipine 5mg OD	19175 (90.5%)	6158 (87.4%)	25333 (89.7%)
Other drug	3 (0.0%)	0 (0%)	3 (0.0%)
Missing	203 (1.0%)	84 (1.2%)	287 (1.0%)
has_phone
0	3994 (18.9%)	1359 (19.3%)	5353 (19.0%)
1	17189 (81.1%)	5685 (80.7%)	22874 (81.0%)
has_supporter
0	4217 (19.9%)	1593 (22.6%)	5810 (20.6%)
1	16966 (80.1%)	5451 (77.4%)	22417 (79.4%)
has_diabetes
No	19050 (89.9%)	6253 (88.8%)	25303 (89.6%)
Yes	461 (2.2%)	142 (2.0%)	603 (2.1%)
Missing	1672 (7.9%)	649 (9.2%)	2321 (8.2%)
days_of_medication
30 days	2395 (11.3%)	1021 (14.5%)	3416 (12.1%)
60 days	27 (0.1%)	14 (0.2%)	41 (0.1%)
90 days	0 (0%)	1 (0.0%)	1 (0.0%)
Missing	18761 (88.6%)	6008 (85.3%)	24769 (87.7%)
total_patient_visits
Mean (SD)	6.87 (5.72)	1.00 (0)	5.40 (5.57)
Median [Min, Max]	5.00 [2.00, 45.0]	1.00 [1.00, 1.00]	3.00 [1.00, 45.0]

Count of patients returned after first visit.

Code

dhis_table %>% 
  count(patient_returned) %>% 
  mutate(pt=n/sum(n))

# A tibble: 2 × 3
  patient_returned             n    pt
  <chr>                    <int> <dbl>
1 0 Visits Post Baseline    7044 0.250
2 >=1 Visits Post Baseline 21183 0.750

Visits per patient

Explore the data. Start by counting the number of visits per patient.

This shows patients registered before a given date.

Code

tots_small <-tots %>% 
  filter(registration_date >= ymd("2021-01-01")) %>%
  select(reg_id, total_patient_visits) %>% 
  unique()


seq_visits<-seq(ymd('2021-01-01'),ymd('2024-01-01'), by = '6 months')


seq_func<-function(vec){
  tots %>% 
  ungroup() %>% 
  filter(registration_date <= ymd(vec)) %>%
  select(reg_id, total_patient_visits) %>% 
  unique() %>% 
  count(total_patient_visits) %>% 
  ggplot() +
  geom_col(aes(y=n, x=total_patient_visits)) +
    labs(title="Patient visits by cohort",
      subtitle=str_wrap(paste0("Registered Before ", 
                               as.character(vec)),20))
}

patchwork::wrap_plots(lapply(seq_visits, seq_func))

Code

df<-data.frame(x=seq(1:40))
df$y<-8100*df$x^-1



tots_small %>%
  count(total_patient_visits) %>% 
  left_join(df, by=c("total_patient_visits"="x")) %>% 
  ggplot(aes(x=total_patient_visits)) +
  geom_col(aes(y=n), alpha=0.6) +
  geom_line(aes(y=y), color="blue", linetype=3, linewidth=0.8) +
  scale_x_continuous(limits=c(0,35)) +
  labs(title="Patients by total visit count",
       subtitle="NHCI Registrations Jan 1 2021 - Jul 1 2024",
       x="Visit count",
       y="Total patients") +
  annotate("rect", xmin=17, xmax=32, ymin=2000, ymax=4000, color="blue",fill="white", linetype=2) +
  annotate("text", x=25, y=3500, label="Decay Function: y ~ a * x^ -1", size=5, color="blue") +
  annotate("text", x=25, y=2500, label="where a = Count with 1 Visit", size=5, color="blue") +
  expand_limits(x = 0, y = 0) +
    coord_cartesian(expand = FALSE, clip = "off")

Code

my_ggsave("fig1_visitCounts",12)

Code

count(tots_small, total_patient_visits) %>% 
  ungroup() %>% 
  mutate(percent=round(100*(n/sum(n)),4)) %>% DT::datatable(rownames = FALSE)

Code

tots_small %>% 
      mutate(count_level =
        case_when(
          total_patient_visits==1 ~ "01 visit",
          total_patient_visits>=2 & total_patient_visits < 4  ~ "02-03 visits",
          total_patient_visits>=4 & total_patient_visits < 11  ~ "04-10 visits",
          total_patient_visits>=4 & total_patient_visits < 11  ~ "04-10 visits",
          total_patient_visits>=11 & total_patient_visits < 46  ~ "11-45 visits",
          TRUE ~ NA
        )
      ) %>% 
    count(count_level) %>% 
    mutate(percent=100*(n/sum(n))) %>% kable() %>% kableExtra::kable_styling()

count_level	n	percent
01 visit	8105	27.31992
02-03 visits	8022	27.04015
04-10 visits	9346	31.50302
11-45 visits	4194	14.13692

Code

#72816/126422
## top 20% of attendees by visit count, attended 58% of all visits.

Investigate NCD Patient Status variable (patient follow up tracker..)

Code

# tots %>% 
#   distinct(reg_id, ncd_patient_status) %>% 
#   count(ncd_patient_status)

tots %>% 
  count(ncd_patient_status) %>% 
  mutate(pct=round(n/sum(n),4)*100) %>% kable() %>% kableExtra::kable_styling()

ncd_patient_status	n	pct
Active record	162166	99.07
Died	457	0.28
Inactive record	433	0.26
Transferred	569	0.35
Unresponsive	69	0.04

Code

# length(unique(tots$reg_id))

# 1100/30704

Narrow in on registrations after jan 1 2023, can see a dropoff of attendance by sex.

Code

tots %>% 
  ungroup %>% 
  arrange(desc(registration_date)) %>% 
  filter(registration_date>=ymd("2023-01-01") &
         registration_date<ymd("2024-01-01") ) %>% 
  filter(!is.na(sex)) %>% 
    # mutate(n_group =
    #   case_when(
    #     total_patient_visits==1 ~ "01 (Entry)",
    #     total_patient_visits==2 ~ "02",
    #     total_patient_visits >1 & total_patient_visits<=5 ~ "03-06",
    #     total_patient_visits >5 & total_patient_visits<=11 ~ "07-11",
    #     TRUE ~ "12+"
    #   )
    # ) %>% 
  ungroup() %>% 
  filter(total_patient_visits==appt_tally & appt_tally<25) %>% 
  group_by(sex, appt_tally) %>% 
  summarise(count_by_ngroup=n()) %>%  
  mutate(percent= prop.table(count_by_ngroup)*100) %>% 
  mutate(cumulative_percent=cumsum(percent)) %>% 
  ungroup() %>% 
  as_tibble() %>% 
  ggplot()+
  geom_line(aes(x=appt_tally,y=percent, color=sex)) +
  # geom_col(aes(x=appt_tally,y=percent, fill=sex), position="dodge") +
  # facet_grid(~sex, scales="fixed") +
  labs(title="Patients by Sex and Total Number of Visits",
       subtitle="Registrations in 2023 counting All Visits to April 1, 2024",
       x="Total Number of Patient Visits in Time Period")+
  # theme_classic()+
  theme(legend.position = "bottom") +
  scale_x_continuous(breaks = seq(1, 22, by = 1))

Code

tots %>% count(LGA) %>% nrow()

[1] 42

Monthly patient load per clinic

Code

dhis_pts_per_month<-tots %>% 
  mutate(appt_month=lubridate::month(event_date)) %>% # transform to month
  group_by(appt_month, organisation_unit) %>% 
  summarize(patients=n_distinct(reg_id),
            appts=n()) %>% 
  ungroup() %>% 
  mutate(appt_patient_ratio=appts/patients)

dhis_pts_per_month %>% 
  summarytools::descr()

Descriptive Statistics  
dhis_pts_per_month  
N: 1242  

                    appt_month   appt_patient_ratio     appts   patients
----------------- ------------ -------------------- --------- ----------
             Mean         6.49                 1.21    131.80     103.45
          Std.Dev         3.45                 0.16    183.67     136.95
              Min         1.00                 1.00      1.00       1.00
               Q1         3.00                 1.10     41.00      35.00
           Median         6.00                 1.18     74.00      61.00
               Q3         9.00                 1.29    131.00     108.00
              Max        12.00                 2.08   1287.00     990.00
              MAD         4.45                 0.13     57.82      45.96
              IQR         5.75                 0.19     90.00      72.75
               CV         0.53                 0.13      1.39       1.32
         Skewness         0.00                 1.40      3.33       3.46
      SE.Skewness         0.07                 0.07      0.07       0.07
         Kurtosis        -1.22                 3.24     12.12      13.64
          N.Valid      1242.00              1242.00   1242.00    1242.00
        Pct.Valid       100.00               100.00    100.00     100.00

Code

# %>% summarize(patient_avg=mean(patients),
#             appt_avg=mean(appts))

dhis_pts_per_month %>%
  ggplot()+
  geom_histogram(aes(x=appts),bins=30) +
  labs(title="Total appointments per clinic per month")

Days & months enrolled per patient

Days enrolled per patient = Days since entry at latest event + 90

Code

dpv<-tots %>% 
  select(reg_id, appt_tally, total_patient_visits, registration_date, 
         event_date, days_since_entry) %>% 
  filter(appt_tally==total_patient_visits & days_since_entry>=0) %>% 
  mutate(days_tracked=days_since_entry+90) %>% 
  mutate(patient_days_per_visit=days_tracked/total_patient_visits)

summary(dpv$patient_days_per_visit)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
   5.294   58.000   90.000   94.394   90.600 1370.000

Code

# tots %>% 
#   filter(total_patient_visits>40) %>% View()

dpv %>% 
  mutate(days_enrolled=if_else(total_patient_visits==appt_tally, 
                               event_date-registration_date, NA)) %>% 
  filter(!is.na(days_enrolled) & total_patient_visits>1) %>%
  # filter(patient_days_per_visit>=0 & patient_days_per_visit< 91) %>% 
  mutate(months_per_visit=patient_days_per_visit/30) %>% 
  ggplot() +
  geom_histogram(aes(x=months_per_visit)) +
  labs(title="Months Per Visit, By Patient",
       subtitle="Total number of months enrolled divided by number of visits")

Cumulative density of days since Last visit

Code

tots %>% 
  filter(appt_tally!=1) %>% 
  mutate(days_since_last=as.integer(event_date-prev_event_date)) %>% 
  filter(days_since_last<50) %>%
  # count(days_since_last) %>% 
  ggplot(aes(x=days_since_last)) + 
  geom_step(aes(y=after_stat(y)),stat="ecdf") +
  labs(title="Cumulative Density of Events by days since latest visit",
       subtitle = "All visits occuring less than 50 days from last")

Code

# 
#   ggplot() +
#   geom_histogram(aes(x=days_since_last))

Code

tots %>% 
    filter(appt_tally>1) %>%
      mutate(days_since_last=event_date-prev_event_date) %>% 
  # filter(days_since_last>7) %>% 
      mutate(days_since_level =
        case_when(
          days_since_last ==0 ~ "_Same Day",
          days_since_last>=1 & days_since_last < 8  ~ "01-07 Days",
          days_since_last>=8 & days_since_last < 21  ~ "08-20 Days",
          days_since_last>=21 & days_since_last < 27  ~ "21-26 Days",
          days_since_last>=27 & days_since_last < 37  ~ "27-36 Days",
          days_since_last>=37 & days_since_last < 90  ~ "37-89 Days",
          days_since_last>=90  ~ "90+ Days",
          TRUE ~ "_Same Day"
        )
      ) %>% 
    count(days_since_level) %>% 
    mutate(percent=100*(n/sum(n))) %>% 
    ggplot(aes(x=days_since_level,y=percent))+
    geom_col() +
    
    # geom_text(aes(label = round(percent,2)), color="white") +
    labs(title="Follow Up Visits: Days After Previous Visit",
         subtitle="Visits 2021-2024, Excluding Duplicate Visits",
         x="",y="Percent of Follow Up visits") +coord_flip()

Define the outcome event: Attrition

No return visit after entry visit
No return within 1-90 days of a visit (Loose Window)
No return within 1-37 days of any visit (Strict Window)
No return within 365 days of latest visit (Original Window)

Patient status

Here we check what patient statuses follow the latest visit.

Code

tots %>% 
  filter(appt_tally==total_patient_visits) %>%
  count(patient_followup_status, ncd_patient_status)

# A tibble: 19 × 3
   patient_followup_status ncd_patient_status     n
   <chr>                   <chr>              <int>
 1 Called                  Active record        143
 2 Called                  Died                   2
 3 Called                  Inactive record        9
 4 Called                  Transferred            4
 5 Overdue Pending Call    Active record      24293
 6 Overdue Pending Call    Died                  72
 7 Overdue Pending Call    Inactive record      139
 8 Overdue Pending Call    Transferred          116
 9 Overdue Pending Call    Unresponsive          20
10 Upcoming visit          Active record       2535
11 Upcoming visit          Died                   4
12 Upcoming visit          Inactive record        2
13 Upcoming visit          Transferred            5
14 Upcoming visit          Unresponsive           2
15 <NA>                    Active record       3323
16 <NA>                    Died                  12
17 <NA>                    Inactive record       11
18 <NA>                    Transferred           17
19 <NA>                    Unresponsive           3

How to interpret CANCELLED or COMPLETED registrations remains a question.

Scheduled and Actual Visits

Days between actual visit date and scheduled date

Code

tots %>% 
  ungroup() %>%
  filter(appt_tally!=1) %>%
  mutate(days_from_sched=as.numeric(event_date-scheduled_date)) %>% 
    mutate(days_from_sched_range =
      case_when(
        days_from_sched < 0 ~ "Earlier_than_scheduled",
        days_from_sched == 0 ~ "0_Same_Day",
        days_from_sched >= 1 & days_from_sched  <= 7 ~ "1.0_Day_to_1_Week",
        days_from_sched >= 8 & days_from_sched  <= 28 ~ "1.1_to_4_Weeks",
        days_from_sched >= 29 & days_from_sched  <= 8*12 ~ "4.1_to_12_Weeks",
        TRUE ~ "Other"
      )
    ) %>% 
  # count(days_from_sched_range) %>% 
  janitor::tabyl(days_from_sched_range) %>% 
  kable() %>% kableExtra::kable_styling()

days_from_sched_range	n	percent
0_Same_Day	45094	0.3390985
1.0_Day_to_1_Week	12627	0.0949527
1.1_to_4_Weeks	11587	0.0871321
4.1_to_12_Weeks	11844	0.0890647
Earlier_than_scheduled	43740	0.3289167
Other	8090	0.0608353

Most visits are on the same day as they were scheduled (spontaneous visits?) or earlier.

Code

tots %>% 
  ungroup() %>%
  filter(appt_tally!=1) %>%
  mutate(days_from_latest_schedDate=as.numeric(scheduled_date-prev_event_date)) %>% 
  mutate(days_from_latest_eventDate=as.integer(event_date-prev_event_date)) %>% 
  filter(days_from_latest_eventDate<60 & days_from_latest_eventDate>0) %>%
    filter(days_from_latest_schedDate<60& days_from_latest_schedDate>0) %>% 
  select(reg_visit_id, starts_with("days_from")) %>% 
  pivot_longer(cols=starts_with("days_from"), names_to="property", values_to="value") %>% 
ggplot(aes(x=value, fill=property)) +
  # facet_wrap(~property) +
  geom_histogram(position="dodge",alpha=0.4, bins=50) +
  labs(title="Histogram of Follow Up visits",
  subtitle=str_wrap("by days between scheduled visit date and previous visit date, plus days between current visit date and previous visit date", 90),
       caption="Removing values < 0 or > 60") +
  theme(legend.position = "bottom")

Code

tots %>% 
  ungroup() %>%
  filter(appt_tally!=1) %>%
  mutate(days_from_latest_schedDate=as.numeric(scheduled_date-prev_event_date)) %>% 
  mutate(days_from_latest_eventDate=as.integer(event_date-prev_event_date)) %>% 
  # filter(days_from_latest_eventDate<90 & days_from_latest_eventDate>0) %>%
    filter(days_from_latest_schedDate>0) %>%
  ggplot() + 
  geom_step(aes(x=days_from_latest_eventDate,y=after_stat(y), color="Event Date"),
            stat="ecdf", linewidth=1, alpha=0.6) +
  geom_step(aes(x=days_from_latest_schedDate,y=after_stat(y), color="Scheduled Date"),
            stat="ecdf", linetype=3, linewidth=1) +
  labs(title="Followup visit after previous: by event date and scheduled date",
       subtitle="Cumulative incidence of followup visits by days from previous",
       x="Days since previous visit",
       y="") +
      scale_color_manual(name = "Visit feature",
                         values = c("Event Date" = "black", #match the name of the element to the name in the aes
                                    "Scheduled Date" = "blue")) +
  coord_cartesian(xlim = c(0, 110), ylim=c(0,1)) +
  scale_x_continuous(breaks = seq(0, 105, by = 15)) +
  theme(legend.position ="bottom")

Code

my_ggsave("fig2_incidence_after_previous",10)

53% of followup visits are scheduled to occur 27-36 days after the previous visit, with 38% on precisely 28 days later (the system default scheduled date). 44% of followup visits actually occur within that window.

NEXT STEPS:

Create the windows for timeliness for each subsequent visit
Per visit-opportunity analysis of timeliness

Timely Windows

Code

tots_timely<-tots %>% 
  ungroup() %>% 
  filter(appt_tally==1 | event_date-prev_event_date>=7) %>% # remove visits within 7 days of last
  mutate(nextdate_37=event_date+37, # create 37 and 90 day windows
         nextdate_90=event_date+90,
         nextdate_37_kept=if_else(lead_event_date<=nextdate_37, 1, 0),
         nextdate_90_kept=if_else(lead_event_date<=nextdate_90, 1, 0)) %>% 
  mutate(across(contains("kept"), ~replace(., is.na(.), 0))) %>% 
  select(sex, State,appt_tally, total_patient_visits, ncd_patient_status,
         contains("reg"),contains("event_date"), contains("nextdate"), everything())

Adding month and year variables to the expected window for subsequent visits, and checking whether patient attended their next visit within that window.

Code

tots_timely<-tots_timely %>% 
  mutate(exp_year=lubridate::year(nextdate_90)) %>% 
  mutate(exp_month=lubridate::month(nextdate_90)) %>% 
  mutate(exp_year_month=make_date(year=exp_year, month=exp_month, day="01")) %>% 
  mutate(days_since_last=as.integer(event_date-prev_event_date))

Outcome Events and Censoring Events

This next part is crucial.

The FIRST time a patient exceeds 90 days before the next visit, they are LTFU at that 90 day mark.

This is the OUTCOME EVENT. Subsequent visits for that patient would be discarded.

IFF a patient is marked TRANSFERRED or DEATH, then they are CENSORED after last visit. IFF the last expected visit is after July 1, then they are CENSORED after last visit.

Code

tots_timely<-tots_timely %>% 
  arrange(reg_id, appt_tally) %>% 
  mutate(nextdate_90_dropout=if_else(nextdate_90_kept==0, 1, NA)) %>% 
  group_by(reg_id) %>% 
  fill(nextdate_90_dropout, .direction="down") %>% 
  mutate(nextdate_90_dropout=replace_na(nextdate_90_dropout, 0)) %>% 
  mutate(visits_since_dropout=cumsum(nextdate_90_dropout)) %>% # visit before dropout==1
  ungroup() %>% 
  mutate(censor =
    case_when(
      appt_tally==total_patient_visits & ncd_patient_status %in% c("Transferred","Died") ~ 1,
      nextdate_90 > ymd("2024-07-01") ~ 1,
      TRUE ~ 0
    )
  )

tots_timely %>% 
  select(reg_id, event_date, days_since_last, appt_tally, visits_since_dropout,
         contains("nextdate_90"), censor) %>% tail(15)

# A tibble: 15 × 9
   reg_id event_date days_since_last appt_tally visits_since_dropout nextdate_90
    <dbl> <date>               <int>      <dbl>                <dbl> <date>     
 1  32329 2022-10-12              21          6                    0 2023-01-10 
 2  32329 2022-11-17              36          7                    0 2023-02-15 
 3  32329 2023-01-15              59          8                    0 2023-04-15 
 4  32329 2023-03-20              64          9                    1 2023-06-18 
 5  32329 2023-08-23             156         10                    2 2023-11-21 
 6  32329 2024-02-19             180         11                    3 2024-05-19 
 7  32331 2023-02-09              NA          1                    0 2023-05-10 
 8  32331 2023-03-13              32          2                    0 2023-06-11 
 9  32331 2023-05-31              79          3                    0 2023-08-29 
10  32331 2023-07-17              47          4                    0 2023-10-15 
11  32331 2023-08-21              35          5                    0 2023-11-19 
12  32331 2023-09-25              35          6                    0 2023-12-24 
13  32331 2023-12-18              84          7                    1 2024-03-17 
14  32332 2022-03-31              NA          1                    1 2022-06-29 
15  32332 2022-09-29             182          2                    2 2022-12-28 
# ℹ 3 more variables: nextdate_90_kept <dbl>, nextdate_90_dropout <dbl>,
#   censor <dbl>

Code

tots_timely %>% 
  count(censor) %>% kable() %>% kableExtra::kable_styling()

censor	n
0	150019
1	12319

Code

# 12292/(149856+12292)

# tots_timely %>% 
#   filter(visits_since_dropout < 2) %>%
#   distinct(reg_id)

We can now calculate the RATES of attrition.

Attrition Rate charts

This removes visits that are censored or after dropout.

Code

tots_timely_rates_prep<-tots_timely %>% 
   filter(censor==0) %>% 
   filter(!is.na(sex)) %>% 
   filter(visits_since_dropout<2) 
#dropout occurs only once.... and if censored, not included in denom


find_dropout_rates<-function(data, ...){
  data %>% 
   group_by(exp_year_month, ...) %>% 
   count(nextdate_90_kept) %>% 
   mutate(rate= n/sum(n)) %>% 
   ungroup()
}

tots_timely_rates_all<-tots_timely_rates_prep %>% 
  find_dropout_rates()

tots_timely_rates_all %>% 
  group_by(nextdate_90_kept) %>% 
  summarise(avg_monthly_rate=mean(rate)) %>% kable() %>% kableExtra::kable_styling()

nextdate_90_kept	avg_monthly_rate
0	0.2619861
1	0.7380139

The average monthly INCIDENCE RATE of attrition events (dropouts) is 26%

Now further breakdown of dropout incidence by categories

Code

tots_timely_rates_ss<-tots_timely_rates_prep %>% 
    find_dropout_rates(sex, State)

tots_timely_rates_appt_tally<-tots_timely_rates_prep %>% 
  filter(appt_tally<6) %>% 
  mutate(appt_tally=as.factor(appt_tally)) %>% 
  find_dropout_rates(appt_tally)

tots_timely_rates_state_appt_tally<-tots_timely_rates_prep %>% 
  filter(appt_tally<5) %>% 
  mutate(appt_tally=as.factor(appt_tally)) %>% 
  find_dropout_rates(appt_tally, State)

tots_timely_rates_hc <-tots_timely_rates_prep %>% 
    add_count(exp_year_month, LGA, HC) %>% 
    filter(n>9) %>% 
    find_dropout_rates(State, LGA, HC)

tots_timely_rates_lga <-tots_timely_rates_prep %>% 
    add_count(exp_year_month, LGA) %>% 
    filter(n>9) %>% 
    find_dropout_rates(State, LGA)

Code

ggcoeff<-10000
ggbarscol<-RColorBrewer::brewer.pal(4,"Dark2")[3]
gglinecol<-RColorBrewer::brewer.pal(4,"Dark2")[1]

attr_90day_withRegs<-tots_timely_rates_prep %>% 
  filter(registration_date<=ymd("2024-06-30")) %>% 
  # filter(registration_date>=ymd("2021-04-01")) %>%
  # filter(event_date>=ymd("2021-04-01")) %>%
  filter(exp_year_month<=ymd("2024-06-30")) %>% 
  mutate(reg_year=lubridate::year(registration_date)) %>% 
  mutate(reg_month=lubridate::month(registration_date)) %>% 
  mutate(reg_year_month=make_date(year=exp_year, 
                                      month=exp_month, day="01")) %>% 
  add_count(reg_year_month,name="new_registrations") %>% 
  find_dropout_rates(reg_year_month, new_registrations) %>% 
  filter(nextdate_90_kept==0) %>% 
  ggplot() +
  geom_col(aes(x=reg_year_month, y=new_registrations/ggcoeff),
           fill=ggbarscol, alpha=0.5) +
  geom_line(aes(x=exp_year_month, y=rate), col=gglinecol,stat="identity") +
  labs(title="90-day attrition rate with monthly registrations",
       subtitle="Attrition rate is by month at end of 90-day return window") +
  labs(x="") +
  # theme_classic()+
  theme(legend.position = "bottom",
        axis.title.y.left = element_text(color=gglinecol),
        axis.title.y.right = element_text(color=ggbarscol)) +
  scale_y_continuous(
    name="Attrition Rate",
    labels=scales::label_percent(),
    sec.axis = sec_axis(~.*ggcoeff, name="Monthly Registrations")) +
 scale_x_date(breaks = scales::breaks_pretty(12), labels = scales::label_date_short())


attr_90day_withRegs

Remember responses from RTSL here:

When were lists of LTFU for outreach generated? We generated the first list of LTFU for facilities to track patients in April 2021 using the 2021 Q1 NHCI Excel calculator data.
When was the dashboard deployed in both states? March 2022
When were legacy data imported? February 2022
When was the DHIS2-only entry pilot started, and was it expanded or ended? This commenced in August 2023 and ended in December 2023. The pilot just concluded and a report is being written by the consultants. This report would determine how and when we would expand this across the entire program.

Code

attr_90day_withRegs +
  annotate("rect", xmin = as.Date("2021-04-01"), 
                    xmax = as.Date("2021-09-01"), 
                    ymin = 0.42, ymax = 0.47, alpha = .2) +
  annotate("rect", xmin = as.Date("2021-12-21"), 
                    xmax = as.Date("2022-05-01"), 
                    ymin = 0.42, ymax = 0.47, alpha = .2) +
  annotate("rect", xmin = as.Date("2022-06-01"), 
                    xmax = as.Date("2023-04-01"), 
                    ymin = 0.42, ymax = 0.47, alpha = .2) +
  annotate("rect", xmin = as.Date("2023-07-15"), 
                    xmax = as.Date("2024-02-01"), 
                    ymin = 0.42, ymax = 0.47, alpha = .2) +
#   geom_text(data = data.frame(x = as.Date("2021-06-01"),
# y = 0.45, label = "1st LTFU List"), mapping = aes(x = x, y = y, label = label),
# size = 2.17, hjust = 0.45, inherit.aes = FALSE)
  annotate("text", x = as.Date("2021-06-15"), y = 0.44,
                  label = "1st LTFU Lists", size=3.3) +
    annotate("text", x = as.Date("2022-02-25"), y = 0.44,
                  label = "Legacy data", size=3.3) +
    annotate("text", x = as.Date("2022-11-01"), y = 0.44,
                  label = "Growth and Scale", size=3.5) +
    annotate("text", x = as.Date("2023-10-15"), y = 0.44,
                  label = "Naira crisis", size=3.3)

Code

my_ggsave("fig3_attritionRegs_full", 12)

Seasonal trends? Seems to be a dip in spring/summer

Code

tots_timely_rates_prep %>%
  find_dropout_rates(exp_month, exp_year) %>% 
  filter(nextdate_90_kept==0) %>% 
    mutate(exp_month=as.integer(exp_month)) %>% 
    mutate(exp_year=as.factor(exp_year)) %>% 
  ggplot()+
  geom_line(aes(x=exp_month, y=rate, 
                color=exp_year, group=exp_year, linetype=exp_year)) +
  labs(title="90-day attrition rate by year") +
  # scale_x_continuous(labels=scales::label_number(accuracy=1)) +
  scale_y_continuous(labels=scales::label_percent()) +
  scale_x_continuous(labels = scales::label_number(accuracy = 1))+
  labs(x="End month of 90-day attrition window",y="Attrition Rate") +
  # theme_classic()+
  theme(legend.position = "bottom")

By sex and state

Code

tots_timely_rates_ss %>% 
  filter(nextdate_90_kept==0) %>%
  ggplot()+
  geom_line(aes(x=exp_year_month, y=rate, color=sex)) +
  labs(title="90-day attrition rate by month") +
  scale_x_date() +
  facet_grid(~State) +
    scale_y_continuous(labels=scales::label_percent()) +
  labs(x="Year&Month of Expected Visit",y="Attrition Rate") +
  # theme_classic()+
  theme(legend.position = "bottom")

Code

# ggsave("attr_90day.jpeg", bg = "transparent", width = 20, height = 10, units = 'cm')

By state and appt visit

Code

tots_timely_rates_state_appt_tally %>% 
  filter(nextdate_90_kept==0) %>%
  ggplot()+
  geom_line(aes(x=exp_year_month, y=rate, color=appt_tally)) +
  labs(title="90-day attrition rate by month") +
  scale_x_date() +
  facet_grid(~State) +
  labs(x="Year&Month of Expected Visit",
       y="Attrition Rate", color="Latest Visit Count") +
  # theme_classic()+
    scale_y_continuous(labels=scales::label_percent()) +
  theme(legend.position = "bottom") +
  scale_y_continuous(limits = c(0, 0.75))

Code

# ggsave("attr_90day.jpeg", bg = "transparent", width = 20, height = 10, units = 'cm')

Attrition by latest visit number of patient

Code

tots_timely_rates_appt_tally %>% 
  filter(nextdate_90_kept==0) %>%
  ggplot()+
  geom_line(aes(x=exp_year_month, y=rate, group=appt_tally, color=appt_tally)) +
  scale_y_continuous(labels=scales::label_percent()) +
  labs(title="90-day attrition rate by month", y= "Attr Rate")

Code

tots_timely_rates_lga %>% 
  filter(nextdate_90_kept==0) %>%
  mutate(LGA=str_remove_all(LGA, " Local Government Area")) %>% 
  mutate(LGA=fct_reorder(LGA, rate, .fun = mean)) %>% 
  ggplot(aes(x = LGA, y = rate, color=State)) +
  geom_boxplot() +
  labs(title="90-day monthly attrition rates by LGA", 
       y= "Attrition Rate",
       caption="Removed all healthCentre-months pairs with less than 10 expected visits") +
    facet_wrap(~State, ncol=1, scales="free_y") +
    scale_y_continuous(labels=scales::label_percent()) +
  coord_flip() +
  # theme_classic() 
  theme(legend.position = "null")

Monthly HC attrition rates

Code

tots_timely_rates_hc %>% 
  filter(nextdate_90_kept==0) %>%
  mutate(LGA=str_remove_all(LGA, " Local Government Area")) %>% 
  mutate(LGA=fct_reorder(LGA, rate, .fun = mean)) %>% 
  ggplot(aes(x = LGA, y = rate, color=State)) +
  geom_boxplot() +
  labs(title="90-day monthly attrition rates at Health Centre Level",
       subtitle="Grouped by LGA and State of Health Centre",
       caption="Removed all healthCentre-months pairs with less than 10 expected visits",
       y= "Attrition Rate") +
    scale_y_continuous(labels=scales::label_percent()) +
    facet_wrap(~State, ncol=1, scales="free_y") +
  coord_flip() +
  # theme_classic() +
  theme(legend.position = "null")

Mean and SD

Code

tots_timely_rates_hc %>% 
  filter(nextdate_90_kept==0) %>%
  group_by(State) %>% 
  summarise(mean=mean(rate),
         sd=sd(rate)) %>% 
  kable() %>% kableExtra::kable_styling()

State	mean	sd
kn Kano State	0.3293533	0.2270595
og Ogun State	0.3257438	0.1767288

Repeat Dropout Patients?

Just to test, what’s the OR for experiencing a dropout for patients who have already experienced one, compared to those who have not?

Code

tots_timely %>% 
  select(reg_id, appt_tally, visits_since_dropout, nextdate_90_kept) %>% 
  mutate(nextdate_90_notMet=if_else(nextdate_90_kept==1,0,1)) %>% 
  mutate(post_dropout=if_else(visits_since_dropout>1, 1, 0)) %>% 
  janitor::tabyl(post_dropout,nextdate_90_notMet) %>% 
  janitor::adorn_title()

              nextdate_90_notMet      
 post_dropout                  0     1
            0              80519 30573
            1              29441 21805

Code

(21858*79526) / (30560*29274)

[1] 1.943051

The odds of dropout at each appointment are 1.94 fold for patients who have previously been delayed more than 90 days and later returned, compared to those who never experienced a first dropout event.

Proper Treatment Protocol?

Care should follow national guideline

Create a visit variable for each step in the care protocol, if the patient met those criteria and whether the correct meds were offered at the visit.

This could be added to the prediction model.

Code

# Should follow guideline here https://dhis2.org/nigeria-hypertension-control/

tots_timely_quality<-tots_timely %>% 
  filter(registration_date>=ymd("2021-01-01")) %>% 
    mutate(p_step1_need =
      case_when(
        appt_tally==1  & 
        ((bp_systole_mmhg >= 140 & bp_systole_mmhg < 160) | 
        (bp_diastole_mmhg >=90 & bp_diastole_mmhg < 100))
        ~ 1,
        TRUE ~ 0
      )) %>% 
    mutate(p_step2_need =
      case_when(
        appt_tally==2 & days_since_last >= 14 &
        (bp_systole_mmhg>= 140 | 
         bp_diastole_mmhg >=90)
        ~ 1,
        (appt_tally==1  & 
        ((bp_systole_mmhg >= 160 & bp_systole_mmhg < 180) | 
        (bp_diastole_mmhg >= 100 & bp_diastole_mmhg < 110)))
        ~ 1,
        TRUE ~ 0)
      ) %>% 
    mutate(p_step3_need =
      case_when(
        appt_tally==3 & days_since_last >= 14 &
        (bp_systole_mmhg>= 140 | 
         bp_diastole_mmhg >=90)
        ~ 1,
        (appt_tally==1  & 
        ((bp_systole_mmhg >= 180) | 
        (bp_diastole_mmhg >= 110)))
        ~ 1,
        TRUE ~ 0)
      ) %>% 
      mutate(p_step4_need =
      case_when(
        appt_tally==4 & days_since_last >= 14 &
        (bp_systole_mmhg>= 140 | 
         bp_diastole_mmhg >=90)
        ~ 1,
        TRUE ~ 0)
      ) %>% 
  # escalation criteria for first visit
  mutate(p_step2_need=if_else(p_step3_need==1, 0, p_step2_need)) %>% 
  mutate(p_step1_need=if_else(p_step2_need==1, 0, p_step1_need)) %>% 
  # now do the appropriate treatment for each step
    mutate(p_step1_met =
      case_when(
        p_step1_need==1  & med_amlodipine== "Amlodipine 5mg OD"
        ~ 1,
        TRUE ~ 0)
    ) %>% 
    mutate(p_step2_met =
      case_when(
        p_step2_need==1  & 
        med_amlodipine== "Amlodipine 5mg OD" &
        (med_losartan=="Losartan 50mg" | (sex=="Female" & age<45))
        ~ 1,
        TRUE ~ 0)
    ) %>% 
    mutate(p_step3_met =
      case_when(
        p_step3_need==1  & 
        med_amlodipine== "Amlodipine 10mg OD" &
        (med_losartan=="Losartan 100mg" | (sex=="Female" & age<45))
        ~ 1,
        TRUE ~ 0)
    ) %>% 
    mutate(p_step4_met =
      case_when(
        p_step4_need==1  & 
        med_amlodipine== "Amlodipine 10mg OD" & 
        med_hydrochlorothiazide=="Hydrochlorathiazide 25mg" &
        (med_losartan=="Losartan 100mg" | (sex=="Female" & age<45))
        ~ 1,
        TRUE ~ 0)
    )

r2<-function(x){round(x,3)}

tots_timely_quality %>% 
  filter(appt_tally<5) %>%
  group_by(nextdate_90_kept) %>%
  summarise(
    p_step1_rate= sum(p_step1_met)/sum(p_step1_need),
    p_step2_rate= sum(p_step2_met)/sum(p_step2_need),
    p_step3_rate= sum(p_step3_met)/sum(p_step3_need),
    p_step4_rate= sum(p_step4_met)/sum(p_step4_need)
  ) %>% 
  mutate(across(where(is.double), r2)) %>% 
  kable() %>% kableExtra::kable_styling()

nextdate_90_kept	p_step1_rate	p_step2_rate	p_step3_rate	p_step4_rate
0	0.941	0.618	0.170	0.004
1	0.957	0.615	0.136	0.002

While we’re add it, add the change of sBP and DBP since baseline.

This would be a quality indicator too.

Code

tots_timely_quality <- tots_timely_quality %>%
  mutate(bp_systole_mmhg=if_else(bp_systole_mmhg>300, 300, bp_systole_mmhg)) %>% 
  mutate(bp_diastole_mmhg=if_else(bp_diastole_mmhg>200, 200, bp_diastole_mmhg)) %>% 
  group_by(reg_id) %>% 
  mutate(sbp_baseline=if_else(appt_tally==1, bp_systole_mmhg, NA)) %>% 
  fill(sbp_baseline, .direction = "down") %>%
  mutate(dbp_baseline=if_else(appt_tally==1, bp_diastole_mmhg, NA)) %>% 
  fill(dbp_baseline, .direction = "down") %>%
  ungroup() %>% 
  mutate(sbp_change_from_baseline=bp_systole_mmhg-sbp_baseline,
         dbp_change_from_baseline=bp_diastole_mmhg-dbp_baseline)

Weather data

Now add in the daily weather data from open-meteo.com API from 1/1/2021 to 1/7/2024.

This pulled data on max temp, max apparent temp (with humidity), rain total, and rain hours for the centroid coordinates of each LGA.

For details see companion script.

Code

# import weather data and clean the lga names

weather_lga_centroids<-read_rds(here::here("data", "weather_lga_centroids.rds")) %>% 
  ungroup() %>% 
  mutate(lganame=str_squish(lganame)) %>% 
  mutate(lganame=str_remove_all(lganame, "\\/Ota"), # fix this one name
         lganame=str_replace_all(lganame, "Nassarawa","Nasarawa"), # fix this name
         lganame=str_replace_all(lganame, "Dambatta","Danbatta"), # fix this name
         lganame=str_replace_all(lganame, "Shagamu","Sagamu")) %>% # fix this name
  rename_with(~ paste0("cli_", .), max_temp_c:last_col()) %>% 
  mutate(lga_37date=str_c(lganame,date,sep="_")) %>% 
  arrange(lga_37date) %>% 
  select(lga_37date,  starts_with("cli_"))


# clean lga names in main dataset and merge with weather
tots_timely_qual_climate<-tots_timely_quality %>% 
    mutate(LGA=str_remove_all(LGA, " Local Government Area ")) %>% 
    mutate(LGA=str_sub(LGA, start=4, end=50)) %>% 
    mutate(LGA=str_squish(LGA)) %>% 
    mutate(lga_37date=str_c(LGA,as.character(nextdate_37),sep="_")) %>% 
    arrange(lga_37date) %>% 
    left_join(weather_lga_centroids, by="lga_37date")

Market data

Now add in the weekly market data from FEWSNET API for 1/1/2021 to 1/7/2024

This is official FAO data on food and staple good prices at local markets, and is used to assess food security. The prices are weekly, but have been reformed as a 10 day rolling average.

Included here are weekly subnational price data from Kano State and Lagos State for nine staples, in Naira:

Diesel
Gasoline
Millet
Rice (Milled and Broken)
Maize Grain (Yellow and White)
Sorghum (Brown and White)
Yams

According to an FAO food systems profile on Nigeria (pg 5), these are all staple foods in Nigeria.

The price of oil, both globally and domestically in Nigeria, affects the entire economy, including the price of transportation, staple foods, healthcare, and in general the cost of living. Theses prices could therefore affect ability/willingness to access HTN services.

Note data were not available for Ogun State, but Lagos borders Ogun to the South, and has market data more consistent with national rates than Oyo to the north.

For details see companion script.

Code

fews<-read_rds(here::here("data", "fews_ng_prices_2021_2024.rds")) %>% 
    ungroup() %>%
    filter(date>ymd("2021-02-01")) %>% 
    rename_with(~ paste0("mkt_", .), 3:last_col()) %>% 
    mutate(admin_1=if_else(admin_1=="Lagos", "Ogun","Kano")) %>% 
      arrange(admin_1, date) %>% 
    mutate(state_37date=str_c(admin_1,date,sep="_")) %>% 
    select(state_37date, starts_with("mkt")) %>% 
    arrange(state_37date)


tots_timely_qual_cli_mkt<-tots_timely_qual_climate %>%
    # clean up state name to merge with market prices
    arrange(State, nextdate_37) %>% 
    mutate(State=str_remove_all(State, " State"),
           State=str_sub(State, 5, 50),
           State=str_squish(State)) %>% 
    mutate(state_37date=str_c(State,nextdate_37,sep="_")) %>% 
    arrange(state_37date) %>% 
    left_join(fews, by="state_37date") %>% 
    select(reg_id, reg_visit_id, registration_date, #reorder some vars 
           event_date, appt_tally, everything())

Are weather and market prices 28-37 days after latest visit different for those who attend their next one 0-37 days after that visit, and those who fail to attend?

Ttest and effect sizes might give a general idea…

Code

pop_params_dropouts_37days_prep<-tots_timely_qual_cli_mkt %>% 
   filter(censor==0) %>% 
   filter(visits_since_dropout<1) %>% 
  # removed censored obs or those that already "droppedout"
  mutate(nextdate_37_kept=as.factor(nextdate_37_kept)) %>% 
  select(State, nextdate_37_kept, ends_with("pre10day_mean")) %>% 
  rowid_to_column() %>% 
  pivot_longer(cols=ends_with("pre10day_mean"),
               names_to="var",
               values_to="value")

# boxplot
pop_params_dropouts_37days_prep %>% 
  ggplot() +
    aes(x = nextdate_37_kept, 
        y = value) +
    geom_boxplot() +
    facet_wrap(~var, scales = "free_y") +
    
    labs(title="Next visit date kept?",
         subtitle="by price or weather parameter")

Code

# pvalues
pop_params_pvalues<-pop_params_dropouts_37days_prep %>% 
    group_by(var) %>%
    reframe(t_test = list(t.test(value ~ nextdate_37_kept, data = cur_data()))) %>%
    mutate(tidy_result = map(t_test, broom::tidy)) %>%
    unnest(tidy_result)

# effect sizes
pop_params_effect_sizes <- pop_params_dropouts_37days_prep %>%
  group_by(var) %>%
  reframe(cohens_d = cohens_d(value ~ nextdate_37_kept, data = cur_data())) %>% 
  unnest("cohens_d")

pop_params_pvalues %>% select(var, estimate, p.value)

# A tibble: 15 × 3
   var                                        estimate  p.value
   <chr>                                         <dbl>    <dbl>
 1 cli_max_heat_index_c_pre10day_mean      -0.102      7.57e- 6
 2 cli_max_temp_c_pre10day_mean            -0.422      2.80e-61
 3 cli_rain_hours_pre10day_mean             0.693      3.85e-77
 4 cli_rain_mm_pre10day_mean                0.635      4.67e-72
 5 mkt_diesel_pre10day_mean                25.6        1.28e-31
 6 mkt_gasoline_pre10day_mean              -1.18       4.26e- 1
 7 mkt_maize_grain_white_pre10day_mean     11.8        2.09e-27
 8 mkt_maize_grain_yellow_pre10day_mean    13.2        9.55e-32
 9 mkt_millet_pearl_pre10day_mean          16.2        5.99e-44
10 mkt_rice_5_percent_broken_pre10day_mean 32.5        7.39e-35
11 mkt_rice_milled_pre10day_mean           30.8        1.17e-48
12 mkt_sorghum_brown_pre10day_mean         18.1        5.87e-39
13 mkt_sorghum_white_pre10day_mean         19.0        3.87e-50
14 mkt_usd_exchange_rate_pre10day_mean     -0.00000936 3.40e- 2
15 mkt_yams_pre10day_mean                   9.80       3.21e-17

Code

pop_params_effect_sizes

# A tibble: 15 × 5
   var                                     Cohens_d    CI  CI_low  CI_high
   <chr>                                      <dbl> <dbl>   <dbl>    <dbl>
 1 cli_max_heat_index_c_pre10day_mean      -0.0353   0.95 -0.0510 -0.0195 
 2 cli_max_temp_c_pre10day_mean            -0.131    0.95 -0.147  -0.115  
 3 cli_rain_hours_pre10day_mean             0.153    0.95  0.137   0.168  
 4 cli_rain_mm_pre10day_mean                0.148    0.95  0.133   0.164  
 5 mkt_diesel_pre10day_mean                 0.0908   0.95  0.0750  0.107  
 6 mkt_gasoline_pre10day_mean              -0.00636  0.95 -0.0221  0.00939
 7 mkt_maize_grain_white_pre10day_mean      0.0893   0.95  0.0735  0.105  
 8 mkt_maize_grain_yellow_pre10day_mean     0.0961   0.95  0.0804  0.112  
 9 mkt_millet_pearl_pre10day_mean           0.115    0.95  0.0995  0.131  
10 mkt_rice_5_percent_broken_pre10day_mean  0.102    0.95  0.0858  0.117  
11 mkt_rice_milled_pre10day_mean            0.120    0.95  0.104   0.136  
12 mkt_sorghum_brown_pre10day_mean          0.109    0.95  0.0934  0.125  
13 mkt_sorghum_white_pre10day_mean          0.124    0.95  0.109   0.140  
14 mkt_usd_exchange_rate_pre10day_mean     -0.0170   0.95 -0.0328 -0.00128
15 mkt_yams_pre10day_mean                   0.0673   0.95  0.0515  0.0830

Code

# use these for the model
pop_params_top<-pop_params_effect_sizes %>% filter(abs(Cohens_d)>0.11) %>% pluck("var")
print(paste0("Consider these vars for model: ", str_c(pop_params_top, collapse =', ')))

[1] "Consider these vars for model: cli_max_temp_c_pre10day_mean, cli_rain_hours_pre10day_mean, cli_rain_mm_pre10day_mean, mkt_millet_pearl_pre10day_mean, mkt_rice_milled_pre10day_mean, mkt_sorghum_white_pre10day_mean"

Code

readr::write_lines(pop_params_top, "cli_mkt_params_for_model.txt")

A t test is performed across all these pop level parameters, to test if between the means of the parameters differ where the next visit is a success or failure (success defined as Next Visit =< 37 days from current visit).

All price and weather parameters have a small pvalue, except the price of gasoline.

However, the effect sizes (cohens d) are small for all parameters. Max temp, rain total, rain hours, millet, rice, and sorghum price, price have the highest effect sizes (>0.11). Max temp has an inverse association than expected (more attendance on comparatively hot days).

For greater effect size, may need to respecify. For example, instead of using 10 day averages, use precise weather variables on the date 28 days after latest appointment, with an outcome of visiting on that exact day.

Here is an example: for a given 10 day window, is the average price of milled rice at state level related to the rate of patients returning for subsequent visits within that 10 day window?

Code

tots_timely_rates_prep_detailed<-tots_timely_qual_cli_mkt %>% 
   filter(censor==0) %>% 
   filter(!is.na(sex)) %>% 
   filter(visits_since_dropout<2) 
#dropout occurs only once.... and if censored, not included in denom


find_dropout_rates_detailed <-function(data, ...){
  data %>% 
  mutate(nextdate_37_kept=factor(nextdate_37_kept, levels=c(0,1))) %>% 
  group_by(nextdate_37, ...) %>% 
  count(nextdate_37_kept) %>% 
  mutate(rate=n/sum(n)) %>% 
  ungroup() %>% 
  filter(nextdate_37_kept==1) 
}

tots_timely_rates_prep_detailed %>% 
  find_dropout_rates_detailed(State, mkt_rice_milled_pre10day_mean) %>% 
  filter(n>9) %>% 
  ggplot(aes(x=mkt_rice_milled_pre10day_mean, y=rate)) + 
  geom_jitter( alpha=0.5)+
  geom_smooth() +
  expand_limits(y = c(0, 1)) +
  labs(title="State Level 37-day Timely Return Rates, By market prices for milled rice",
       subtitle="excluding date ranges where the State expected less than 10 patients")

Review and Save data

In a final review, check distributions of all the vars.

Code

dfs<-tots_timely_qual_cli_mkt %>% 
  summarytools::dfSummary(plain.ascii = FALSE,stile="grid",
                          max.string.width=15)

dfs$Variable <- gsub("_", " ", dfs$Variable)

print(dfs, method="render", varnumbers   = FALSE, 
                valid.col    = FALSE,max.tbl.width = 40,
      caption="Removing _underscores_ from varnames for space")

Data Frame Summary

tots_timely_qual_cli_mkt

Dimensions: 153413 x 107
Duplicates: 0

reg id [numeric]

Mean (sd) : 16102.7 (9361.3)

min ≤ med ≤ max:

0 ≤ 16067 ≤ 32332

IQR (CV) : 16258 (0.6)

29667 distinct values

0 (0.0%)

reg visit id [character]

1. 10209_1

2. 10224_3

3. 10528_4

4. 10916_4

5. 10976_5

6. 11008_6

7. 11129_19

8. 11245_10

9. 11257_5

10. 11371_24

[ 153227 others ]

2	(	0.0%	)
2	(	0.0%	)
2	(	0.0%	)
2	(	0.0%	)
2	(	0.0%	)
2	(	0.0%	)
2	(	0.0%	)
2	(	0.0%	)
2	(	0.0%	)
2	(	0.0%	)
153393	(	100.0%	)

0 (0.0%)

registration date [Date]

min : 2021-01-01

med : 2022-05-10

max : 2024-07-01

range : 3y 6m 0d

1246 distinct values

0 (0.0%)

event date [Date]

min : 2021-01-01

med : 2023-01-23

max : 2024-07-01

range : 3y 6m 0d

1278 distinct values

0 (0.0%)

appt tally [numeric]

Mean (sd) : 6 (5.5)

min ≤ med ≤ max:

1 ≤ 4 ≤ 45

IQR (CV) : 6 (0.9)

45 distinct values

0 (0.0%)

sex [character]

1. Female

2. Male

117013	(	76.3%	)
36357	(	23.7%	)

43 (0.0%)

State [character]

1. Kano

2. Ogun

97605	(	63.6%	)
55808	(	36.4%	)

0 (0.0%)

total patient visits [integer]

Mean (sd) : 11 (7.9)

min ≤ med ≤ max:

1 ≤ 9 ≤ 45

IQR (CV) : 11 (0.7)

42 distinct values

0 (0.0%)

ncd patient status [character]

1. Active record

2. Died

3. Inactive record

4. Transferred

5. Unresponsive

151920	(	99.0%	)
452	(	0.3%	)
424	(	0.3%	)
548	(	0.4%	)
69	(	0.0%	)

0 (0.0%)

prev event date [Date]

min : 2021-01-01

med : 2022-12-22

max : 2024-06-24

range : 3y 5m 23d

1264 distinct values

29690 (19.4%)

lead event date [Date]

min : 2021-01-04

med : 2023-03-09

max : 2024-07-01

range : 3y 5m 27d

1259 distinct values

29543 (19.3%)

nextdate 37 [Date]

min : 2021-02-07

med : 2023-03-01

max : 2024-08-07

range : 3y 6m 0d

1278 distinct values

0 (0.0%)

nextdate 90 [Date]

min : 2021-04-01

med : 2023-04-23

max : 2024-09-29

range : 3y 5m 28d

1278 distinct values

0 (0.0%)

nextdate 37 kept [numeric]

Min : 0

Mean : 0.4

Max : 1

0	:	85159	(	55.5%	)
1	:	68254	(	44.5%	)

0 (0.0%)

nextdate 90 kept [numeric]

Min : 0

Mean : 0.7

Max : 1

0	:	50260	(	32.8%	)
1	:	103153	(	67.2%	)

0 (0.0%)

stored by [character]

1. Jaen_PHC

2. Gwagwarwa_PHC

3. Hotoro_NorthPHC

4. Kabuga_PHC

5. Okeilewo_FHC

6. Dawakindakata_P

7. Gano_PHC

8. Dala_PHC

9. Sheka_PHC

10. akumbabarns

[ 97 others ]

7268	(	6.9%	)
7157	(	6.8%	)
6519	(	6.2%	)
4464	(	4.2%	)
4031	(	3.8%	)
3825	(	3.6%	)
3721	(	3.5%	)
3651	(	3.4%	)
3176	(	3.0%	)
2498	(	2.4%	)
59616	(	56.3%	)

47487 (31.0%)

created by [character]

1. Akumba, Barnaba

2. DAKATA, NASSARA

3. Recorder, GWAGW

4. Recorder, JAEN

5. Recorder, HOTOR

6. Okeilewo, FHC (

7. Recorder, KABUG

8. Recorder, GANO

9. Recorder, SHEKA

10. Recorder, DALA

[ 96 others ]

15970	(	10.4%	)
8352	(	5.4%	)
8036	(	5.2%	)
7569	(	4.9%	)
6092	(	4.0%	)
5142	(	3.4%	)
4556	(	3.0%	)
3477	(	2.3%	)
3082	(	2.0%	)
2926	(	1.9%	)
88211	(	57.5%	)

0 (0.0%)

last updated by [character]

1. Recorder, JAEN

2. Recorder, HOTOR

3. Recorder, GWAGW

4. DAKATA, NASSARA

5. Recorder, KABUG

6. Okeilewo, FHC (

7. Recorder, GANO

8. Recorder, DALA

9. Recorder, SHEKA

10. Olose, PHC (Olo

[ 97 others ]

9574	(	6.2%	)
9085	(	5.9%	)
9063	(	5.9%	)
8352	(	5.4%	)
5544	(	3.6%	)
5496	(	3.6%	)
4377	(	2.9%	)
4030	(	2.6%	)
3429	(	2.2%	)
3209	(	2.1%	)
91254	(	59.5%	)

0 (0.0%)

last updated on [Date]

min : 2021-01-02

med : 2023-06-10

max : 2024-07-21

range : 3y 6m 19d

1265 distinct values

0 (0.0%)

scheduled date [Date]

min : 1974-06-12

med : 2023-03-31

max : 2201-08-29

range : 227y 2m 17d

1331 distinct values

0 (0.0%)

incident date [Date]

min : 2003-12-08

med : 2022-06-14

max : 2024-07-17

range : 20y 7m 9d

1233 distinct values

0 (0.0%)

organisation unit name [character]

1. kn Ja'En Primar

2. kn Hotoro Arewa

3. kn Gwagwarwa Pr

4. kn Dawakin Daka

5. kn Kabuga Prima

6. og Oke Ilewo Fa

7. kn Gano Primary

8. kn Dala Primary

9. kn Shekar Barde

10. og Olose Primar

[ 94 others ]

9816	(	6.4%	)
9691	(	6.3%	)
9293	(	6.1%	)
8352	(	5.4%	)
5736	(	3.7%	)
5519	(	3.6%	)
4539	(	3.0%	)
4193	(	2.7%	)
3748	(	2.4%	)
3271	(	2.1%	)
89255	(	58.2%	)

0 (0.0%)

Nat [character]

1. fg Nigeria

153413

(

100.0%

)

0 (0.0%)

LGA [character]

1. Nasarawa

2. Gwale

3. Dala

4. Abeokuta South

5. Kano Municipal

6. Kumbotso

7. Ifo

8. Abeokuta North

9. Dawakin Kudu

10. Tsanyawa

[ 32 others ]

27813	(	18.1%	)
15552	(	10.1%	)
9870	(	6.4%	)
8028	(	5.2%	)
5888	(	3.8%	)
5852	(	3.8%	)
5312	(	3.5%	)
4978	(	3.2%	)
4927	(	3.2%	)
4809	(	3.1%	)
60384	(	39.4%	)

0 (0.0%)

Ward [character]

kn Dorayi Ward

kn Hotoro Nort

kn Gawuna Ward

kn Dakata Ward

kn Kabuga Ward

og Oke-Ilewo W

kn Gano Ward

kn Kabuwaya Wa

kn Dan Maliki

10. Ota Local Gover

[ 90 others ]

9816	(	6.4%	)
9691	(	6.3%	)
9293	(	6.1%	)
8352	(	5.4%	)
5736	(	3.7%	)
5519	(	3.6%	)
4539	(	3.0%	)
4193	(	2.7%	)
3749	(	2.4%	)
3576	(	2.3%	)
88949	(	58.0%	)

0 (0.0%)

HC [character]

kn Ja'En Prima

kn Hotoro Arew

kn Gwagwarwa P

kn Dawakin Dak

kn Kabuga Prim

og Oke Ilewo F

kn Gano Primar

kn Dala Primar

kn Shekar Bard

10.

og Olose Prima

[ 95 others ]

9816	(	6.4%	)
9691	(	6.3%	)
9293	(	6.1%	)
8352	(	5.4%	)
5736	(	3.7%	)
5519	(	3.6%	)
4539	(	3.0%	)
4193	(	2.7%	)
3748	(	2.4%	)
3271	(	2.1%	)
89255	(	58.2%	)

0 (0.0%)

FACILITY [character]

og Agbara Prim

og Ajuwon Prim

og Alagbon Hea

og Igbogila Pr

og Keesi Prima

og Mobalufon M

og Oke-Oyinbo

og Otta Primar

og Owode Healt

10.

og Sango Prima

213	(	2.5%	)
398	(	4.6%	)
651	(	7.6%	)
1019	(	11.8%	)
1209	(	14.1%	)
657	(	7.6%	)
898	(	10.4%	)
1812	(	21.1%	)
193	(	2.2%	)
1551	(	18.0%	)

144812 (94.4%)

program status [character]

1. ACTIVE

2. CANCELLED

3. COMPLETED

152913	(	99.7%	)
450	(	0.3%	)
50	(	0.0%	)

0 (0.0%)

event status [character]

1. ACTIVE

2. COMPLETED

3659	(	2.4%	)
149754	(	97.6%	)

0 (0.0%)

organisation unit [character]

1. ehJsDey0nF2

2. DxfVRX9j8np

3. xUORgwU22YU

4. jpPaQTbjIlU

5. aoOPscxu7Tc

6. S84LuOwGZ6R

7. bYamdI7FHE5

8. JJL8Ukh1ZlZ

9. uRzXjHHigm9

10. PKf7VHpxmBG

[ 95 others ]

9816	(	6.4%	)
9691	(	6.3%	)
9293	(	6.1%	)
8352	(	5.4%	)
5736	(	3.7%	)
5519	(	3.6%	)
4539	(	3.0%	)
4193	(	2.7%	)
3748	(	2.4%	)
3271	(	2.1%	)
89255	(	58.2%	)

0 (0.0%)

height cm [numeric]

Mean (sd) : 86.4 (79.3)

min ≤ med ≤ max:

0.4 ≤ 148 ≤ 758

IQR (CV) : 158.4 (0.9)

156 distinct values

146189 (95.3%)

med losartan [character]

1. Losartan 100mg

2. Losartan 50mg

8819	(	15.9%	)
46480	(	84.1%	)

98114 (64.0%)

hypertension treated in the past [character]

1. First time

2. Was treated in

127501	(	83.7%	)
24776	(	16.3%	)

1136 (0.7%)

bp diastole mmhg [numeric]

Mean (sd) : 84 (12.9)

min ≤ med ≤ max:

1 ≤ 82 ≤ 200

IQR (CV) : 14 (0.2)

181 distinct values

1066 (0.7%)

med hydrochlorothiazide [character]

1. Hydrochlorathia

2. Hydrochlorathia

156	(	11.5%	)
1201	(	88.5%	)

152056 (99.1%)

other hypertension medication [character]

1. N

2. none

3. Lisinopril

4. nifedipine Vaso

5. no

6. non

7. elanapril

8. other drug

9. nil

10. vasoprin

[ 111 others ]

2144	(	81.8%	)
177	(	6.8%	)
47	(	1.8%	)
18	(	0.7%	)
17	(	0.6%	)
16	(	0.6%	)
12	(	0.5%	)
11	(	0.4%	)
10	(	0.4%	)
8	(	0.3%	)
160	(	6.1%	)

150793 (98.3%)

weight kg [numeric]

Mean (sd) : 73.6 (71.8)

min ≤ med ≤ max:

28 ≤ 70 ≤ 6298

IQR (CV) : 21 (1)

116 distinct values

145524 (94.9%)

patient followup status [character]

1. Called

2. Overdue Pending

3. Upcoming visit

766	(	0.5%	)
113357	(	79.7%	)
28175	(	19.8%	)

11115 (7.2%)

bmi [numeric]

Mean (sd) : 137463.3 (281054.3)

min ≤ med ≤ max:

0 ≤ 38.4 ≤ 4296296

IQR (CV) : 262195.2 (2)

1478 distinct values

146184 (95.3%)

bmi measurement [numeric]

1 distinct value

8591

(

100.0%

)

144822 (94.4%)

days of medication [character]

1. 30 days

2. 60 days

3. 90 days

25742	(	97.0%	)
659	(	2.5%	)
139	(	0.5%	)

126873 (82.7%)

consent to send sms reminders [numeric]

1 distinct value

25896

(

100.0%

)

127517 (83.1%)

does patient have diabetes [character]

1. No

2. Yes

139830	(	97.8%	)
3078	(	2.2%	)

10505 (6.8%)

blood sugar measurement [numeric]

1 distinct value

501

(

100.0%

)

152912 (99.7%)

relationship to patient [character]

1. Daughter

2. Father

3. Friend

4. Husband

5. Mother

6. Neighbor

7. Other

8. Son

9. Wife

21134	(	17.2%	)
5239	(	4.3%	)
1136	(	0.9%	)
27941	(	22.8%	)
642	(	0.5%	)
2547	(	2.1%	)
10205	(	8.3%	)
44818	(	36.6%	)
8902	(	7.3%	)

30849 (20.1%)

bp systole mmhg [numeric]

Mean (sd) : 141.2 (19.7)

min ≤ med ≤ max:

1 ≤ 140 ≤ 300

IQR (CV) : 23 (0.1)

224 distinct values

530 (0.3%)

other diabetes medication [logical]

All NA's

153413 (100.0%)

newage [character]

1. 50
2. 60
3. 70
4. 40
5. 55
6. 45
7. 65
8. 80
9. 52
10. 35
[ 86 others ]

12712	(	8.3%	)
11770	(	7.7%	)
8481	(	5.5%	)
7095	(	4.6%	)
6477	(	4.2%	)
6370	(	4.2%	)
5382	(	3.5%	)
3593	(	2.3%	)
3453	(	2.3%	)
3342	(	2.2%	)
84738	(	55.2%	)

0 (0.0%)

med amlodipine [character]

1. Amlodipine 10mg

2. Amlodipine 5mg

3. Other drug

13092	(	8.8%	)
136353	(	91.2%	)
15	(	0.0%	)

3953 (2.6%)

med telmisartan [character]

1. Telmisartan(40m

2. Telmisartan(80m

310	(	72.9%	)
115	(	27.1%	)

152988 (99.7%)

does patient have hypertension [character]

1. Yes

153413

(

100.0%

)

0 (0.0%)

consent to record data [numeric]

1 distinct value

153413

(

100.0%

)

0 (0.0%)

has supporter [numeric]

Min : 0

Mean : 0.8

Max : 1

0	:	29413	(	19.2%	)
1	:	124000	(	80.8%	)

0 (0.0%)

has phone [numeric]

Min : 0

Mean : 0.8

Max : 1

0	:	28289	(	18.4%	)
1	:	125124	(	81.6%	)

0 (0.0%)

age [integer]

Mean (sd) : 54 (14.1)

min ≤ med ≤ max:

18 ≤ 53 ≤ 124

IQR (CV) : 18 (0.3)

96 distinct values

0 (0.0%)

days since entry [numeric]

Mean (sd) : 245.1 (248.8)

min ≤ med ≤ max:

0 ≤ 170 ≤ 1272

IQR (CV) : 353 (1)

1255 distinct values

0 (0.0%)

exp year [numeric]

Mean (sd) : 2022.8 (0.9)

min ≤ med ≤ max:

2021 ≤ 2023 ≤ 2024

IQR (CV) : 1 (0)

2021	:	12579	(	8.2%	)
2022	:	43654	(	28.5%	)
2023	:	61089	(	39.8%	)
2024	:	36091	(	23.5%	)

0 (0.0%)

exp month [numeric]

Mean (sd) : 6.6 (3.3)

min ≤ med ≤ max:

1 ≤ 7 ≤ 12

IQR (CV) : 5 (0.5)

12 distinct values

0 (0.0%)

exp year month [Date]

min : 2021-04-01

med : 2023-04-01

max : 2024-09-01

range : 3y 5m 0d

42 distinct values

0 (0.0%)

days since last [integer]

Mean (sd) : 66.9 (87.4)

min ≤ med ≤ max:

7 ≤ 35 ≤ 1209

IQR (CV) : 36 (1.3)

892 distinct values

29690 (19.4%)

nextdate 90 dropout [numeric]

Min : 0

Mean : 0.5

Max : 1

0	:	76188	(	49.7%	)
1	:	77225	(	50.3%	)

0 (0.0%)

visits since dropout [numeric]

Mean (sd) : 1.8 (3.2)

min ≤ med ≤ max:

0 ≤ 1 ≤ 35

IQR (CV) : 2 (1.8)

36 distinct values

0 (0.0%)

censor [numeric]

Min : 0

Mean : 0.1

Max : 1

0	:	141106	(	92.0%	)
1	:	12307	(	8.0%	)

0 (0.0%)

p step1 need [numeric]

Min : 0

Mean : 0.1

Max : 1

0	:	140951	(	91.9%	)
1	:	12462	(	8.1%	)

0 (0.0%)

p step2 need [numeric]

Min : 0

Mean : 0.2

Max : 1

0	:	127335	(	83.0%	)
1	:	26078	(	17.0%	)

0 (0.0%)

p step3 need [numeric]

Min : 0

Mean : 0.1

Max : 1

0	:	139241	(	90.8%	)
1	:	14172	(	9.2%	)

0 (0.0%)

p step4 need [numeric]

Min : 0

Mean : 0

Max : 1

0	:	146882	(	95.7%	)
1	:	6531	(	4.3%	)

0 (0.0%)

p step1 met [numeric]

Min : 0

Mean : 0.1

Max : 1

0	:	141580	(	92.3%	)
1	:	11833	(	7.7%	)

0 (0.0%)

p step2 met [numeric]

Min : 0

Mean : 0.1

Max : 1

0	:	137348	(	89.5%	)
1	:	16065	(	10.5%	)

0 (0.0%)

p step3 met [numeric]

Min : 0

Mean : 0

Max : 1

0	:	151287	(	98.6%	)
1	:	2126	(	1.4%	)

0 (0.0%)

p step4 met [numeric]

Min : 0

Mean : 0

Max : 1

0	:	153395	(	100.0%	)
1	:	18	(	0.0%	)

0 (0.0%)

sbp baseline [numeric]

Mean (sd) : 157.6 (17.2)

min ≤ med ≤ max:

14 ≤ 158 ≤ 300

IQR (CV) : 23 (0.1)

165 distinct values

514 (0.3%)

dbp baseline [numeric]

Mean (sd) : 92.4 (13.3)

min ≤ med ≤ max:

1 ≤ 91 ≤ 200

IQR (CV) : 15 (0.1)

153 distinct values

720 (0.5%)

sbp change from baseline [numeric]

Mean (sd) : -16.4 (20.6)

min ≤ med ≤ max:

-195 ≤ -14 ≤ 178

IQR (CV) : 30 (-1.3)

302 distinct values

917 (0.6%)

dbp change from baseline [numeric]

Mean (sd) : -8.4 (14.1)

min ≤ med ≤ max:

-143 ≤ -6 ≤ 130

IQR (CV) : 17 (-1.7)

241 distinct values

1592 (1.0%)

lga 37date [character]

1. Nasarawa_2024-0

2. Nasarawa_2023-0

3. Nasarawa_2023-0

4. Nasarawa_2023-0

5. Nasarawa_2022-1

6. Nasarawa_2024-0

7. Nasarawa_2023-0

8. Nasarawa_2024-0

9. Nasarawa_2023-0

10. Nasarawa_2022-0

[ 27800 others ]

140	(	0.1%	)
129	(	0.1%	)
96	(	0.1%	)
93	(	0.1%	)
83	(	0.1%	)
83	(	0.1%	)
82	(	0.1%	)
80	(	0.1%	)
79	(	0.1%	)
77	(	0.1%	)
152471	(	99.4%	)

0 (0.0%)

cli max temp c [numeric]

Mean (sd) : 30.4 (3.4)

min ≤ med ≤ max:

18.5 ≤ 29.6 ≤ 41.6

IQR (CV) : 4.6 (0.1)

217 distinct values

4659 (3.0%)

cli max heat index c [numeric]

Mean (sd) : 33.7 (3.3)

min ≤ med ≤ max:

19 ≤ 33.7 ≤ 44

IQR (CV) : 4.6 (0.1)

232 distinct values

4659 (3.0%)

cli rain mm [numeric]

Mean (sd) : 4.1 (7.4)

min ≤ med ≤ max:

0 ≤ 0.6 ≤ 167.6

IQR (CV) : 5.4 (1.8)

523 distinct values

4659 (3.0%)

cli rain hours [numeric]

Mean (sd) : 4.9 (6)

min ≤ med ≤ max:

0 ≤ 2 ≤ 24

IQR (CV) : 8 (1.2)

25 distinct values

4659 (3.0%)

cli max temp c pre10day mean [numeric]

Mean (sd) : 30.4 (3.2)

min ≤ med ≤ max:

21 ≤ 29.6 ≤ 40

IQR (CV) : 4.4 (0.1)

1582 distinct values

4659 (3.0%)

cli max heat index c pre10day mean [numeric]

Mean (sd) : 33.7 (3)

min ≤ med ≤ max:

22.5 ≤ 33.7 ≤ 42.4

IQR (CV) : 3.8 (0.1)

1687 distinct values

4659 (3.0%)

cli rain mm pre10day mean [numeric]

Mean (sd) : 4.1 (4.4)

min ≤ med ≤ max:

0 ≤ 2.9 ≤ 42.9

IQR (CV) : 6.3 (1.1)

2607 distinct values

4659 (3.0%)

cli rain hours pre10day mean [numeric]

Mean (sd) : 4.8 (4.6)

min ≤ med ≤ max:

0 ≤ 3.7 ≤ 20.6

IQR (CV) : 7 (1)

202 distinct values

4659 (3.0%)

state 37date [character]

1. Kano_2023-06-21

2. Kano_2023-07-06

3. Kano_2023-06-28

4. Kano_2023-01-18

5. Kano_2022-07-27

6. Kano_2023-01-11

7. Kano_2023-02-03

8. Kano_2023-08-09

9. Kano_2023-07-26

10. Kano_2022-10-19

[ 2533 others ]

326	(	0.2%	)
321	(	0.2%	)
303	(	0.2%	)
289	(	0.2%	)
253	(	0.2%	)
236	(	0.2%	)
236	(	0.2%	)
236	(	0.2%	)
234	(	0.2%	)
231	(	0.2%	)
150748	(	98.3%	)

0 (0.0%)

mkt diesel [numeric]

Mean (sd) : 752.7 (287.3)

min ≤ med ≤ max:

188 ≤ 744 ≤ 1470

IQR (CV) : 218 (0.4)

169 distinct values

4659 (3.0%)

mkt gasoline [numeric]

Mean (sd) : 369.5 (219.1)

min ≤ med ≤ max:

161 ≤ 274 ≤ 1070

IQR (CV) : 405.2 (0.6)

123 distinct values

4659 (3.0%)

mkt maize grain white [numeric]

Mean (sd) : 346.2 (167.4)

min ≤ med ≤ max:

151.5 ≤ 282 ≤ 814

IQR (CV) : 194.4 (0.5)

175 distinct values

4659 (3.0%)

mkt maize grain yellow [numeric]

Mean (sd) : 371.3 (169.7)

min ≤ med ≤ max:

159.2 ≤ 322.3 ≤ 851.4

IQR (CV) : 211.3 (0.5)

189 distinct values

4659 (3.0%)

mkt millet pearl [numeric]

Mean (sd) : 370 (167.6)

min ≤ med ≤ max:

160.3 ≤ 316.1 ≤ 847.9

IQR (CV) : 146.8 (0.5)

210 distinct values

4659 (3.0%)

mkt rice 5 percent broken [numeric]

Mean (sd) : 895.6 (395.7)

min ≤ med ≤ max:

475.8 ≤ 724.1 ≤ 2160.5

IQR (CV) : 466.1 (0.4)

201 distinct values

4659 (3.0%)

mkt rice milled [numeric]

Mean (sd) : 719.5 (320)

min ≤ med ≤ max:

278 ≤ 607.1 ≤ 1641.3

IQR (CV) : 398.9 (0.4)

179 distinct values

4659 (3.0%)

mkt sorghum brown [numeric]

Mean (sd) : 378.2 (193.1)

min ≤ med ≤ max:

155.9 ≤ 319.6 ≤ 991.5

IQR (CV) : 223.4 (0.5)

183 distinct values

4659 (3.0%)

mkt sorghum white [numeric]

Mean (sd) : 365.7 (178.4)

min ≤ med ≤ max:

154.7 ≤ 332.4 ≤ 852.3

IQR (CV) : 191 (0.5)

159 distinct values

4659 (3.0%)

mkt yams [numeric]

Mean (sd) : 338 (184.8)

min ≤ med ≤ max:

136.9 ≤ 267.8 ≤ 836.7

IQR (CV) : 204.9 (0.5)

234 distinct values

4659 (3.0%)

mkt usd exchange rate [numeric]

Mean (sd) : 0 (0)

min ≤ med ≤ max:

0 ≤ 0 ≤ 0

IQR (CV) : 0 (0.3)

36 distinct values

4659 (3.0%)

mkt diesel pre10day mean [numeric]

Mean (sd) : 749.2 (287)

min ≤ med ≤ max:

188 ≤ 744 ≤ 1458

IQR (CV) : 210 (0.4)

1254 distinct values

4659 (3.0%)

mkt gasoline pre10day mean [numeric]

Mean (sd) : 366.2 (216.7)

min ≤ med ≤ max:

160.8 ≤ 274 ≤ 1070

IQR (CV) : 400.8 (0.6)

929 distinct values

4659 (3.0%)

mkt maize grain white pre10day mean [numeric]

Mean (sd) : 344.3 (165.2)

min ≤ med ≤ max:

153.8 ≤ 281.5 ≤ 814

IQR (CV) : 189.2 (0.5)

1534 distinct values

4659 (3.0%)

mkt maize grain yellow pre10day mean [numeric]

Mean (sd) : 369.4 (168)

min ≤ med ≤ max:

162.1 ≤ 322.3 ≤ 851.4

IQR (CV) : 208.2 (0.5)

1672 distinct values

4659 (3.0%)

mkt millet pearl pre10day mean [numeric]

Mean (sd) : 368.5 (166.4)

min ≤ med ≤ max:

161.1 ≤ 316.1 ≤ 846.8

IQR (CV) : 150 (0.5)

1766 distinct values

4659 (3.0%)

mkt rice 5 percent broken pre10day mean [numeric]

Mean (sd) : 891.6 (392.4)

min ≤ med ≤ max:

479 ≤ 727.5 ≤ 2157.1

IQR (CV) : 468.8 (0.4)

1713 distinct values

4659 (3.0%)

mkt rice milled pre10day mean [numeric]

Mean (sd) : 714.5 (313.8)

min ≤ med ≤ max:

281.6 ≤ 610.5 ≤ 1636.4

IQR (CV) : 408.9 (0.4)

1578 distinct values

4659 (3.0%)

mkt sorghum brown pre10day mean [numeric]

Mean (sd) : 376.2 (191.5)

min ≤ med ≤ max:

159.1 ≤ 319.6 ≤ 989

IQR (CV) : 221.4 (0.5)

1599 distinct values

4659 (3.0%)

mkt sorghum white pre10day mean [numeric]

Mean (sd) : 363.9 (177.2)

min ≤ med ≤ max:

154.7 ≤ 332 ≤ 852.3

IQR (CV) : 193.6 (0.5)

1411 distinct values

4659 (3.0%)

mkt yams pre10day mean [numeric]

Mean (sd) : 335.4 (182)

min ≤ med ≤ max:

139.1 ≤ 269.2 ≤ 836.7

IQR (CV) : 211.6 (0.5)

1921 distinct values

4659 (3.0%)

mkt usd exchange rate pre10day mean [numeric]

Mean (sd) : 0 (0)

min ≤ med ≤ max:

0 ≤ 0 ≤ 0

IQR (CV) : 0 (0.3)

334 distinct values

4659 (3.0%)

Generated by summarytools 1.0.1 (R version 4.3.3)
2024-08-27

Code

# dhis_subset %>% skimr::skim()

# dhis_clean %>% 
#   select(newage, reg_id) %>% 
#   distinct() %>% 
#   mutate(newage=as.numeric(newage)) %>% 
#   skimr::skim()

Code

readr::write_rds(tots_timely_qual_cli_mkt, 
                 here::here("data","tots_timely_qual_cli_mkt.rds"))

sessionInfo()

R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=en_US.UTF-8  LC_CTYPE=en_US.UTF-8    LC_MONETARY=en_US.UTF-8
[4] LC_NUMERIC=C            LC_TIME=en_US.UTF-8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] here_1.0.1         DT_0.33            RColorBrewer_1.1-3 effectsize_0.8.7  
 [5] kableExtra_1.4.0   summarytools_1.0.1 patchwork_1.2.0    skimr_2.1.5       
 [9] slider_0.3.1       table1_1.4.3       lubridate_1.9.3    forcats_1.0.0     
[13] stringr_1.5.1      dplyr_1.1.4        purrr_1.0.2        readr_2.1.5       
[17] tidyr_1.3.1        tibble_3.2.1       ggplot2_3.5.1      tidyverse_2.0.0   

loaded via a namespace (and not attached):
 [1] tcltk_4.3.3       sandwich_3.1-0    rlang_1.1.4       magrittr_2.0.3   
 [5] multcomp_1.4-25   snakecase_0.11.1  matrixStats_1.2.0 compiler_4.3.3   
 [9] mgcv_1.9-1        systemfonts_1.1.0 vctrs_0.6.5       reshape2_1.4.4   
[13] pkgconfig_2.0.3   crayon_1.5.2      fastmap_1.2.0     backports_1.4.1  
[17] magick_2.8.3      labeling_0.4.3    pander_0.6.5      utf8_1.2.4       
[21] rmarkdown_2.26    tzdb_0.4.0        ragg_1.3.0        bit_4.0.5        
[25] xfun_0.43         cachem_1.1.0      jsonlite_1.8.8    highr_0.10       
[29] pryr_0.1.6        broom_1.0.5       parallel_4.3.3    R6_2.5.1         
[33] bslib_0.7.0       stringi_1.8.4     jquerylib_0.1.4   estimability_1.5 
[37] Rcpp_1.0.12       knitr_1.46        warp_0.2.1        zoo_1.8-12       
[41] base64enc_0.1-3   parameters_0.21.6 Matrix_1.6-5      splines_4.3.3    
[45] timechange_0.3.0  tidyselect_1.2.1  rstudioapi_0.16.0 yaml_2.3.8       
[49] codetools_0.2-19  lattice_0.22-6    plyr_1.8.9        withr_3.0.0      
[53] bayestestR_0.13.2 coda_0.19-4.1     evaluate_0.23     survival_3.5-8   
[57] xml2_1.3.6        pillar_1.9.0      checkmate_2.3.1   insight_0.19.10  
[61] generics_0.1.3    vroom_1.6.5       rprojroot_2.0.4   hms_1.1.3        
[65] munsell_0.5.1     scales_1.3.0      xtable_1.8-4      glue_1.7.0       
[69] janitor_2.2.0     emmeans_1.10.1    tools_4.3.3       mvtnorm_1.2-4    
[73] rapportools_1.1   grid_4.3.3        crosstalk_1.2.1   datawizard_0.10.0
[77] colorspace_2.1-0  nlme_3.1-164      repr_1.1.7        Formula_1.2-5    
[81] cli_3.6.2         textshaping_0.3.7 fansi_1.0.6       viridisLite_0.4.2
[85] svglite_2.1.3     gtable_0.3.5      sass_0.4.9        digest_0.6.35    
[89] TH.data_1.1-2     htmlwidgets_1.6.4 farver_2.1.2      htmltools_0.5.8.1
[93] lifecycle_1.0.4   bit64_4.0.5       MASS_7.3-60.0.1