dump_upload_pipeline.py — Mobile-App Ingestion with QR Dedup
BCHPR · field operations · 2024 – present
4,005-line reusable mobile-app data-dump ingestion pipeline that prevents duplicate enrolment and silent sync collisions via QR-based identity, SQLite audit trail, and configurable change policies.
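A minimal sketch of how QR-based identity and the skip / flag / overwrite change policies could fit together. It assumes each incoming record carries its QR code in a field called `qr_code` and that the QR code is used directly as `record_id`. `ChangePolicy`, `IdentityStore` and `upsert` are hypothetical names for illustration only. They are not the pipeline's actual API, and FormConfig is not modelled here.

```python
from dataclasses import dataclass, field
from enum import Enum


class ChangePolicy(Enum):
    SKIP = "skip"            # keep the stored record; ignore the incoming change
    FLAG = "flag"            # keep the stored record; log the conflict for review
    OVERWRITE = "overwrite"  # replace the stored record with the incoming one


@dataclass
class IdentityStore:
    # Hypothetical in-memory stand-in for records already ingested,
    # keyed by record_id (taken directly from the QR code).
    records: dict[str, dict] = field(default_factory=dict)
    flags: list[tuple[str, dict]] = field(default_factory=list)

    def upsert(self, incoming: dict, policy: ChangePolicy) -> str:
        record_id = incoming["qr_code"]  # record_id <- QR code (assumed field name)
        existing = self.records.get(record_id)
        if existing is None:
            # First sighting of this QR code: create the record.
            self.records[record_id] = dict(incoming)
            return "created"
        if existing == incoming:
            return "unchanged"
        if policy is ChangePolicy.SKIP:
            return "skipped"
        if policy is ChangePolicy.FLAG:
            # Stored record is left untouched; the conflicting version is queued for review.
            self.flags.append((record_id, dict(incoming)))
            return "flagged"
        self.records[record_id] = dict(incoming)
        return "overwritten"
```

Keying directly on the QR code means the same participant resolves to the same record_id in every dump. The policy then only decides what happens when the content differs, so an unchanged resubmission is never treated as a new enrolment.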
Highlights
- QR-based identity resolution (record_id ← QR code) — immutable participant matching across dumps.
- FormConfig change policies (skip / flag / overwrite) and within-dump policies (skip / flag / most_complete / most_recent).
- SQLite audit trail on every record: _source_file, _source_folder, _source_created_at, _source_modified_at.
- Content-hash idempotency (SHA-256 of serialised record) for duplicate detection.
- Chunked REDCap import (configurable chunk_size, default 100) with drop_fields for sensitive-data filtering.
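The highlights above can be sketched together as follows. This is a minimal illustration, not the pipeline's code. It assumes records are flat dicts, that audit fields are the `_`-prefixed keys listed above, and that `_source_modified_at` is an ISO-8601 string in a consistent format and timezone, so plain string comparison orders it correctly. The helper names (`content_hash`, `resolve_within_dump`, `record_audit`, `prepare_import_chunks`) and the `audit` table layout are hypothetical. Only the `most_complete` and `most_recent` within-dump policies are shown, and the actual REDCap import call is left out.

```python
import hashlib
import json
import sqlite3
from typing import Iterable, Iterator


def data_fields(record: dict) -> dict:
    # Audit fields ("_"-prefixed) describe provenance, not content,
    # so they are excluded from hashing and completeness scoring.
    return {k: v for k, v in record.items() if not k.startswith("_")}


def content_hash(record: dict) -> str:
    # Sorted keys make serialisation deterministic: identical content
    # yields the same SHA-256 digest regardless of field order.
    payload = json.dumps(data_fields(record), sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def completeness(record: dict) -> int:
    # Number of non-empty data fields.
    return sum(1 for v in data_fields(record).values() if v not in (None, ""))


def resolve_within_dump(rows: list[dict], policy: str) -> list[dict]:
    """Keep one row per record_id using 'most_complete' or 'most_recent'."""
    if policy not in ("most_complete", "most_recent"):
        raise ValueError(f"unsupported policy in this sketch: {policy}")
    best: dict[str, dict] = {}
    for row in rows:
        rid = row["record_id"]
        current = best.get(rid)
        if current is None:
            best[rid] = row
        elif policy == "most_complete":
            if completeness(row) > completeness(current):
                best[rid] = row
        elif row["_source_modified_at"] > current["_source_modified_at"]:
            best[rid] = row  # later modification time wins
    return list(best.values())


def create_audit_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit ("
        " content_hash TEXT PRIMARY KEY, record_id TEXT,"
        " _source_file TEXT, _source_folder TEXT,"
        " _source_created_at TEXT, _source_modified_at TEXT)"
    )


def record_audit(conn: sqlite3.Connection, row: dict) -> bool:
    """Log a row's provenance; return False if identical content was already logged."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO audit VALUES (?, ?, ?, ?, ?, ?)",
        (
            content_hash(row),
            row["record_id"],
            row["_source_file"],
            row["_source_folder"],
            row["_source_created_at"],
            row["_source_modified_at"],
        ),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 when the hash already existed and the insert was ignored


def prepare_import_chunks(
    records: list[dict], chunk_size: int = 100, drop_fields: Iterable[str] = ()
) -> Iterator[list[dict]]:
    # Strip sensitive fields, then yield fixed-size batches for import.
    dropped = set(drop_fields)
    for start in range(0, len(records), chunk_size):
        batch = records[start : start + chunk_size]
        yield [{k: v for k, v in r.items() if k not in dropped} for r in batch]
```

Using the content hash as the audit table's primary key makes re-ingesting the same dump a no-op. The provenance columns still record where each distinct version of a record first came from.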
Related projects
Architect
my_functions.py — Centralised Python Library
The 21,086-line shared Python library that every BCHPR data project depends on — APIManager, PathsManager, REDCap wrappers, study-ID generation, SharePoint I/O, and dozens of cross-project utilities.
Architect
data_quality_manager.py — Enterprise DQA Framework
11,007-line data quality platform with fluent QueryBuilder, persistent query lifecycle tracking, duplicate analysis, and double-data-entry verification across 28+ instruments — with SQLite persistence and Polars acceleration.
Engineer
study_id_patterns.py — Study-ID Regex Registry
2,611-line centralised registry of 8 study-ID patterns and 14 site-code patterns across Cameroon, Nigeria, and Vietnam projects — with vectorised extraction, validation, classification, and cleaning.