One of the distinctive aspects of the CHAVI databank is a unified, common relational database that stores de-identified demographic, clinical, pathological, molecular biological, treatment and outcome data for all patients. The database can be queried flexibly, which makes it a powerful tool for exploring image sets across cancer sites. A researcher can therefore interrogate radiological images in a cancer site agnostic fashion. For example, the researcher may wish to evaluate the quantitative imaging characteristics of a specific histology, study tumors in a specific age group, or explore differences across age groups. Because information on the available imaging data is linked to this database, users can assemble varied datasets tailored to their research question.
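
As a rough illustration of what a site agnostic query could look like, the sketch below builds a tiny in-memory SQLite database and selects imaging studies by histology, optionally restricted to an age range, regardless of cancer site. The table names (`patient`, `imaging_study`), column names, and sample rows are all hypothetical and are not taken from the actual CHAVI schema; please refer to the data dictionary for the real field names.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patient (
            patient_id  TEXT PRIMARY KEY,   -- de-identified identifier
            age_years   INTEGER,
            cancer_site TEXT,
            histology   TEXT
        );
        CREATE TABLE imaging_study (
            study_id   TEXT PRIMARY KEY,
            patient_id TEXT REFERENCES patient(patient_id),
            modality   TEXT
        );
        """
    )
    # Made-up sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO patient VALUES (?, ?, ?, ?)",
        [
            ("P001", 45, "head and neck", "squamous cell carcinoma"),
            ("P002", 67, "lung", "squamous cell carcinoma"),
            ("P003", 58, "lung", "adenocarcinoma"),
            ("P004", 72, "cervix", "squamous cell carcinoma"),
        ],
    )
    conn.executemany(
        "INSERT INTO imaging_study VALUES (?, ?, ?)",
        [
            ("S1", "P001", "CT"),
            ("S2", "P002", "CT"),
            ("S3", "P002", "PT"),
            ("S4", "P003", "CT"),
            ("S5", "P004", "MR"),
        ],
    )
    return conn


def studies_by_histology(conn, histology, min_age=None, max_age=None):
    """Return (cancer_site, patient_id, study_id, modality) rows for one
    histology across all cancer sites, optionally filtered by age."""
    sql = (
        "SELECT p.cancer_site, p.patient_id, s.study_id, s.modality "
        "FROM patient p JOIN imaging_study s ON s.patient_id = p.patient_id "
        "WHERE p.histology = ?"
    )
    params = [histology]
    if min_age is not None:
        sql += " AND p.age_years >= ?"
        params.append(min_age)
    if max_age is not None:
        sql += " AND p.age_years <= ?"
        params.append(max_age)
    sql += " ORDER BY s.study_id"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in studies_by_histology(conn, "squamous cell carcinoma", min_age=60):
        print(row)
```

The query filters on histology and age only, so the result mixes studies from several cancer sites; adding a filter on `cancer_site` would narrow it back to a single site.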


Protecting confidentiality while retaining as much information as possible is one of the strengths of this system. The system ensures not only that de-identified DICOM data is linked to de-identified patient information, but also that the longitudinal temporal integrity between the imaging data and the clinical data is maintained. Longitudinal imaging datasets can therefore be queried in relation to clinical treatment and response as they evolve over time.
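
To make the idea of temporal integrity concrete, the sketch below shows one common way of hiding calendar dates while keeping the intervals between events: each date is replaced by its offset in days from a per-patient anchor date. This is an assumption chosen for illustration and is not necessarily the method CHAVI uses; the function name `to_relative_days`, the choice of anchor, and the sample events are hypothetical. The paper cited below describes the actual approach.

```python
from datetime import date


def to_relative_days(events, anchor):
    """Replace absolute dates with day offsets from a per-patient anchor.

    `events` is a list of (date, label) pairs. The result is sorted by
    offset, so the order of events and the gaps between them are kept
    while the calendar dates themselves are dropped.
    """
    return sorted(((d - anchor).days, label) for d, label in events)


# Made-up timeline for one patient, for illustration only.
events = [
    (date(2020, 3, 1), "planning CT"),
    (date(2020, 3, 15), "radiotherapy start"),
    (date(2020, 6, 20), "response assessment CT"),
]

# Hypothetical anchor: the date of the first imaging study.
relative = to_relative_days(events, anchor=date(2020, 3, 1))

for offset, label in relative:
    print(f"day {offset:>3}: {label}")
```

Because only offsets are stored, the 111 days between the planning scan and the response assessment scan remain available for analysis even though neither calendar date is.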

To read more about the database design and architecture, please see this paper:

Kundu S, Chakraborty S, Mukhopadhyay J, Das S, Chatterjee S, Basu Achari R, et al. Research Goal-Driven Data Model and Harmonization for De-Identifying Patient Data in Radiomics. J Digit Imaging [Internet]. 2021 Jul 9; Available from: https://doi.org/10.1007/s10278-021-00476-9

The following image shows the architecture of the CHAVI database.

CHAVI Database Architecture
Figure 1: Simplified version of the CHAVI Database design. 

The CHAVI data dictionary is available on a dedicated page. Please click here to access it.