In this post we’ll describe how we’ve set up Atlas to run within a Google Cloud environment, using BigQuery as its source database.
Why Atlas?
But why set up Atlas anyway? Atlas is a very neat tool to conduct efficient analyses on standardized observational data converted to the OMOP Common Data Model (CDM). By converting healthcare data to a standardized format, researchers can easily apply their analyses to multiple data sources. Atlas facilitates these analyses and can be used out of the box to gain quick insights into OMOP CDM data. Also, since it is open source software, it is freely available and has a community of users and developers to help you on your way. Atlas mainly revolves around concept sets and cohorts, which researchers can define based on the standardized vocabularies. Using these concept sets and cohorts, you can create groups of patients based on, for example, exposure to a drug or diagnosis of a condition.
Although this might sound very easy, it takes quite a few steps to get Atlas up and running. We chose to set up Atlas in our Google Cloud Platform (GCP) environment. Of course you can use any other cloud platform, or set up Atlas on premises. However, by showing it is possible to set up Atlas on a cloud platform such as GCP, we hope to show you don’t need a large organization with existing IT infrastructure to start with data standardization in healthcare.
In this blog we’ll look at:
- Database setup (BigQuery)
- Creating a virtual machine to host Atlas
- Network and firewall rules
- Using Docker and Docker Compose
- Installing database drivers
- Refreshing your OMOP CDM data
- Debugging
Prerequisites
Since we’re running this project on Google Cloud, a Google Cloud project would come in handy. Also, this post assumes you already have data available in the OMOP CDM format.
Prerequisites:
- Google Cloud project
- BigQuery dataset containing OMOP data
Creating the BigQuery tables
To run Atlas, you need to have a database containing data formatted according to the OMOP Common Data Model. Usually you have an ETL process in place to do just that. For instance, we’ve designed an ETL in R to convert GemsTracker data from a MySQL database into the OMOP CDM.
Once you have an ETL in place, you need a database to store the resulting data. Since we’re running this project in Google Cloud, we’ve opted for BigQuery. Luckily, OHDSI provides us with the (almost) complete queries to initiate the necessary BigQuery tables. First, create a new BigQuery dataset dedicated to this data, and then run the BigQuery DDL. Simply replace `@cdmDatabaseSchema` with the name of the dataset you created, and you’re good to go. For now I’ll call this dataset `cdm`, but you can pick anything you like. When we created our tables, there were still some tiny bugs in the DDL code, which you’ll spot soon enough. Alternatively, you can simply use the version we prepared.
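If you prefer doing the substitution on the command line, a quick `sed` call works as well. A minimal sketch, assuming you downloaded the OHDSI BigQuery DDL as `OMOPCDM_bigquery_5.4_ddl.sql` (the exact file name depends on the CDM version you use):
# Replace the @cdmDatabaseSchema placeholder with our dataset name "cdm"
sed 's/@cdmDatabaseSchema/cdm/g' OMOPCDM_bigquery_5.4_ddl.sql > omop_ddl_cdm.sql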
Lastly, we’ll also create two empty datasets for temporary data and results data; we’ll call these `temp` and `results`. These datasets are needed later in the process.
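If you like the command line, the datasets can also be created with the `bq` CLI. A sketch, assuming the example project `new-project` and the region used elsewhere in this post:
# Create the three datasets used in this post
bq mk --dataset --location=europe-west4 new-project:cdm
bq mk --dataset --location=europe-west4 new-project:temp
bq mk --dataset --location=europe-west4 new-project:results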
Creating a VM to host Atlas
Next, we’ll need a virtual machine to host Atlas. For this we’ll use Google Compute Engine. To make sure our VM is secure, we’ll use a private VPC and a VM without a public IP.
VPC network
First we need to create a private VPC network. When creating a VPC network, Google will also ask you to create a new subnet. You can do this via the Google Cloud console, or via the CLI using the code below. Of course, you can modify the names and region.
gcloud compute networks create new-vpc-network --project=new-project --subnet-mode=custom --mtu=1460 --bgp-routing-mode=regional && gcloud compute networks subnets create new-subnet --project=new-project --range=10.0.0.0/24 --stack-type=IPV4_ONLY --network=new-vpc-network --region=europe-west4
Firewall rules
Next, we’re going to create a new firewall rule. This rule will allow us to use Identity-Aware Proxy (IAP) TCP forwarding to connect to the VM without the need for a public IP. You can check out the Google documentation for more information about IAP.
We’ll create a firewall rule to allow ingress from IAP via port 22 (SSH). We’re going to limit this access to the IP range `35.235.240.0/20`, which corresponds to the range Google uses for IAP TCP forwarding. Also, we’ll attach a network tag `allow-ingress-from-iap` to the firewall rule. This enables us to apply the rule only to VMs carrying the same network tag.
gcloud compute --project=new-project firewall-rules create allow-ingress-from-iap --direction=INGRESS --priority=1000 --network=new-vpc-network --action=ALLOW --rules=tcp:22 --source-ranges=35.235.240.0/20 --target-tags=allow-ingress-from-iap
Compute instance
This is where we’ll create the VM to host Atlas. You can pick any machine you like, but we’re going to use a basic Ubuntu 22.04 image. Make sure your boot disk isn’t too small, and don’t forget to disable the public IP.
gcloud compute instances create atlas-instance --project=new-project --zone=europe-west4-a --machine-type=e2-medium --network-interface=stack-type=IPV4_ONLY,subnet=new-subnet,no-address --maintenance-policy=MIGRATE --provisioning-model=STANDARD --service-account=xyz-compute@developer.gserviceaccount.com --scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append --tags=allow-ingress-from-iap,http-server,https-server --create-disk=auto-delete=yes,boot=yes,device-name=atlas-instance,image=projects/ubuntu-os-cloud/global/images/ubuntu-2204-jammy-v20230429,mode=rw,size=50,type=projects/new-project/zones/europe-west4-a/diskTypes/pd-balanced --no-shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring --labels=ec-src=vm_add-gcloud --reservation-affinity=any
Configuring the VM
Installing Docker
Now that we have our VM up and running, it’s time to prepare it to run Atlas. Since we’re using Broadsea, the first thing we need is Docker and Docker Compose. Luckily, DigitalOcean already prepared a great tutorial on how to install Docker on Ubuntu. The next steps are taken from their guide.
sudo apt update
sudo apt install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
apt-cache policy docker-ce
sudo apt install docker-ce
sudo systemctl status docker
sudo apt install docker-compose
Clone Broadsea repository
To run Broadsea, we need to get the Broadsea repository onto our VM. Make sure your Git account is authenticated, and clone both the Broadsea and WebAPI repositories. For this tutorial, we’re going to use Broadsea v3.0.0 and WebAPI v2.13.0.
git clone git@github.com:OHDSI/Broadsea.git --branch v3.0.0
git clone git@github.com:OHDSI/WebAPI.git --branch v2.13.0
Service account
Also, we need a Google service account with the correct rights to run BigQuery jobs. Create a new service account using Google IAM. Grant this service account at least the `BigQuery User` and `BigQuery Job User` roles, and download the service account key. We need to upload this service account key, which is a JSON file, to the VM. If you SSH into the VM via the Google Compute Engine page, Google provides you with a handy upload button, which makes uploading the file very easy.
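The same can also be done via the CLI. A sketch, assuming the example project `new-project`; the service account name `atlas-bigquery` is just an example, and we name the downloaded key `service-account.json` to match the copy step later in this post:
# Create the service account
gcloud iam service-accounts create atlas-bigquery --project=new-project
# Grant the BigQuery User and BigQuery Job User roles
gcloud projects add-iam-policy-binding new-project --member="serviceAccount:atlas-bigquery@new-project.iam.gserviceaccount.com" --role="roles/bigquery.user"
gcloud projects add-iam-policy-binding new-project --member="serviceAccount:atlas-bigquery@new-project.iam.gserviceaccount.com" --role="roles/bigquery.jobUser"
# Download a JSON key for the service account
gcloud iam service-accounts keys create service-account.json --iam-account=atlas-bigquery@new-project.iam.gserviceaccount.com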
BigQuery drivers
Broadsea comes with PostgreSQL drivers included. However, to use BigQuery, we need to download the appropriate JDBC drivers first; we’ll copy them into the WebAPI and Broadsea folders in the next steps.
sudo apt-get install unzip
sudo curl -o SimbaJDBCDriverforGoogleBigQuery42_1.2.25.1029.zip https://storage.googleapis.com/simba-bq-release/jdbc/SimbaJDBCDriverforGoogleBigQuery42_1.2.25.1029.zip
sudo unzip SimbaJDBCDriverforGoogleBigQuery42_1.2.25.1029.zip -d bigquery
sudo rm SimbaJDBCDriverforGoogleBigQuery42_1.2.25.1029.zip
Installing WebAPI
Because we’re using our own BigQuery drivers, we’ll need to do a custom WebAPI build instead of using the prebuilt image. First we’ll copy the downloaded BigQuery drivers into the cloned WebAPI folder.
sudo cp -R bigquery WebAPI
Next, we’ll add the following line to the WebAPI Dockerfile, to make sure the BigQuery drivers are available during the build. For this you can use any text editor you like, e.g. nano: `sudo nano Dockerfile`.
COPY /bigquery/ /code/src/main/extras/bigquery/
As described in the WebAPI documentation, we also need to create our own `settings.xml` file. For now it’s sufficient to copy the `sample_settings.xml` into a new folder.
sudo mkdir WebAPIConfig
sudo cp sample_settings.xml WebAPIConfig/settings.xml
Also, the WebAPI contains a `pom.xml` file with different profiles for each possible data source. We’re going to use the `webapi-bigquery` profile. But since we downloaded our own drivers, we need to modify the profile to match the version numbers of the downloaded drivers. The following piece of XML should match the correct version numbers; you can copy and paste it into your `pom.xml` file.
<profile>
<id>webapi-bigquery</id>
<properties>
<bigquery.enabled>true</bigquery.enabled>
<!-- BigQuery JDBC driver path -->
<bigquery.classpath>${basedir}/src/main/extras/bigquery</bigquery.classpath>
</properties>
<dependencies>
<dependency>
<groupId>com.google.api-client</groupId>
<artifactId>google-api-client</artifactId>
<version>2.1.1</version>
</dependency>
<dependency>
<groupId>com.google.apis</groupId>
<artifactId>google-api-services-bigquery</artifactId>
<version>v2-rev20221028-2.0.0</version>
</dependency>
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-bigquery</artifactId>
<version>2.9.4</version>
</dependency>
<dependency>
<groupId>com.google.http-client</groupId>
<artifactId>google-http-client</artifactId>
<version>1.42.3</version>
</dependency>
<dependency>
<groupId>com.google.oauth-client</groupId>
<artifactId>google-oauth-client</artifactId>
<version>1.34.1</version>
</dependency>
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-bigquerystorage</artifactId>
<version>2.26.0</version>
</dependency>
<dependency>
<groupId>com.google.auth</groupId>
<artifactId>google-auth-library-oauth2-http</artifactId>
<version>1.12.1</version>
</dependency>
<dependency>
<groupId>com.google.auth</groupId>
<artifactId>google-auth-library-credentials</artifactId>
<version>1.12.1</version>
</dependency>
<dependency>
<groupId>com.google.api</groupId>
<artifactId>gax</artifactId>
<version>2.19.5</version>
</dependency>
<dependency>
<groupId>com.google.api</groupId>
<artifactId>gax-grpc</artifactId>
<version>2.19.5</version>
</dependency>
<dependency>
<groupId>com.google.api.grpc</groupId>
<artifactId>proto-google-cloud-bigquerystorage-v1beta1</artifactId>
<version>0.150.0</version>
</dependency>
<dependency>
<groupId>io.opencensus</groupId>
<artifactId>opencensus-contrib-http-util</artifactId>
<version>0.31.1</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-install-plugin</artifactId>
<version>2.5.2</version>
<executions>
<execution>
<id>google-api-client</id>
<phase>initialize</phase>
<goals>
<goal>install-file</goal>
</goals>
<configuration>
<groupId>com.google.api-client</groupId>
<artifactId>google-api-client</artifactId>
<version>2.1.1</version>
<packaging>jar</packaging>
<file>${bigquery.classpath}/google-api-client-2.1.1.jar</file>
</configuration>
</execution>
<execution>
<id>google-api-services-bigquery</id>
<phase>initialize</phase>
<goals>
<goal>install-file</goal>
</goals>
<configuration>
<groupId>com.google.apis</groupId>
<artifactId>google-api-services-bigquery</artifactId>
<version>v2-rev20221028-2.0.0</version>
<packaging>jar</packaging>
<file>${bigquery.classpath}/google-api-services-bigquery-v2-rev20221028-2.0.0.jar</file>
</configuration>
</execution>
<execution>
<id>google-cloud-bigquery</id>
<phase>initialize</phase>
<goals>
<goal>install-file</goal>
</goals>
<configuration>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-bigquery</artifactId>
<version>2.9.4</version>
<packaging>jar</packaging>
<file>${bigquery.classpath}/GoogleBigQueryJDBC42.jar</file>
</configuration>
</execution>
<execution>
<id>google-http-client</id>
<phase>initialize</phase>
<goals>
<goal>install-file</goal>
</goals>
<configuration>
<groupId>com.google.http-client</groupId>
<artifactId>google-http-client</artifactId>
<version>1.42.3</version>
<packaging>jar</packaging>
<file>${bigquery.classpath}/google-http-client-1.42.3.jar</file>
</configuration>
</execution>
<execution>
<id>google-oauth-client</id>
<phase>initialize</phase>
<goals>
<goal>install-file</goal>
</goals>
<configuration>
<groupId>com.google.oauth-client</groupId>
<artifactId>google-oauth-client</artifactId>
<version>1.34.1</version>
<packaging>jar</packaging>
<file>${bigquery.classpath}/google-oauth-client-1.34.1.jar</file>
</configuration>
</execution>
<execution>
<id>google-cloud-bigquerystorage</id>
<phase>initialize</phase>
<goals>
<goal>install-file</goal>
</goals>
<configuration>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-bigquerystorage</artifactId>
<version>2.26.0</version>
<packaging>jar</packaging>
<file>${bigquery.classpath}/google-cloud-bigquerystorage-2.26.0.jar</file>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</profile>
And then we’re ready to build our WebAPI image, using the BigQuery profile alongside the default PostgreSQL profile.
sudo docker build -t broadsea-webapi-bigquery --build-arg MAVEN_PROFILE=webapi-postgresql,webapi-bigquery .
Docker Compose
Now our WebAPI image is ready to go. However, we still need the rest of the Broadsea ecosystem. We already cloned the Broadsea repository and downloaded the BigQuery drivers, so let’s start by copying those drivers to the correct folder.
sudo cp -R bigquery Broadsea
Next to the drivers, we also uploaded a Google service account key. Let’s copy this to our Broadsea folder as well, renaming it to match the file name used in the Docker Compose mounts below.
sudo cp service-account.json Broadsea/service-account-key.json
Now that we have the BigQuery drivers and service account key available, we need to mount them into the containers via `docker-compose.yml`. Add the following lines to the `broadsea-hades` service:
volumes:
- ./bigquery:/tmp/drivers/:ro
- ./service-account-key.json:/tmp/google/service-account-key.json
In Broadsea 3.0, the WebAPI has its own `ohdsi-webapi.yml` file, which is referenced from `docker-compose.yml`. Add the following lines to this file:
volumes:
- ./bigquery:/tmp/drivers/:ro
- ./service-account-key.json:/var/lib/ohdsi/webapi/service-account-key.json
Also, since we’re using a custom build of the WebAPI, we need to point to it in the `docker-compose.yml` file, instead of simply pulling the latest WebAPI version from Docker Hub. Therefore, replace `image: docker.io/ohdsi/webapi:latest` with `image: broadsea-webapi-bigquery`, which is the name we gave our WebAPI image earlier.
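For reference, a minimal sketch of what the relevant part then looks like (the exact service name and surrounding keys depend on your Broadsea version):
services:
  ohdsi-webapi:
    # our locally built image instead of docker.io/ohdsi/webapi:latest
    image: broadsea-webapi-bigquery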
Lastly, there are loads of environment variables which are referred to as `${VARIABLE_NAME}` in the docker-compose file. These variables are defined in the `.env` file, which is also where you can change them. For example, you can set the username and password of your Hades user.
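For example, the Hades credentials typically live in `.env` under names like the following; check your own `.env`, since variable names and defaults can differ between Broadsea versions:
HADES_USER="ohdsi"
HADES_PASSWORD="mypass"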
Now that everything is ready to go, let’s start those containers!
sudo docker-compose --profile default up -d
Accessing Atlas
When your services are up, Atlas is ready to use. Since we created our VM in a private VPC network, we’ll need IAP (Identity-Aware Proxy) to access it. Make sure your user has the correct rights to access the VM via IAP; these permissions can be granted for each VM separately. Just find IAP in Google Cloud and grant your user the `IAP-secured Tunnel User` role.
Now we are ready to create an IAP tunnel from the HTTP port Atlas is exposed on (80) to a local port (8001), since port 80 is usually reserved locally.
gcloud compute start-iap-tunnel atlas-instance 80 --local-host-port=localhost:8001 --zone=europe-west4-a
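With the tunnel active, Atlas should typically be reachable in your browser at http://localhost:8001/atlas, and the WebAPI at http://localhost:8001/WebAPI/info.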
Adding a new data source
Now we have Atlas up and running, but we haven’t added our own data source yet. By default the Broadsea version of Atlas contains a demo dataset (Eunomia), but this isn’t what we’re looking for.
Results schema
Earlier in this post we mentioned the creation of a `results` dataset in BigQuery. However, we did not create any tables in it yet. Now that the WebAPI is active, we can use the following URL to generate the queries that create the correct tables. Make sure your IAP tunnel is active, and check whether the dataset names in the URL match those in your BigQuery environment.
http://127.0.0.1:8001/WebAPI/ddl/results?dialect=bigquery&schema=results&vocabSchema=cdm&tempSchema=temp&initConceptHierarchy=true
The resulting queries can be run in BigQuery.
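If you’d rather not paste them into the BigQuery console, you can also save the generated SQL to a file and run it with the `bq` CLI. A sketch, assuming you saved the output as `results_ddl.sql`; BigQuery’s multi-statement scripting can usually run the whole file in one go, otherwise split it into separate statements:
# Run the generated results DDL as a standard SQL script
bq query --use_legacy_sql=false "$(cat results_ddl.sql)"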
Insert queries
To add a data source, we need to tell the WebAPI where to find it. This can be done by adding new records to the `webapi.source` and `webapi.source_daimon` tables, which reside in the `broadsea-atlasdb` container. How to add records to these tables is also described in the WebAPI wiki, but we’ll give an example for BigQuery.
First, we need to enter the `broadsea-atlasdb` container.
sudo docker exec -it broadsea-atlasdb /bin/bash
Once we’re in, we can enter the internal PostgreSQL database. This is just a database for internal use; it differs from the BigQuery database that hosts your actual OMOP data.
psql --host localhost --username postgres
We’ll start by adding a new record to `webapi.source`. This record contains the name and ID of your data source, but most importantly a connection string for BigQuery. Make sure to check this connection string, and adjust the name of the GCP project and service account to match your own setup.
-- OHDSI CDM source
INSERT INTO webapi.source( source_id, source_name, source_key, source_connection, source_dialect, is_cache_enabled)
VALUES (2, 'NewCDM', 'cdm',
'jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=new-project;OAuthType=0;OAuthServiceAcctEmail=service-account@new-project.iam.gserviceaccount.com;OAuthPvtKeyPath=/var/lib/ohdsi/webapi/service-account-key.json;', 'bigquery', 't');
Next, we’ll add four records to the `webapi.source_daimon` table. These point to the different BigQuery datasets we created at the beginning of this post, each holding a different type of data needed to run Atlas. Make sure the `source_id` in both tables is the same.
-- CDM daimon
INSERT INTO webapi.source_daimon( source_daimon_id, source_id, daimon_type, table_qualifier, priority) VALUES (4, 2, 0, 'cdm', 0);
-- VOCABULARY daimon
INSERT INTO webapi.source_daimon( source_daimon_id, source_id, daimon_type, table_qualifier, priority) VALUES (5, 2, 1, 'cdm', 10);
-- RESULTS daimon
INSERT INTO webapi.source_daimon( source_daimon_id, source_id, daimon_type, table_qualifier, priority) VALUES (6, 2, 2, 'results', 0);
-- TEMP daimon
INSERT INTO webapi.source_daimon( source_daimon_id, source_id, daimon_type, table_qualifier, priority) VALUES (7, 2, 5, 'temp', 0);
Finally, let’s restart the WebAPI, so the changes can take effect.
sudo docker restart ohdsi-webapi
Update record counts
When you open Atlas again, you should now see your own data source available under Data Sources. However, if you search for `concept_id`s present in your data, you’ll notice that most record counts are still empty. Refreshing the record counts is done in a few steps, which we’ll discuss below.
Achilles
Most of the work is done by Achilles, an R package developed by OHDSI. To use the Achilles package, we first need to establish a connection to BigQuery using the DatabaseConnector package. Make sure the driver and key paths point to where these files are available in the environment you run R from.
connection_details <- DatabaseConnector::createConnectionDetails(
  dbms = "bigquery",
  # Adjust OAuthPvtKeyPath to where the key is mounted in the environment you run R from
  # (e.g. /tmp/google/service-account-key.json inside the broadsea-hades container)
  connectionString = "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=new-project;OAuthType=0;OAuthServiceAcctEmail=service-account@new-project.iam.gserviceaccount.com;OAuthPvtKeyPath=/var/lib/ohdsi/webapi/service-account-key.json;Timeout=1000",
  user = "",
  password = "",
  # The BigQuery JDBC jars are mounted at /tmp/drivers/ in the broadsea-hades service
  pathToDriver = "/tmp/drivers/"
)
connection <- DatabaseConnector::connect(connection_details)
When the database connection is established, we can update all relevant tables using the `achilles()` function. This function can be executed directly on the database, or run in `sqlOnly` mode, which returns a set of queries to run manually. At the time of writing, however, the Achilles package doesn’t seem to be fully developed for BigQuery and still has some syntactical issues. Therefore, we’re going to use `sqlOnly` mode and make some manual adjustments afterwards.
Achilles::achilles(
connectionDetails = connection_details,
cdmDatabaseSchema = "cdm",
resultsDatabaseSchema = "results",
scratchDatabaseSchema = "temp",
vocabDatabaseSchema = "cdm",
sourceName = "newCDM",
createTable = FALSE,
defaultAnalysesOnly = FALSE,
dropScratchTables = TRUE,
sqlOnly = TRUE,
sqlDialect = "bigquery",
optimizeAtlasCache = TRUE,
cdmVersion = "5.4"
)
By default, the `achilles()` function saves an `achilles.sql` file in the `output` folder of your current working directory. If you read this query into an R object, you can apply some fixes programmatically, which will save you a lot of time. We’ve used something like the code below, but this can differ based on your data and Achilles version.
library(readr)
library(stringr)
query <- readr::read_file("output/achilles.sql")
# Add drop table statements
create_temp_table <- str_extract_all(query, "CREATE TABLE temp.tmpach_[a-z0-9_]+")[[1]]
create_results_table <- str_extract_all(query, "CREATE TABLE results.xrhqxw6s[a-z0-9_]+")[[1]]
create_temp_table <- c(create_temp_table, create_results_table)
for (create in create_temp_table) {
table_name <- gsub("CREATE TABLE ", "", create)
query <- gsub(
paste0(create, "[^0-9]"),
paste0("DROP TABLE IF EXISTS ", table_name, "; ", create),
query
)
}
# Fix incorrect data type
query <- gsub("numeric\\(7,6\\)", "numeric", query)
# Fix too complex query
tmpach_2004 <- paste0(
"CREATE TABLE temp.tmpach_2004",
gsub(";.*", "", gsub(".*CREATE TABLE temp.tmpach_2004", "", query)),
";"
)
tmpach_2004 <- gsub("WITH rawdata as \\(", "", tmpach_2004)
tmpach_2004 <- gsub("SELECT \\* FROM rawdata", "", tmpach_2004)
tmpach_2004 <- gsub(" union all", "; insert into temp.tmpach_2004", tmpach_2004)
query <- paste0(
gsub("CREATE TABLE temp.tmpach_2004.*", "", query),
tmpach_2004,
"-- 210",
gsub(".*\\-\\- 210", "", query)
)
# Fix intersect
query <- gsub(" intersect", " intersect distinct", query)
write_file(query, "achilles_new.sql")
Now run the resulting queries via the BigQuery interface. If there are still some bugs left, you’ll spot them soon enough.
Refresh script
Among other things, Achilles creates an `achilles_result_concept_count` table. However, Achilles only generates a `record_count`, and not a `person_count`, which you’ll need for some Atlas functionality. Therefore, we need to separately refresh this table using the query provided by the WebAPI.
Unfortunately, this query is written for PostgreSQL. However, we’ve made an attempt to translate it to BigQuery.
/************************************************/
/***** Create record and person count table *****/
/************************************************/
DROP TABLE IF EXISTS results.achilles_result_concept_count;
CREATE TABLE results.achilles_result_concept_count
(
concept_id int,
record_count bigint,
descendant_record_count bigint,
person_count bigint,
descendant_person_count bigint
);
/**********************************************/
/***** Populate record/person count table *****/
/**********************************************/
INSERT INTO results.achilles_result_concept_count (concept_id, record_count, descendant_record_count, person_count, descendant_person_count)
WITH counts AS (
SELECT stratum_1 concept_id, MAX (count_value) agg_count_value
FROM results.achilles_results
WHERE analysis_id IN (2, 4, 5, 201, 225, 301, 325, 401, 425, 501, 505, 525, 601, 625, 701, 725, 801, 825,
826, 827, 901, 1001, 1201, 1203, 1425, 1801, 1825, 1826, 1827, 2101, 2125, 2301)
/* analyses:
Number of persons by gender
Number of persons by race
Number of persons by ethnicity
Number of visit occurrence records, by visit_concept_id
Number of visit_occurrence records, by visit_source_concept_id
Number of providers by specialty concept_id
Number of provider records, by specialty_source_concept_id
Number of condition occurrence records, by condition_concept_id
Number of condition_occurrence records, by condition_source_concept_id
Number of records of death, by cause_concept_id
Number of death records, by death_type_concept_id
Number of death records, by cause_source_concept_id
Number of procedure occurrence records, by procedure_concept_id
Number of procedure_occurrence records, by procedure_source_concept_id
Number of drug exposure records, by drug_concept_id
Number of drug_exposure records, by drug_source_concept_id
Number of observation occurrence records, by observation_concept_id
Number of observation records, by observation_source_concept_id
Number of observation records, by value_as_concept_id
Number of observation records, by unit_concept_id
Number of drug era records, by drug_concept_id
Number of condition era records, by condition_concept_id
Number of visits by place of service
Number of visit_occurrence records, by discharge_to_concept_id
Number of payer_plan_period records, by payer_source_concept_id
Number of measurement occurrence records, by observation_concept_id
Number of measurement records, by measurement_source_concept_id
Number of measurement records, by value_as_concept_id
Number of measurement records, by unit_concept_id
Number of device exposure records, by device_concept_id
Number of device_exposure records, by device_source_concept_id
Number of location records, by region_concept_id
*/
GROUP BY stratum_1
UNION ALL
SELECT stratum_2 concept_id, SUM (count_value) AS agg_count_value
FROM results.achilles_results
WHERE analysis_id IN (405, 605, 705, 805, 807, 1805, 1807, 2105)
/* analyses:
Number of condition occurrence records, by condition_concept_id by condition_type_concept_id
Number of procedure occurrence records, by procedure_concept_id by procedure_type_concept_id
Number of drug exposure records, by drug_concept_id by drug_type_concept_id
Number of observation occurrence records, by observation_concept_id by observation_type_concept_id
Number of observation occurrence records, by observation_concept_id and unit_concept_id
Number of observation occurrence records, by measurement_concept_id by measurement_type_concept_id
Number of measurement occurrence records, by measurement_concept_id and unit_concept_id
Number of device exposure records, by device_concept_id by device_type_concept_id
but this subquery only gets the type or unit concept_ids, i.e., stratum_2
*/
GROUP BY stratum_2
), counts_person AS (
SELECT stratum_1 as concept_id, MAX (count_value) agg_count_value
FROM results.achilles_results
WHERE analysis_id IN (200, 240, 400, 440, 540, 600, 640, 700, 740, 800, 840, 900, 1000, 1300, 1340, 1800, 1840, 2100, 2140, 2200)
/* analyses:
Number of persons with at least one visit occurrence, by visit_concept_id
Number of persons with at least one visit occurrence, by visit_source_concept_id
Number of persons with at least one condition occurrence, by condition_concept_id
Number of persons with at least one condition occurrence, by condition_source_concept_id
Number of persons with death, by cause_source_concept_id
Number of persons with at least one procedure occurrence, by procedure_concept_id
Number of persons with at least one procedure occurrence, by procedure_source_concept_id
Number of persons with at least one drug exposure, by drug_concept_id
Number of persons with at least one drug exposure, by drug_source_concept_id
Number of persons with at least one observation occurrence, by observation_concept_id
Number of persons with at least one observation occurrence, by observation_source_concept_id
Number of persons with at least one drug era, by drug_concept_id
Number of persons with at least one condition era, by condition_concept_id
Number of persons with at least one visit detail, by visit_detail_concept_id
Number of persons with at least one visit detail, by visit_detail_source_concept_id
Number of persons with at least one measurement occurrence, by measurement_concept_id
Number of persons with at least one measurement occurrence, by measurement_source_concept_id
Number of persons with at least one device exposure, by device_concept_id
Number of persons with at least one device exposure, by device_source_concept_id
Number of persons with at least one note by note_type_concept_id
*/
GROUP BY stratum_1
), concepts AS (
select concept_id as ancestor_id, coalesce(cast(ca.descendant_concept_id as STRING), concept_id) as descendant_id
from (
select concept_id from counts
UNION DISTINCT
-- include any ancestor concept that has a descendant in counts
select distinct cast(ancestor_concept_id as STRING) concept_id
from counts c
join cdm.concept_ancestor ca on cast(ca.descendant_concept_id as STRING) = c.concept_id
) c
left join cdm.concept_ancestor ca on c.concept_id = cast(ca.ancestor_concept_id as STRING)
)
SELECT
cast(concepts.ancestor_id as int) concept_id,
coalesce(max(c1.agg_count_value), 0) record_count,
coalesce(sum(c2.agg_count_value), 0) descendant_record_count,
coalesce(max(c3.agg_count_value), 0) person_count,
coalesce(sum(c4.agg_count_value), 0) descendant_person_count
FROM concepts
LEFT JOIN counts c1 ON concepts.ancestor_id = c1.concept_id
LEFT JOIN counts c2 ON concepts.descendant_id = c2.concept_id
LEFT JOIN counts_person c3 ON concepts.ancestor_id = c3.concept_id
LEFT JOIN counts_person c4 ON concepts.descendant_id = c4.concept_id
GROUP BY concepts.ancestor_id;
Cache
No, we’re not done yet! There’s still the cache. Atlas does not retrieve the record counts directly from `achilles_result_concept_count`, but uses a caching table. To update the record counts, we need to truncate this caching table, after which it will automatically be filled with the new record counts. The cache resides in the `webapi.cdm_cache` table in the `broadsea-atlasdb` container.
sudo docker exec -it broadsea-atlasdb /bin/bash
psql --host localhost --username postgres
truncate table webapi.cdm_cache;
Debugging
When everything works at once, that’s great. Usually it doesn’t. Atlas depends heavily on the WebAPI, so when something isn’t working as it’s supposed to, the WebAPI logs are typically a good place to start searching.
sudo docker logs ohdsi-webapi
Notes
Feel free to check our forks of the Broadsea and WebAPI repositories:
Also, feel free to reach out via mail if you have any questions: contact details