Individual-level data access information

*Note: this webpage covers individual-level (restricted-access) data. PGC GWAS summary statistics are available without restriction on the Download Results webpage.

The PGC has established a large collection of individual-level genetic and phenotypic data, which is accessible to investigators for Secondary Analysis Proposals via application to the relevant PGC Workgroup(s). These data are stored on the Snellius server, which is hosted by SURFsara in the Netherlands. The PGC is committed to data sharing to engage more research groups, maximize knowledge, and accelerate scientific progress, while respecting the limits of national laws and ethical restrictions. Here, we provide an overview of the individual-level data available, the application process, and answers to some FAQs. The data access infrastructure is managed by the PGC Data Access Committee, which includes representatives from each PGC Disorder Workgroup.

Data available

  • At present, 11 PGC Disorder Workgroups have collections of individual-level genetic data per cohort stored on the Snellius server and available for Secondary Analysis Proposals. Permission to access each disorder collection is granted by the relevant PGC Disorder Workgroup.
  • Genetic data available per cohort include raw genotype data, QC’ed genotype data, imputed dosage data, imputed dosages converted to hard calls, and principal components of genetic ancestry. Files are provided in RICOPILI format. Some Disorder Workgroups have additional types of genetic data available, such as pre-calculated polygenic risk scores or imputed HLA or C4 data.
  • Phenotypic data available for all cohorts includes case-control status. Some Disorder Workgroups have more detailed subphenotype information available. 
  • Within each PGC disorder, the data for most cohorts are provided through a “fast-track” data package, which becomes available once the Workgroup approves the Secondary Analysis Proposal. However, some cohorts require explicit permission from the Principal Investigator of the cohort or from a data repository such as dbGaP. If investigators wish to access these cohorts on Snellius, they will need to secure the special permissions required.
  • The latest GWAS publication from the PGC Disorder Workgroup typically contains information about the specific cohorts and sample sizes available in the Supplementary Materials. For specific questions about the individual-level data available on Snellius and the permissions required, please contact the relevant PGC Disorder Workgroup Chair(s) and the Data Access Committee Representative (DAC Rep).

Overview of data application process

PGC Data Access Committee Co-chair Lea Davis describes the process in a YouTube video.

  1. The applicant develops a Secondary Analysis Proposal and submits it to the relevant PGC Workgroup(s) by emailing it to the Workgroup Chair(s) and Data Access Committee (DAC) Representative(s). The proposal includes the study rationale, analytic plans, data being requested, individuals involved in the work, timeline, and publication plans. Please ensure sufficient information is included for members of the Workgroup to review the proposed project.
  2. The Secondary Analysis Proposal will be circulated to the Workgroup mailing list for review by all members. The Workgroup Chair(s) and DAC Representative(s) will advise on whether a short presentation of the Secondary Analysis Proposal is required on a Workgroup call. Once the Secondary Analysis Proposal is approved by the Workgroup(s), the Workgroup Chair(s) or DAC Representative(s) will send you an approval email.
  3. After the Secondary Analysis Proposal is approved, the analyst(s) named on the proposal will obtain an account on the Snellius server, where the individual-level data are stored. Prior to applying for a Snellius account, please review this informational sheet. To obtain a Snellius account, follow these instructions. Approval typically takes a few days.
  4. Apply for data access via the relevant PGC Data Access Portal(s). You will need your Snellius username and several documents, such as a copy of your Secondary Analysis Proposal, a copy of your Workgroup approval email, and documentation for any specific permissions required, such as dbGaP approval. Detailed information about all of the documents required is provided below.
  5. Your data access request through the portal will be reviewed by the relevant DAC Representative(s). Once it is approved, the Snellius administrators will add your username to the relevant Unix permissions groups. Note: Individual-level genetic data may never be downloaded from Snellius.
  6. Secondary Analysis Proposals, Snellius accounts and data access are approved for one year at a time and you must renew all three of these annually. You will receive reminder emails one month before these are due to expire. 
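
As an illustration of steps 5 and 6, the following Python sketch shows one way an analyst might check their Unix group membership and estimate renewal dates. It is a minimal sketch under stated assumptions, not a PGC or SURF tool: the group names are hypothetical placeholders, and treating "one year" and "one month" as calendar months is an assumption. The actual group names and expiry dates are set by the DAC Representatives and the Snellius administrators.

```python
from __future__ import annotations

import calendar
import grp
import os
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day if needed."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def renewal_dates(approved_on: date) -> dict[str, date]:
    """Assumed schedule: approval lasts one year, reminder one month before."""
    expires = add_months(approved_on, 12)
    return {"expires": expires, "reminder": add_months(expires, -1)}


def current_group_names() -> set[str]:
    """Names of the Unix groups the current process belongs to."""
    names = set()
    for gid in os.getgroups():
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # Group id has no name entry on this system; keep the number.
            names.add(str(gid))
    return names


def missing_groups(required, have) -> list[str]:
    """Required group names that are not present in `have`."""
    return sorted(set(required) - set(have))


if __name__ == "__main__":
    # Hypothetical group name, used for illustration only.
    required = ["pgc_example_group"]
    print("Missing groups:", missing_groups(required, current_group_names()))
    print(renewal_dates(date.today()))
```

If a required group is reported as missing after your portal request has been approved, the DAC Representative is the appropriate contact.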

Acknowledgements

The PGC is deeply grateful to SURFsara and VU University Amsterdam for their support in using the Dutch National Supercomputer Snellius.

Documents required for data access

Always required:

  1. Secondary Analysis Proposal - Please use the template provided and ensure sufficient information is included for members of the Workgroup to review the proposed project and make a recommendation. 
  2. Workgroup Approval Email(s) - This will be provided to you by the Workgroup Chair(s) or Data Access Committee (DAC) Representative(s).
  3. Signed PGC Analyst Memo - We request that everyone who accesses individual-level PGC data (and their supervisor if applicable) confirms their agreement with the PGC Memorandum of Understanding (MOU) and understands the obligations and benefits of participating. Please read the MOU carefully and follow all of the steps.
  4. Signed WTCCC Data Access Agreement - This is required for accessing cohorts from the Wellcome Trust Case Control Consortium.

 

May be required depending on the cohorts you are applying for:

  1. Current dbGaP Approval - Some PGC cohorts were submitted to the NIH dbGaP Repository by their Principal Investigators and have been gathered together into the PGC dbGaP collection. The PGC can only provide access to these cohorts on Snellius to investigators with an approved and currently valid dbGaP application for these data. With that approval in place, there is no need for you to download the data from dbGaP and format the files yourself. This dbGaP application can be used as a template for your own. Note that the Principal Investigator of the lab should apply, as their application covers everyone who reports to them. Multiple PIs at the same institute can be added as collaborators on one dbGaP application. Collaborators at different institutes need their own dbGaP application, since each application must be signed by an Institutional Signing Official. Your approval from dbGaP is valid for one year and must be renewed via dbGaP annually. You must upload a valid dbGaP approval via the PGC DAC portal annually to maintain access to the cohorts in the PGC dbGaP collection on Snellius.
  2. PI approval emails - Some PGC cohorts require explicit permission from the Principal Investigator of the cohort for access. You will need a copy of the approval email from the cohort PI to access these.
  3. HRC reference panel approval - Most PGC datasets are imputed to the Haplotype Reference Consortium (HRC) reference panel. You need HRC access to run GWAS using RICOPILI, and the HRC is also used as a reference panel for many post-GWAS analyses. The PGC can provide access to the RICOPILI-formatted version of the HRC on Snellius to investigators with approved HRC access applications from the Sanger Institute. You can request access to the HRC using its EGA Dataset (EGAD) identifier, EGAD00001002729. Each individual who accesses the data needs to be named on this application, so you may want to add the names of trainees at your institute. It is possible to add people later by emailing datasharing@sanger.ac.uk.
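
The two lists above can be turned into a simple pre-submission checklist. The Python sketch below is only an illustration: the document names are copied from the lists above, while the function names and the three flags are hypothetical and do not correspond to any field in the PGC Data Access Portal. Whether a given cohort needs dbGaP, PI, or HRC approval must be confirmed with the relevant DAC Representative.

```python
from __future__ import annotations

# Documents listed above under "Always required".
ALWAYS_REQUIRED = [
    "Secondary Analysis Proposal",
    "Workgroup Approval Email(s)",
    "Signed PGC Analyst Memo",
    "Signed WTCCC Data Access Agreement",
]


def required_documents(
    needs_dbgap: bool = False,
    needs_pi_approval: bool = False,
    needs_hrc: bool = False,
) -> list[str]:
    """Build the document list for an application (flags are illustrative)."""
    docs = list(ALWAYS_REQUIRED)
    if needs_dbgap:
        docs.append("Current dbGaP Approval")
    if needs_pi_approval:
        docs.append("PI approval emails")
    if needs_hrc:
        docs.append("HRC reference panel approval")
    return docs


def missing_documents(required: list[str], on_hand: list[str]) -> list[str]:
    """Return the required documents not yet collected, in list order."""
    collected = set(on_hand)
    return [doc for doc in required if doc not in collected]


if __name__ == "__main__":
    needed = required_documents(needs_dbgap=True, needs_hrc=True)
    have = ["Secondary Analysis Proposal", "Signed PGC Analyst Memo"]
    for doc in missing_documents(needed, have):
        print("Still needed:", doc)
```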

FAQ

Are there authorship requirements for using individual-level PGC data?

Yes. Depending on your project, requirements may include individual named authorship for a number of investigators from each cohort analyzed, or a PGC Disorder Workgroup consortium byline. PGC Disorder Workgroups typically have an authorship policy for different types of manuscripts, which is available on request from the Chair(s) of the relevant PGC Workgroup(s).

How long will it take to have my Secondary Analysis Proposal reviewed by the relevant PGC Workgroup(s)?

Secondary Analysis Proposals are typically reviewed within three weeks. Review may take longer if the Workgroup requires a short presentation of the Secondary Analysis Proposal on a Workgroup call, as these calls typically occur monthly.

Once my Secondary Analysis Proposal is approved, can I expand my project or use the data for another project?

No. A Secondary Analysis Proposal covers only the specific project it describes. If you wish to use the data for another project, you will need to submit another Secondary Analysis Proposal to the relevant PGC Disorder Workgroup(s). We realize that analytic strategies can sometimes change through the course of analysis. If you find that your plan has shifted significantly, you should speak with your Data Access Committee Representative (DAC Rep) to determine whether the project has changed enough to be considered a new project.

I have specific questions about the process; who do I contact?

Please contact the Data Access Committee Representative (DAC Rep) from the relevant PGC Disorder Workgroup(s).

Data Access Committee

The goal of the PGC Data Access Committee (DAC) is to maintain simple, efficient and secure procedures for investigators to access PGC data. Each PGC Disorder Workgroup has a Data Access Committee Representative (DAC Rep). The DAC Rep reviews data access applications to ensure all requirements are met, corresponds with the data owner as needed, and curates the data on Snellius.

Data Access Committee Chairs

Lea Davis, PhD

Icahn School of Medicine at Mount Sinai

Danielle Posthuma, PhD

VU University Amsterdam

Amsterdam University Medical Centre

Stephan Ripke, MD, PhD

Charité – Universitätsmedizin Berlin

Massachusetts General Hospital

Niamh Mullins, PhD

Icahn School of Medicine at Mount Sinai

Data Access Committee Core Personnel

Benjamin Bravo

Data Manager

Icahn School of Medicine at Mount Sinai

Marissa Wirth

Project Manager

Icahn School of Medicine at Mount Sinai

Walter Pirovano

Snellius Liaison

VU University Amsterdam

Swapnil Awasthi

Data Management Support

Charité – Universitätsmedizin Berlin

Alice Braun

Data Management Support

Charité – Universitätsmedizin Berlin

Data Access Committee Representatives (DAC Reps)

Working Group Name | Abbreviation | DAC Representative(s) | Contact email(s)
Attention Deficit Hyperactivity Disorder | ADHD | Marieke Klein | pgc.dac.add@gmail.com
Alzheimer's Disease | ALZ | Shahram Bahrami | pgc.dac.alz@gmail.com
Autism Spectrum Disorder | ASD | Jakob Grove | pgc.dac.aut@gmail.com
Bipolar Disorder | BD | Maria Koromina | pgc.dac.bip@gmail.com
Eating Disorders | ED | Karen Mitchell | pgc.dac.ano@gmail.com
Functional Genomics | FGEN | Laurence Nisbet | l.nisbet-5@sms.ed.ac.uk
Major Depressive Disorder | MDD | Brittany Mitchell | pgc.dac.mdd@gmail.com
Obsessive Compulsive Disorder | OCD | Dirk Smit | pgc.dac.ocd@gmail.com
Posttraumatic Stress Disorder | PTSD | Adam Maihofer | pgc.dac.pts@gmail.com
Schizophrenia | SCZ | Julia Kraft | pgc.dac.scz1@gmail.com
Substance Use Disorders | SUD | Raymond Walters | pgc.dac.sud@gmail.com
Suicide | SUI | Marie Gaine | pgc.dac.sui@gmail.com
Tourette Syndrome | TS | Jeremiah Scharf, Dongmei Yu | pgc.dac.toc@gmail.com
Antidepressant Response | ADR | Chris Lo | chris.lowh@kcl.ac.uk