PRS

mkdir -p materials/gwas materials/pgs materials/ld_ref tools LiftOver PRS
├── tools
│   ├── plink.zip
│   ├── plink2.zip
│   ├── liftOver
│   ├── PRSice_linux.zip
│   ├── PRScs
│   |   └── ...
│   └── PRScsx
│       └── ...
├── materials
│   ├── gwas
│   |   └── 
│   ├── pgs
│   |   └── 
│   └── ld_ref
│       └── 
├── demo_hg19.bed
├── demo_hg19.bim
├── demo_hg19.fam
├── demo_hg38_imput.bed
├── demo_hg38_imput.bim
├── demo_hg38_imput.fam
├── demo.phecov
├── LiftOver
├── PRS
└── ...

Tools

cd tools

LiftOver

Main tools for quality control and association tests

wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/liftOver

PRSice2

wget https://github.com/choishingwan/PRSice/releases/download/2.3.5/PRSice_linux.zip
unzip PRSice_linux.zip -d PRSice2

PRSice2/PRSice_linux

PRS-CS and PRS-CSx

git clone https://github.com/getian107/PRScs.git
git clone https://github.com/getian107/PRScsx.git

pip install numpy scipy h5py

Materials

LiftOver chain file

Download the chain file to convert genomic coordinates between different genome assemblies

cd ../materials
# chain file for converting hg38 to hg19
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz

# chain file for converting hg19 to hg38
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz

Reference LD

cd ld_ref

wget -O ldblk_1kg_eas.tar.gz https://www.dropbox.com/s/7ek4lwwf2b7f749/ldblk_1kg_eas.tar.gz?dl=0
wget -O ldblk_1kg_eur.tar.gz https://www.dropbox.com/s/mt6var0z96vb6fv/ldblk_1kg_eur.tar.gz?dl=0
wget -O snpinfo_mult_1kg_hm3 https://www.dropbox.com/s/rhi806sstvppzzz/snpinfo_mult_1kg_hm3?dl=0 # necessary for PRS-CSx  

tar -zxvf ldblk_1kg_eas.tar.gz
tar -zxvf ldblk_1kg_eur.tar.gz 

cd ..

GWAS Catalog and PheWeb

cd gwas

wget https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90428001-GCST90429000/GCST90428597/harmonised/GCST90428597.h.tsv.gz

wget -O phenocode-20002_1466.tsv.gz https://pheweb.org/UKB-Neale/download/20002_1466 # from UKB PheWeb
wget -O gout_ckb.tsv.zip https://pheweb.ckbiobank.org/download/m10 # from CKB PheWeb

gunzip GCST90428597.h.tsv.gz
gunzip phenocode-20002_1466.tsv.gz
unzip -P [] gout_ckb.tsv.zip # m10.tsv

cd ..
GCST90428597.h.tsv
chromosome base_pair_location effect_allele other_allele beta standard_error effect_allele_frequency p_value rsid rs_id n hm_coordinate_conversion hm_code variant_id
1 817186 A G 0.0627 0.0546 0.8534 0.2508 rs3094315 rs3094315 17054 lo 10 1_817186_G_A
1 818802 G A 0.0391 0.0548 0.706 0.4756 rs3131969 rs3131969 8220 lo 11 1_818802_A_G
1 818812 G A 0.0376 0.0549 0.7064 0.4934 rs3131968 rs3131968 8220 lo 11 1_818812_A_G
...
phenocode-20002_1466.tsv
#chrom pos ref alt rsids nearest_genes pval beta ac
1 693731 A G rs12238997 AL669831.1 0.12 0.00073 78182.9
1 717587 G A rs144155419 AL669831.1 0.42 -0.001 10567.6
1 730087 T C rs148120343 AL669831.1 0.53 0.00042 38079.4
...
m10.tsv
variant_id chromosome base_pair_location other_allele effect_allele p_value effect_allele_frequency beta standard_error

PGS Catalog

Download the variant weights for a disease from PGS Catalog

cd pgs

wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS002030/ScoringFiles/Harmonized/PGS002030_hmPOS_GRCh38.txt.gz

gunzip -c PGS002030_hmPOS_GRCh38.txt.gz > PGS002030_hmPOS_GRCh38.txt
grep -v "#" PGS002030_hmPOS_GRCh38.txt > tmp && mv tmp PGS002030_hmPOS_GRCh38.txt

cd ..
PGS002030_hmPOS_GRCh38.txt
rsID chr_name chr_position effect_allele other_allele effect_weight hm_source hm_rsID hm_chr hm_pos hm_inferOtherAllele
rs6671356 1 1040026 C T 1.76815822071497e-08 ENSEMBL rs6671356 1 1104646
rs6604968 1 1041700 G A 1.41265515757531e-08 ENSEMBL rs6604968 1 1106320
rs12726255 1 1049950 G A 1.40560016472365e-08 ENSEMBL rs12726255 1 1114570
...

In this file, the rsID, chr_name, and chr_position fields are derived from GRCh37, while the hm_rsID, hm_chr, and hm_pos fields correspond to GRCh38.

LiftOver

cd LiftOver

awk '{print "chr"$1,$4-1,$4,$2}' demo_hg38_imput.bim > demo_imput_hg38.bed

../tools/liftOver demo_imput_hg38.bed ../materials/hg38ToHg19.over.chain.gz demo_imput_hg19.bed unmapped.txt

awk '{gsub("chr", "", $1); print $4, $1, $3}' demo_imput_hg19.bed > hg38tohg19.txt
grep -v "#" unmapped.txt | awk '{print $4}' > hg38tohg19_rm.txt

cd ..

plink --bfile demo_hg38_imput --allow-extra-chr --exclude LiftOver/hg38tohg19_rm.txt --update-chr LiftOver/hg38tohg19.txt 2 1 --update-map LiftOver/hg38tohg19.txt 3 1 --chr 1-22 --make-bed --out demo_hg19_imput
demo_imput_hg38.bed
chr1 789894 789895 rs111203396
chr1 789941 789942 rs567893021
chr1 790098 790099 rs371966738
...
demo_imput_hg38.bed
chr1 725274 725275 rs111203396
chr1 725321 725322 rs567893021
chr1 725478 725479 rs371966738
...
unmapped.txt
#Deleted in new
chr1 1649313 1649314 rs1198088331
#Deleted in new
chr1 1649379 1649380 rs1383679870
#Deleted in new
chr1 1649495 1649496 rs1287624275
...
hg38tohg19.txt
rs111203396 1 725275
rs567893021 1 725322
rs371966738 1 725479
...
hg38tohg19_rm.txt
rs1198088331
rs1383679870
rs1287624275
...
PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to demo_hg19_imput.log.
Options in effect:
  --allow-extra-chr
  --bfile demo_hg38_imput
  --chr 1-22
  --exclude LiftOver/hg38tohg19_rm.txt
  --make-bed
  --out demo_hg19_imput
  --update-chr LiftOver/hg38tohg19.txt 2 1
  --update-map LiftOver/hg38tohg19.txt 3 1
   
1031497 MB RAM detected; reserving 515748 MB for main workspace.
16537409 variants loaded from .bim file.
27719 people (0 males, 0 females, 27719 ambiguous) loaded from .fam.
Ambiguous sex IDs written to demo_hg19_imput.nosex .
--update-map: 16496680 values updated.
Warning: Base-pair positions are now unsorted!
--exclude: 16496680 variants remaining.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 27719 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.994176.
16496680 variants and 27719 people pass filters and QC.
Note: No phenotypes present.
--update-chr: 16496680 values updated.
--make-bed to demo_hg19_imput.bed + demo_hg19_imput.bim + demo_hg19_imput.fam
... done.

External Score

plink --bfile demo_hg19_imput --score materials/pgs/PGS002030_hmPOS_GRCh38.txt 1 4 6 --out PRS/demo_hg19_imput_PGS002030

# plink --bfile demo_hg19_imput --score materials/pgs/PGS002030_hmPOS_GRCh38.txt 1 4 6 sum --out PRS/demo_hg19_imput_PGS002030

plink2 --bfile demo_hg19_imput --score materials/pgs/PGS002030_hmPOS_GRCh38.txt 1 4 6 cols=+scoresums --out PRS/demo_hg19_imput_PGS002030
PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to PRS/demo_hg19_imput_PGS002030.log.
Options in effect:
  --bfile demo_hg19_imput
  --out PRS/demo_hg19_imput_PGS002030
  --score materials/pgs/PGS002030_hmPOS_GRCh38.txt 1 4 6
 
1031497 MB RAM detected; reserving 515748 MB for main workspace.
16496116 variants loaded from .bim file.
2000 people (0 males, 0 females, 2000 ambiguous) loaded from .fam.
Ambiguous sex IDs written to PRS/demo_hg19_imput_PGS002030.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 2000 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.994153.
16496116 variants and 2000 people pass filters and QC.
Note: No phenotypes present.
Warning: 14515 lines skipped in --score file (14439 due to variant ID mismatch,
76 due to allele code mismatch); see PRS/demo_hg19_imput_PGS002030.nopred for
details.
--score: 148696 valid predictors loaded.
--score: Results written to PRS/demo_hg19_imput_PGS002030.profile .

PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022)   www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to PRS/demo_hg19_imput_PGS002030.log.
Options in effect:
  --bfile demo_hg19_imput
  --out PRS/demo_hg19_imput_PGS002030
  --score materials/pgs/PGS002030_hmPOS_GRCh38.txt 1 4 6 cols=+scoresums
 
Start time: Mon Apr 14 14:53:28 2025
1031497 MiB RAM detected; reserving 515748 MiB for main workspace.
Using up to 128 threads (change this with --threads).
2000 samples (0 females, 0 males, 2000 ambiguous; 2000 founders) loaded from
demo_hg19_imput.fam.
16496116 variants loaded from demo_hg19_imput.bim.
Note: No phenotype data present.
Calculating allele frequencies... done.
Warning: 14439 --score file entries were skipped due to missing variant IDs,
and 76 were skipped due to mismatching allele codes.
(Add the 'list-variants' modifier to see which variants were actually used for
scoring.)
--score: 148696 variants processed.
--score: Results written to PRS/demo_hg19_imput_PGS002030.sscore .
End time: Mon Apr 14 14:53:31 2025
demo_hg19_imput_PGS002030.profile
FID IID PHENO CNT CNT2 SCORE
TWB0001H02 TWB0001H02 -9 296222 115155 -3.62423e-09
2014-01-B08 2014-01-B08 -9 296186 115847 -1.39565e-08
TWB0011F05 TWB0011F05 -9 296418 114938 3.20549e-08
...
demo_hg19_imput_PGS002030.sscore
FID IID ALLELE_CT NAMED_ALLELE_DOSAGE_SUM SCORE1_AVG SCORE1_SUM
TWB0001H02 TWB0001H02 296222 115155 -3.62423e-09 -0.00107782
2014-01-B08 2014-01-B08 296186 115847 -1.39565e-08 -0.00415055
TWB0011F05 TWB0011F05 296418 114938 3.20549e-08 0.00953287
...

Clumping + Thresholding

PRScie2

tools/PRSice2/PRSice_linux \
--target demo_hg19 \
--base materials/gwas/GCST90428597.h.tsv \
--binary-targe T \
--snp rsid \
--A1 effect_allele \
--stat beta \
--pvalue p_value \
--beta \
--pheno demo.phecov \
--pheno-col gout \
--cov demo.phecov \
--cov-col age,sex \
--out PRS/demo_hg19.prsice2

Rscript tools/PRSice2/PRSice \
--prsice tools/PRSice2/PRSice_linux \
--target demo_hg19 \
--base materials/gwas/GCST90428597.h.tsv \
--binary-targe T \
--snp rsid \
--A1 effect_allele \
--stat beta \
--pvalue p_value \
--beta \
--pheno demo.phecov \
--pheno-col gout \
--cov demo.phecov \
--cov-col age,sex \
--out PRS/demo_hg19.prsice2 \
--extract PRS/demo_hg19.prsice2.valid
PRSice 2.3.5 (2021-09-20)
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2025-04-14 15:21:34
tools/PRSice2/PRSice_linux \
    --a1 effect_allele \
    --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --base materials/gwas/GCST90428597.h.tsv \
    --beta  \
    --binary-target T \
    --clump-kb 250kb \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --cov demo.phecov \
    --cov-col age,sex \
    --interval 5e-05 \
    --lower 5e-08 \
    --num-auto 22 \
    --out PRS/demo_hg19.prsice2 \
    --pheno demo.phecov \
    --pheno-col gout \
    --pvalue p_value \
    --seed 2013053753 \
    --snp rsid \
    --stat beta \
    --target demo_hg19 \
    --thread 1 \
    --upper 0.5
 
Initializing Genotype file: demo_hg19 (bed)
 
Start processing GCST90428597.h
==================================================
 
Base file: materials/gwas/GCST90428597.h.tsv
Header of file is:
 
chromosome      base_pair_location      effect_allele   other_allele    beta    standard_error  effect_allele_frequency p_value rsid    rs_id   n       hm_coordinate_c nversion       hm_code variant_id
 
Reading 100.00%
8714421 variant(s) observed in base file, with:
 
 Error: A total of 6 duplicated SNP ID detected out of
       8714415 input SNPs! Valid SNP ID (post --extract /
       --exclude, non-duplicated SNPs) stored at
       PRS/demo_hg19.prsice2.valid. You can avoid this
       error by using --extract PRS/demo_hg19.prsice2.valid 
PRSice 2.3.5 (2021-09-20)
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2025-04-14 15:40:31
./tools/PRSice2/PRSice_linux \
    --a1 effect_allele \
    --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --base materials/gwas/GCST90428597.h.tsv \
    --beta  \
    --binary-target T \
    --clump-kb 250kb \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --cov demo.phecov \
    --cov-col age,sex \
    --extract PRS/demo_hg19.prsice2.valid \
    --interval 5e-05 \
    --lower 5e-08 \
    --num-auto 22 \
    --out PRS/demo_hg19.prsice2 \
    --pheno demo.phecov \
    --pheno-col gout \
    --pvalue p_value \
    --seed 685772835 \
    --snp rsid \
    --stat beta \
    --target demo_hg19 \
    --thread 1 \
    --upper 0.5
 
Initializing Genotype file: demo_hg19 (bed)
 
Start processing GCST90428597.h
==================================================
 
SNP extraction/exclusion list contains 4 columns, will
assume first column contains the SNP ID
 
Base file: materials/gwas/GCST90428597.h.tsv
Header of file is:
 
chromosome      base_pair_location      effect_allele   other_allele    beta    standard_error  effect_allele_frequency p_value rsid    rs_id   n       hm_coordinate_c nversion       hm_code variant_id
 
Reading 100.00%
8714421 variant(s) observed in base file, with:
12 variant(s) excluded based on user input
8714409 total variant(s) included from base file
 
Loading Genotype info from target
==================================================
 
2000 people (1384 male(s), 616 female(s)) observed
2000 founder(s) included
 
Warning: Currently not support haploid chromosome and sex
         chromosomes
 
28613 variant(s) not found in previous data
59 variant(s) with mismatch information
38747 ambiguous variant(s) excluded
564753 variant(s) included
 
Phenotype file: demo.phecov
Column Name of Sample ID: FID+IID
Note: If the phenotype file does not contain a header, the
column name will be displayed as the Sample ID which is
expected.
 
There are a total of 1 phenotype to process
 
Start performing clumping
 
Clumping Progress: 100.00%
Number of variant(s) after clumping : 96391
 
Processing the 1 th phenotype
 
gout is a binary phenotype
1000 control(s)
1000 case(s)
 
Processing the covariate file: demo.phecov
==============================
 
Include Covariates:
Name    Missing Number of levels
age     0       -
sex     0       -
 
After reading the covariate file, 2000 sample(s) included
in the analysis
 
Start Processing
Processing 100.00%

There are 1 region(s) with p-value less than 1e-5. Please
note that these results are inflated due to the overfitting
inherent in finding the best-fit PRS (but it's still best
to find the best-fit PRS!).
You can use the --perm option (see manual) to calculate an
empirical P-value.

 
Begin plotting
Current Rscript version = 2.3.3
Plotting Bar Plot
Plotting the high resolution plot
demo_hg19.prsice2.prsice
Pheno Set Threshold R2 P Coefficient Standard.Error Num_SNP
- Base 5e-08 0.0204506 1.18029e-10 13.0836 2.03103 41
- Base 5.005e-05 0.0199556 2.09099e-10 34.0981 5.36598 121
...
- Base 0.00010005 0.0209524 7.90434e-11 44.9678 6.91556 159
...
demo_hg19.prsice2.summary
Phenotype Set Threshold PRS.R2 Full.R2 Null.R2 Prevalence Coefficient Standard.Error P Num_SNP
- Base 0.00010005 0.0328111 0.382376 0.361424 - 44.9678 6.91556 7.90434e-11 159
demo_hg19.prsice2.best
FID IID In_Regression PRS
sample1 sample1 Yes -0.00295987187
sample2 sample2 Yes 0.00780391272
sample3 sample3 Yes -0.00158728225
...
demo_hg19.prsice2_BARPLOT_2025-04-14.png

Alt text

Shrinkage

PRS-CS/PRS-CSx

awk 'BEGIN {print "SNP\tA1\tA2\tBETA\tP"} NR > 1 {print $1"\t"$5"\t"$4"\t"$8"\t"$6}' materials/gwas/m10.tsv > PRS/ckb.txt
 
awk 'BEGIN {print "SNP\tA1\tA2\tBETA\tP"} NR > 1 {print $5"\t"$4"\t"$3"\t"$8"\t"$7}' materials/gwas/phenocode-20002_1466.tsv > PRS/ukb.txt
ukb.txt
SNP A1 A2 BETA P
rs12238997 G A 0.00073 0.12
rs144155419 A G -0.001 0.42
rs148120343 C T 0.00042 0.53
...
* running PRS-CS
tools/PRScs/PRScs.py --ref_dir=materials/ld_ref/ldblk_1kg_eas \
--bim_prefix=demo_hg19_imput \
--sst_file=PRS/ckb.txt \
--n_gwas=75956 --seed=1 --out_dir=PRS/prscs
--ref_dir=materials/ld_ref/ldblk_1kg_eas
--bim_prefix=demo_hg19_imput
--sst_file=PRS/ckb.txt
--a=1
--b=0.5
--phi=None
--n_gwas=75956
--n_iter=1000
--n_burnin=500
--thin=5
--out_dir=PRS/prscs_
--chrom=['1']
--beta_std=FALSE
--write_psi=FALSE
--write_pst=FALSE
--seed=1
 
##### process chromosome 1 #####
... parse reference file: materials/ld_ref/ldblk_1kg_eas/snpinfo_1kg_hm3 ...
... 85604 SNPs on chromosome 1 read from materials/ld_ref/ldblk_1kg_eas/snpinfo_1kg_hm3 ...
... parse bim file: demo_hg19_imput.bim ...
... 1294576 SNPs on chromosome 1 read from demo_hg19_imput.bim ...
... parse sumstats file: PRS/ckb.txt ...
... 6850163 SNPs read from PRS/ckb.txt ...
... 78898 common SNPs in the reference, sumstats, and validation set ...
... parse reference LD on chromosome 1 ...
... MCMC ...
--- iter-100 ---
--- iter-200 ---
--- iter-300 ---
--- iter-400 ---
--- iter-500 ---
--- iter-600 ---
--- iter-700 ---
--- iter-800 ---
--- iter-900 ---
--- iter-1000 ---
... Estimated global shrinkage parameter: ...
... Done ...
 
...
 
##### process chromosome 22 #####
... parse reference file: materials/ld_ref/ldblk_1kg_eas/snpinfo_1kg_hm3 ...
... 14968 SNPs on chromosome 22 read from materials/ld_ref/ldblk_1kg_eas/snpinfo_1kg_hm3 ...
... parse bim file: demo_hg19_imput.bim ...
... 213675 SNPs on chromosome 22 read from demo_hg19_imput.bim ...
... parse sumstats file: PRS/ckb.txt ...
... 6850163 SNPs read from PRS/ckb.txt ...
... 12673 common SNPs in the reference, sumstats, and validation set ...
... parse reference LD on chromosome 22 ...
... MCMC ...
--- iter-100 ---
--- iter-200 ---
--- iter-300 ---
--- iter-400 ---
--- iter-500 ---
--- iter-600 ---
--- iter-700 ---
--- iter-800 ---
--- iter-900 ---
--- iter-1000 ---
... Estimated global shrinkage parameter: 9.25e-05 ...
... Done ...
prscs_pst_eff_a1_b0.5_phiauto_chr1.txt
1 rs3131969 754182 G A 1.115942e-06
1 rs1048488 760912 C T -1.175908e-04
1 rs4040617 779322 G A -1.562966e-04
...
cat PRS/prscs_pst_eff_a1_b0.5_phiauto_chr*.txt > PRS/prscs_pst_eff_a1_b0.5_phiauto.txt

plink2 --bfile demo_hg19_imput --score PRS/prscs_pst_eff_a1_b0.5_phiauto.txt 2 4 6 cols=+scoresums --out PRS/demo_hg19_imput_prscs
PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022)   www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to PRS/demo_hg19_imput_prscs.log.
Options in effect:
  --bfile demo_hg19_imput
  --out PRS/demo_hg19_imput_prscs
  --score PRS/prscs_pst_eff_a1_b0.5_phiauto.txt 2 4 6 cols=+scoresums
 
Start time: Tue Apr 15 12:08:51 2025
1031497 MiB RAM detected; reserving 515748 MiB for main workspace.
Using up to 128 threads (change this with --threads).
2000 samples (0 females, 0 males, 2000 ambiguous; 2000 founders) loaded from
demo_hg19_imput.fam.
 
16496116 variants loaded from demo_hg19_imput.bim.
Note: No phenotype data present.
Calculating allele frequencies... done.
--score: 957101 variants processed.
--score: Results written to PRS/demo_hg19_imput_prscs.sscore .
End time: Tue Apr 15 12:08:55 2025
ROC_decile.png

Alt text

mkdir PRS/prscsx

tools/PRScsx/PRScsx.py --ref_dir=materials/ld_ref \
--bim_prefix=demo_hg19_imput \
--sst_file=PRS/ckb.txt,PRS/ukb.txt \
--n_gwas=75956,337159 --pop=EAS,EUR --seed=1 --meta=TRUE --out_dir=PRS/prscsx --out_name=demo
--ref_dir=materials/ld_ref
--bim_prefix=demo_hg19_imput
--sst_file=['PRS/ckb.txt', 'PRS/ukb.txt']
--a=1
--b=0.5
--phi=None
--n_gwas=[75956, 337159]
--pop=['EAS', 'EUR']
--n_iter=2000
--n_burnin=1000
--thin=5
--out_dir=PRS/prscsx
--out_name=demo
--chrom=['1', '2', '3', '4', '5']
--meta=TRUE
--write_pst=FALSE
--seed=1
 
*** 2 discovery populations detected ***
 
##### process chromosome 1 #####
... parse reference file: materials/ld_ref/snpinfo_mult_1kg_hm3 ...
... 109168 SNPs on chromosome 1 read from materials/ld_ref/snpinfo_mult_1kg_hm3 ...
... parse bim file: demo_hg19_imput.bim ...
... 1294576 SNPs on chromosome 1 read from demo_hg19_imput.bim ...
... parse EAS sumstats file: PRS/ckb.txt ...
... 6850163 SNPs read from PRS/ckb.txt ...
... 78898 common SNPs in the EAS reference, EAS sumstats, and validation set ...
... parse EUR sumstats file: PRS/ukb.txt ...
... 10894596 SNPs read from PRS/ukb.txt ...
... 82349 common SNPs in the EUR reference, EUR sumstats, and validation set ...
... parse EAS reference LD on chromosome 1 ...
... parse EUR reference LD on chromosome 1 ...
... align reference LD on chromosome 1 across populations ...
... 85336 valid SNPs across populations ...
... MCMC ...
--- iter-100 ---
--- iter-200 ---
--- iter-300 ---
--- iter-400 ---
--- iter-500 ---
--- iter-600 ---
--- iter-700 ---
--- iter-800 ---
--- iter-900 ---
--- iter-1000 ---
--- iter-1100 ---
--- iter-1200 ---
--- iter-1300 ---
--- iter-1400 ---
--- iter-1500 ---
--- iter-1600 ---
--- iter-1700 ---
--- iter-1800 ---
--- iter-1900 ---
--- iter-2000 ---
... Estimated global shrinkage parameter: 3.29e-05 ...
... Done ...
 
...
 
##### process chromosome 22 #####
... parse reference file: materials/ld_ref/snpinfo_mult_1kg_hm3 ...
... 18944 SNPs on chromosome 22 read from materials/ld_ref/snpinfo_mult_1kg_hm3 ...
... parse bim file: demo_hg19_imput.bim ...
... 213675 SNPs on chromosome 22 read from demo_hg19_imput.bim ...
... parse EAS sumstats file: PRS/ckb.txt ...
... 6850163 SNPs read from PRS/ckb.txt ...
... 12673 common SNPs in the EAS reference, EAS sumstats, and validation set ...
... parse EUR sumstats file: PRS/ukb.txt ...
... 10894596 SNPs read from PRS/ukb.txt ...
... 13399 common SNPs in the EUR reference, EUR sumstats, and validation set ...
... parse EAS reference LD on chromosome 22 ...
... parse EUR reference LD on chromosome 22 ...
... align reference LD on chromosome 22 across populations ...
... 13861 valid SNPs across populations ...
... MCMC ...
--- iter-100 ---
--- iter-200 ---
--- iter-300 ---
--- iter-400 ---
--- iter-500 ---
--- iter-600 ---
--- iter-700 ---
--- iter-800 ---
--- iter-900 ---
--- iter-1000 ---
--- iter-1100 ---
--- iter-1200 ---
--- iter-1300 ---
--- iter-1400 ---
--- iter-1500 ---
--- iter-1600 ---
--- iter-1700 ---
--- iter-1800 ---
--- iter-1900 ---
--- iter-2000 ---
... Estimated global shrinkage parameter: 2.48e-05 ...
... Done ...
ls PRS/prscsx
demo_EAS_pst_eff_a1_b0.5_phiauto_chr10.txt
...
demo_EAS_pst_eff_a1_b0.5_phiauto_chr9.txt
demo_EUR_pst_eff_a1_b0.5_phiauto_chr10.txt
...
demo_EUR_pst_eff_a1_b0.5_phiauto_chr9.txt
demo_META_pst_eff_a1_b0.5_phiauto_chr10.txt
...
demo_META_pst_eff_a1_b0.5_phiauto_chr9.txt
prscs_pst_eff_a1_b0.5_phiauto_chr1.txt
1 rs3131969 754182 G A 1.115942e-06
1 rs1048488 760912 C T -1.175908e-04
1 rs4040617 779322 G A -1.562966e-04
...
cat PRS/prscsx/demo_EAS_pst_eff_a1_b0.5_phiauto_chr*.txt > PRS/prscsx_pst_eff_a1_b0.5_phiauto_EAS.txt
cat PRS/prscsx/demo_EUR_pst_eff_a1_b0.5_phiauto_chr*.txt > PRS/prscsx_pst_eff_a1_b0.5_phiauto_EUR.txt
cat PRS/prscsx/demo_META_pst_eff_a1_b0.5_phiauto_chr*.txt > PRS/prscsx_pst_eff_a1_b0.5_phiauto_META.txt

wc -l PRS/prscsx_pst_eff_a1_b0.5_phiauto_EAS.txt # 957101 variants
wc -l PRS/prscsx_pst_eff_a1_b0.5_phiauto_EUR.txt # 1000352 variants
wc -l PRS/prscsx_pst_eff_a1_b0.5_phiauto_META.txt # 1035617 variants

plink2 --bfile demo_hg19_imput --score PRS/prscsx_pst_eff_a1_b0.5_phiauto_META.txt 2 4 6 cols=+scoresums --out PRS/demo_hg19_imput_prscsx_meta
PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022)   www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to PRS/demo_hg19_imput_prscsx_meta.log.
Options in effect:
  --bfile demo_hg19_imput
  --out PRS/demo_hg19_imput_prscsx_meta
  --score PRS/prscsx_pst_eff_a1_b0.5_phiauto_META.txt 2 4 6 cols=+scoresums
 
Start time: Wed Apr 16 14:22:52 2025
1031497 MiB RAM detected; reserving 515748 MiB for main workspace.
Using up to 128 threads (change this with --threads).
2000 samples (0 females, 0 males, 2000 ambiguous; 2000 founders) loaded from
demo_hg19_imput.fam.
 
16496116 variants loaded from demo_hg19_imput.bim.
Note: No phenotype data present.
Calculating allele frequencies... done.
--score: 1035617 variants processed.
--score: Results written to PRS/demo_hg19_imput_prscsx_meta.sscore .
End time: Wed Apr 16 14:22:56 2025
ROC_decile.png

Alt text