Dataset

This study used the well-curated NuRA chemical dataset to train machine learning models for nine nuclear receptors [23]. The dataset and the KNIME workflow [29] used for data curation were downloaded from https://doi.org/10.5281/zenodo.3991561, and we carefully verified each step of the curation process. The dataset contains 15,247 combined entries for nine different receptors, annotated with three binding class types: 1) agonist, 2) antagonist, and 3) binder. Each type is further classified by activity as 1) active, 2) weakly active, 3) inactive, 4) inconclusive, or 5) data missing. Table 1 shows the composition of the classes for each receptor. Missing and inconclusive entries were removed from the dataset. Because the number of chemicals in the weakly active category is low, we then combined the active and weakly active entries into a single category within each binding class type, resulting in a binary (active vs inactive) designation for each of the agonists, antagonists, and binders. Our study therefore developed machine learning models to predict each of these binding class types using binary classification (Binding Class models).

 

Table 1: Number of chemicals by class for all receptors in the training and validation sets.

| Receptor | Class | Total Inactive | Total Active | Total Weakly Active | Training set Actives/Inactives | Validation set Actives/Inactives |
|---|---|---|---|---|---|---|
| PR | Agonist | 5670 | 349 | 27 | 290/4546 | 86/1124 |
| PR | Antagonist | 4400 | 741 | 548 | 1027/3524 | 262/876 |
| PR | Binder | 5040 | 1251 | 53 | 1057/4018 | 247/1022 |
| RXR | Agonist | 4549 | 130 | 133 | - | - |
| RXR | Antagonist | 3 | 115 | 1 | - | - |
| RXR | Binder | 4569 | 861 | 145 | - | - |
| GR | Agonist | 5384 | 737 | 41 | 613/4316 | 165/1068 |
| GR | Antagonist | 4577 | 657 | 190 | 666/3673 | 181/904 |
| GR | Binder | 5228 | 1815 | 84 | 1537/4164 | 362/1064 |
| AR | Agonist | 5578 | 513 | 121 | 517/4452 | 117/1126 |
| AR | Antagonist | 4942 | 776 | 391 | 926/3961 | 241/981 |
| AR | Binder | 5130 | 1419 | 104 | 1243/4079 | 280/1051 |
| ERA | Agonist | 5060 | 476 | 461 | 751/4046 | 186/1014 |
| ERA | Antagonist | 5160 | 362 | 322 | 544/4131 | 140/1029 |
| ERA | Binder | 4861 | 1287 | 177 | 1184/3876 | 280/985 |
| ERB | Agonist | 5744 | 286 | 48 | 270/4592 | 64/1152 |
| ERB | Antagonist | 5133 | 224 | 229 | 359/4109 | 94/1024 |
| ERB | Binder | 5554 | 1159 | 66 | 998/4425 | 227/1129 |
| FXR | Agonist | 5349 | 372 | 85 | 346/4298 | 111/1051 |
| FXR | Antagonist | 4829 | 124 | 143 | 219/3857 | 48/972 |
| FXR | Binder | 5272 | 550 | 108 | 530/4214 | 128/1058 |
| PPARD | Agonist | 5663 | 616 | 73 | - | - |
| PPARD | Antagonist | 5561 | 28 | 24 | - | - |
| PPARD | Binder | 5742 | 730 | 52 | - | - |
| PPARG | Agonist | 5223 | 1352 | 158 | 1200/4186 | 310/1037 |
| PPARG | Antagonist | 5249 | 88 | 153 | 203/4189 | 38/1060 |
| PPARG | Binder | 5458 | 1699 | 205 | 1529/4360 | 375/1098 |

 

Since we are also interested in identifying active (binding) vs inactive chemicals regardless of agonist, antagonist, or undefined binding class (Effector models), we additionally developed machine learning models by first merging the three binding classes and removing the inconclusive and missing data for each receptor, thereby increasing the sample size of the Effector types (active and inactive). Table 2 shows the composition of active and inactive chemicals for each receptor after merging the three binding types.

 

Table 2: Number of active and inactive chemicals for all receptors.

| Receptor | Total Actives | Total Inactives | Training set Total | Training set Actives/Inactives | Validation set Total | Validation set Actives/Inactives |
|---|---|---|---|---|---|---|
| RXR | 1008 | 4569 | 4461 | 807/3654 | 1116 | 201/915 |
| PR | 2078 | 5063 | 5712 | 1646/4066 | 1429 | 432/997 |
| GR | 2143 | 5232 | 5900 | 1720/4180 | 1475 | 423/1052 |
| AR | 2217 | 5179 | 5916 | 1782/4134 | 1480 | 435/1045 |
| ERA | 2327 | 4956 | 5826 | 1863/3963 | 1457 | 464/993 |
| ERB | 1552 | 5563 | 5692 | 1228/4464 | 1423 | 324/1099 |
| FXR | 837 | 5276 | 4890 | 662/4228 | 1223 | 175/1048 |
| PPARD | 848 | 5745 | 5274 | 678/4596 | 1319 | 170/1149 |
| PPARG | 2118 | 5469 | 6069 | 1693/4376 | 1518 | 425/1093 |

 

 

 

Training Dataset

 

For each of the nine nuclear receptors, the NR-specific curated chemical dataset was randomly divided into training (80%) and validation (20%) sets using the "train_test_split" function in the scikit-learn package (Tables 1 and 2). The validation set was used to estimate the performance of each developed model; its chemicals were not used at any stage of developing or optimizing any of our ML models.
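A minimal sketch of this split is shown below, assuming X holds one receptor's fingerprint matrix and y its binary activity labels; the toy data, the fixed random seed, and the use of stratification are illustrative assumptions, since the paper does not report these details.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for one receptor's curated data:
# X holds 1024-bit fingerprint vectors, y the binary activity labels.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 1024))
y = rng.integers(0, 2, size=1000)

# 80% training / 20% validation split, as described in the text.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, random_state=42)  # seed is an assumption
```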

 

Molecular Features

 

In this investigation, we utilized molecular fingerprints as descriptor features. We employed two widely used fingerprinting methods: 1) Morgan fingerprints, also called extended-connectivity fingerprints (ECFP4), a circular substructure fingerprint for which we chose a radius of 3 and hashed binary vectors of length 1024 bits, and 2) Molecular ACCess System (MACCS) key fingerprints, which comprise 166 public keys implemented as SMARTS patterns. The Python-based RDKit [30] library was used to generate the molecular fingerprints from the SMILES data.
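As a sketch of this feature-generation step, the RDKit calls below produce both fingerprint types for a single SMILES string; the example molecule is arbitrary and not from the dataset.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, MACCSkeys

smiles = "CC(=O)Oc1ccccc1C(=O)O"  # arbitrary example molecule (aspirin)
mol = Chem.MolFromSmiles(smiles)

# Morgan (extended-connectivity) fingerprint hashed to 1024 bits,
# using the radius of 3 stated in the text.
morgan_fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=3, nBits=1024)

# MACCS key fingerprint (166 public keys; RDKit returns a 167-bit
# vector whose bit 0 is unused).
maccs_fp = MACCSkeys.GenMACCSKeys(mol)

# Convert to plain 0/1 lists for use as model features.
morgan_bits = list(morgan_fp)
maccs_bits = list(maccs_fp)
```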

 

 

Machine Learning Model Development 

 

As noted previously [31], there is no single machine learning algorithm that is optimal for all potential data problems. However, one can define an approach that is guaranteed to perform at least as well as the best of a set of explicit, competing algorithms. In our case, we used nine different machine learning techniques: 1) AdaBoost [32], a boosting algorithm that combines multiple "weak classifiers" into a single "strong classifier"; 2) logistic regression [33], which predicts the value of a categorical variable based on its relationship with predictor variables; 3) random forest [34], which merges a collection of independent decision trees to decrease both bias and variance; 4) support vector machine (SVM) [35], a classifier that finds an optimal hyperplane to maximize the margin between two classes; 5) the k-nearest neighbors (k-NN) algorithm [36], which assumes that similar data points lie near each other and makes predictions by computing the distance between a new data point and all data points in the training set; 6) the bagging classifier [37], an ensemble model that fits base classifiers on random subsets of the original dataset and then aggregates their predictions into a final prediction; 7) Gaussian Naïve Bayes [38], a variant of the Naïve Bayes algorithm based on Bayes' theorem; 8) the decision tree classifier [39], which uses a tree in which each node represents a feature, each branch a decision, and each leaf an outcome; and 9) the super learner [31], which combines the predictive probabilities of NR binding across many ML algorithms and finds the optimal combination of the collection of algorithms by minimizing the cross-validated risk. This approach improves over methods using a single ML algorithm because no one algorithm is universally optimal; the super learner has been shown in theory to be at least as good as the best-performing algorithm in the ensemble and often performs considerably better than its component models. For each of these methods, we used grid-search cross-validation (GridSearchCV), as implemented in scikit-learn [40], to tune the hyperparameters.
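The sketch below illustrates the two ingredients of this setup: hyperparameter tuning with GridSearchCV, and a super-learner-style ensemble approximated here with scikit-learn's StackingClassifier, which combines base models' cross-validated predicted probabilities through a meta-learner. The parameter grid, base models, and toy data are illustrative assumptions, not the authors' exact configuration or the cited super learner implementation.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Toy fingerprint data standing in for one receptor's training set.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 1024)).astype(float)
y = rng.integers(0, 2, size=200)

# Hyperparameter tuning via grid-search CV; the grid values are assumptions.
svm_grid = GridSearchCV(
    SVC(probability=True),
    param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.01]},
    cv=5, scoring="balanced_accuracy")
svm_grid.fit(X, y)

# Super-learner-like ensemble: base models' predicted probabilities are
# combined by a cross-validated logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[("svm", svm_grid.best_estimator_),
                ("rf", RandomForestClassifier(n_estimators=100)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(),
    stack_method="predict_proba", cv=5)
stack.fit(X, y)
```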

 

Repeated k-Fold Cross-Validation

 

We assessed the performance of the classification models using stratified k-fold cross-validation, splitting the data so that the ratio of the classes was preserved in each fold. For each receptor, we evaluated classification performance by repeated stratified k-fold cross-validation with ten splits and 100 repeats, for a total of 1,000 folds.
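A minimal sketch of this protocol using scikit-learn's RepeatedStratifiedKFold is given below; the specific classifier, toy data, and seed are placeholders, and note that 1,000 folds means 1,000 model fits.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Placeholder data: 166-bit MACCS-style feature vectors with binary labels.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 166)).astype(float)
y = rng.integers(0, 2, size=300)

# Ten stratified splits repeated 100 times = 1,000 evaluation folds.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=100, random_state=42)
scores = cross_val_score(SVC(), X, y, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.2f} ± {scores.std():.2f}")
```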

 

Applicability Domain

 

The applicability domain is defined as described by Chen et al. [44] and is measured by similarity to the molecules in the training set. Tanimoto similarity was calculated using ECFP4 fingerprints or MACCS key fingerprints, matching the feature space of the model. A test molecule is considered to be within the applicability domain if at least Nmin (default = 1) chemicals in the training dataset have a similarity to it greater than the cutoff Scutoff (default = 0.25). The applicability domain is thus defined by the combination of Scutoff and Nmin.
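A sketch of this criterion in the ECFP4 feature space, using RDKit's bulk Tanimoto routine, is shown below; the function name and toy molecules are our own illustration of the rule, not the authors' code.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def in_applicability_domain(query_smiles, train_smiles,
                            s_cutoff=0.25, n_min=1):
    """True if at least n_min training molecules have a Tanimoto
    similarity to the query greater than s_cutoff (ECFP4 space)."""
    fp = lambda s: AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(s), radius=3, nBits=1024)
    sims = DataStructs.BulkTanimotoSimilarity(
        fp(query_smiles), [fp(s) for s in train_smiles])
    return sum(sim > s_cutoff for sim in sims) >= n_min

# Toy usage: phenol against a two-molecule "training set".
print(in_applicability_domain("c1ccccc1O", ["c1ccccc1", "CCO"]))
```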

 

 

Models for AR

 

Binding Class Models for AR

 

Agonist, antagonist, and binder datasets were used to build three different machine learning models for AR. Prediction accuracies for the different types and algorithms on cross-validation with ECFP4 and MACCS key fingerprints are given in Tables 3 and 4, respectively. The algorithms achieved cross-validation prediction accuracies of >90% on the agonist and binder datasets. For the agonist, the best accuracy was obtained for both the super learner and SVM-based models: 87% on the validation set with ECFP4 fingerprints (Table 5). With MACCS key fingerprints, the best accuracy was obtained for the super learner (Table 6). For the binder dataset, SVM and super learner had similar performance, with 97% and 96% accuracy on the validation set for ECFP4 and MACCS key fingerprints, respectively. For the agonist dataset, the precision-recall AUC (PR AUC) values on validation for the super learner and SVM are 0.81 and 0.80 (Table 5), respectively, for ECFP4 fingerprints, and 0.81 and 0.79 for MACCS key fingerprints (Table 6). For the binder dataset, the validation PR AUC values are 0.98 and 0.97 for ECFP4 and MACCS key fingerprints, respectively. We applied the applicability domain to the validation set and removed the unreliable data points thus identified; we then evaluated the performance of the SVM and super learner models on the remaining reliable data points.

 

The AdaBoost, bagging, decision tree, k-NN, random forest, super learner, and SVM models achieved prediction accuracies of >85% for the antagonist model with both ECFP4 and MACCS key fingerprints as features. On the validation set with ECFP4 fingerprints, the super learner and SVM-based models achieved 83% and 84% accuracy, respectively (Table 5). Similar balanced accuracy was obtained for the super learner and SVM models with MACCS key fingerprints (Table 6). The PR AUC values on the validation set for the super learner and SVM are 0.83 and 0.84 (Table 5), respectively, for ECFP4 fingerprints, and 0.82 and 0.83 for MACCS key fingerprints (Table 6). The performance of these models is comparable to other published models [11, 28].

 

Effector Models for AR

 

For AR, four algorithms, k-NN, random forest, SVM, and super learner, with ECFP4 fingerprints all exhibited high predictive power: balanced accuracies of 85%, 86%, 87%, and 86%, respectively, with MCC scores of 0.77, 0.73, 0.78, and 0.75, respectively, on the validation dataset (Table 7). The accuracy scores on the k-fold CV for these four models are 0.90±0.01, 0.88±0.01, 0.90±0.01, and 0.89±0.01, respectively (Table 8). The effector AR model thus achieved a cross-validation prediction accuracy of 90% for SVM and k-NN and 89% for the super learner. Although k-NN and SVM achieved higher accuracy with MACCS key fingerprints, SVM with ECFP4 fingerprints performed best, with a higher MCC value; MCC provides a more informative and truthful score for evaluating binary classifications [47].

 

 

 

 

Table 3: Average accuracy of different algorithms for the three binding class types of six receptors, using ECFP4 fingerprints as input features, on repeated k-fold cross-validation.

| Receptor | Algorithm | Agonist Accuracy | Antagonist Accuracy | Binder Accuracy |
|---|---|---|---|---|
| AR | Super learner | 0.95±0.01 | 0.87±0.02 | 0.98±0.01 |
| AR | Support vector machine | 0.96±0.01 | 0.88±0.01 | 0.98±0.01 |
| ERA | Super learner | 0.84±0.02 | 0.86±0.02 | 0.94±0.01 |
| ERA | Support vector machine | 0.84±0.01 | 0.88±0.01 | 0.94±0.01 |
| ERB | Super learner | 0.96±0.01 | 0.86±0.02 | 0.98±0.01 |
| ERB | Support vector machine | 0.96±0.01 | 0.87±0.01 | 0.98±0.01 |
| FXR | Super learner | 0.98±0.01 | 0.88±0.02 | 0.97±0.01 |
| FXR | Support vector machine | 0.97±0.01 | 0.89±0.02 | 0.97±0.01 |
| GR | Super learner | 0.98±0.01 | 0.93±0.01 | 0.98±0.01 |
| GR | Support vector machine | 0.98±0.01 | 0.93±0.01 | 0.98±0.01 |
| PR | Super learner | 0.98±0.01 | 0.86±0.02 | 0.99±0.00 |
| PR | Support vector machine | 0.98±0.01 | 0.87±0.02 | 0.99±0.00 |

± = standard deviation.

 

Table 4: Average accuracy of different algorithms for the three binding class types of seven receptors, using MACCS key fingerprints as input features, on repeated k-fold cross-validation.

| Receptor | Algorithm | Agonist Accuracy | Antagonist Accuracy | Binder Accuracy |
|---|---|---|---|---|
| AR | Super learner | 0.95±0.01 | 0.87±0.02 | 0.98±0.01 |
| AR | Support vector machine | 0.96±0.01 | 0.89±0.01 | 0.97±0.01 |
| ERA | Super learner | 0.83±0.02 | 0.83±0.02 | 0.95±0.01 |
| ERA | Support vector machine | 0.85±0.01 | 0.91±0.01 | 0.95±0.01 |
| ERB | Super learner | 0.96±0.01 | 0.85±0.02 | 0.97±0.01 |
| ERB | Support vector machine | 0.98±0.01 | 0.84±0.02 | 0.98±0.01 |
| FXR | Super learner | 0.96±0.01 | 0.81±0.04 | 0.96±0.01 |
| FXR | Support vector machine | 0.98±0.01 | 0.78±0.02 | 0.97±0.01 |
| GR | Super learner | 0.85±0.02 | 0.83±0.02 | 0.86±0.01 |
| GR | Support vector machine | 0.98±0.01 | 0.94±0.01 | 0.97±0.01 |
| PPARG | Super learner | 0.86±0.01 | 0.92±0.01 | 0.84±0.01 |
| PPARG | Support vector machine | 0.96±0.01 | 0.82±0.02 | 0.96±0.01 |
| PR | Super learner | 0.98±0.01 | 0.86±0.02 | 0.98±0.01 |
| PR | Support vector machine | 0.98±0.01 | 0.88±0.01 | 0.98±0.01 |

± = standard deviation.

 

 

Table 5: Comparison of the performance of the different classifiers on the validation set for the three binding class types of AR, using ECFP4 fingerprints as input features.

| Class | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| Agonist | Super learner | 0.87 | 0.77 | 0.98 | 0.75 | 0.81 | 90 | 1099 | 27 | 27 |
| Agonist | Support vector machine | 0.87 | 0.77 | 0.97 | 0.74 | 0.80 | 90 | 1097 | 27 | 29 |
| Antagonist | Super learner | 0.83 | 0.72 | 0.94 | 0.67 | 0.83 | 173 | 922 | 68 | 59 |
| Antagonist | Support vector machine | 0.84 | 0.76 | 0.92 | 0.66 | 0.84 | 183 | 901 | 58 | 80 |
| Binder | Super learner | 0.97 | 0.95 | 0.99 | 0.95 | 0.98 | 265 | 1043 | 15 | 8 |
| Binder | Support vector machine | 0.97 | 0.94 | 0.99 | 0.94 | 0.98 | 264 | 1042 | 16 | 9 |

BA – balanced accuracy; Sn – sensitivity; Sp – specificity; MCC – Matthews correlation coefficient; PR AUC – area under the precision-recall curve; TP – true positives; TN – true negatives; FN – false negatives; FP – false positives.

 

Table 6: Comparison of the performance of the different classifiers on the validation set for the three binding class types of AR, using MACCS key fingerprints as input features.

| Class | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| Agonist | Super learner | 0.87 | 0.79 | 0.96 | 0.71 | 0.81 | 92 | 1084 | 25 | 42 |
| Agonist | Support vector machine | 0.86 | 0.79 | 0.94 | 0.64 | 0.79 | 92 | 1060 | 25 | 66 |
| Antagonist | Super learner | 0.84 | 0.75 | 0.93 | 0.66 | 0.82 | 180 | 910 | 61 | 71 |
| Antagonist | Support vector machine | 0.84 | 0.80 | 0.89 | 0.64 | 0.83 | 192 | 874 | 49 | 107 |
| Binder | Super learner | 0.96 | 0.94 | 0.98 | 0.92 | 0.97 | 264 | 1033 | 16 | 18 |
| Binder | Support vector machine | 0.96 | 0.94 | 0.98 | 0.92 | 0.97 | 262 | 1034 | 18 | 17 |

BA – balanced accuracy; Sn – sensitivity; Sp – specificity; MCC – Matthews correlation coefficient; PR AUC – area under the precision-recall curve; TP – true positives; TN – true negatives; FN – false negatives; FP – false positives.

 

Table 7: Comparison of the performance of the different classifiers on the validation set for the AR Effector dataset.

| Fingerprint | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| ECFP4 | Super learner | 0.86 | 0.76 | 0.96 | 0.75 | 0.89 | 332 | 1000 | 103 | 45 |
| ECFP4 | Support vector machine | 0.87 | 0.76 | 0.98 | 0.78 | 0.90 | 329 | 1019 | 106 | 26 |
| MACCS | Super learner | 0.86 | 0.79 | 0.93 | 0.73 | 0.89 | 345 | 969 | 90 | 76 |
| MACCS | Support vector machine | 0.87 | 0.79 | 0.94 | 0.75 | 0.90 | 345 | 982 | 90 | 63 |

BA – balanced accuracy; Sn – sensitivity; Sp – specificity; MCC – Matthews correlation coefficient; PR AUC – area under the precision-recall curve; TP – true positives; TN – true negatives; FN – false negatives; FP – false positives.

 

 

Table 8: Average accuracy of different algorithms for the effector datasets of all receptors, using ECFP4 and MACCS key fingerprints as input features, on repeated k-fold cross-validation.

| Fingerprint | Method | AR | ERA | ERB | FXR | GR | PPARD | PPARG | PR | RXR |
|---|---|---|---|---|---|---|---|---|---|---|
| ECFP4 | Super learner | 0.89±0.01 | 0.85±0.01 | 0.94±0.01 | 0.94±0.01 | 0.95±0.01 | 0.98±0.01 | 0.94±0.01 | 0.89±0.01 | 0.96±0.01 |
| ECFP4 | Support vector machine | 0.90±0.01 | 0.86±0.01 | 0.94±0.01 | 0.95±0.01 | 0.95±0.01 | 0.98±0.01 | 0.94±0.01 | 0.90±0.01 | 0.96±0.01 |
| MACCS | Super learner | 0.88±0.01 | 0.84±0.01 | 0.93±0.01 | 0.92±0.01 | 0.94±0.01 | 0.97±0.01 | 0.93±0.01 | 0.89±0.01 | 0.96±0.01 |
| MACCS | Support vector machine | 0.89±0.01 | 0.85±0.01 | 0.93±0.01 | 0.94±0.01 | 0.95±0.01 | 0.98±0.01 | 0.94±0.01 | 0.90±0.01 | 0.96±0.01 |

± = standard deviation.

 

Models for ERA and ERB

 

Binding Class Models for ERA and ERB

Machine learning models for agonists, antagonists, and binders of both ERA and ERB were evaluated using a validation dataset and repeated k-fold CV. The performance measures for the different algorithms on repeated k-fold CV are given in Tables 3 and 4 for ECFP4 and MACCS key fingerprints as input features, respectively. For ERA, the bagging classifier has average accuracies of 89%, 91%, and 94% for the agonist, antagonist, and binder datasets with ECFP4 fingerprints, and 88%, 91%, and 93% with MACCS key fingerprints, respectively. The performance measures of ERA and ERB using the binding class classifiers on the validation set are given in supporting information Tables S3 and S4, respectively, for ECFP4 fingerprints as input features, and in Tables S5 and S6, respectively, for MACCS key fingerprints. Even though the bagging classifier has better accuracy on CV, the SVM and super learner give more consistent prediction accuracy across both CV and the validation dataset (Tables S3 and S5). Similarly, for ERB, more consistent performance measures were obtained for SVM and super learner (see Tables S4 and S6).

 

Effector Models for ERA and ERB

 

The SVM model performed best (balanced accuracy 80%; MCC 0.66), followed by random forest (accuracy 79%; MCC 0.61), with ECFP4 fingerprints as descriptors on the ERA validation dataset (Table S7). For MACCS key fingerprints, SVM had comparable accuracy but a lower MCC (Table S7). The lower MCC is likely due to the promiscuous nature of ERA, which binds diverse chemicals and thus makes it harder for machine learning algorithms to discriminate between NR-binding and non-binding chemicals. For ERB, the accuracy scores on the k-fold CV are 85% and 86% for the super learner and SVM with ECFP4 fingerprints, and 84% and 85% with MACCS key fingerprints (Table S8). The model developed using SVM combined with ECFP4 fingerprints had the maximum MCC value of 0.82, with specificity, sensitivity, and balanced accuracy of 94%, 94%, and 89%, respectively. Similar performance was observed for the other classifiers with ECFP4 and MACCS key fingerprints.

 

 

 

Models for FXR and PPARG

 

Binding Class Models for FXR and PPARG

 

The average stratified k-fold CV accuracies of classifiers based on ECFP4 and MACCS key fingerprints for the different classes of FXR and PPARG are given in Tables 3 and 4, respectively, demonstrating that all of the classifiers achieved accuracies of >90% at identifying FXR agonists and binders. Specifically, the bagging, k-nearest neighbors, random forest, super learner, and SVM classifiers achieved accuracies of >95% at identifying FXR agonists and binders with MACCS key fingerprints. For the FXR antagonist dataset, the AdaBoost, bagging, decision tree, and random forest classifiers all have k-fold CV accuracies of >90% with both ECFP4 and MACCS keys. The performance of the different classifiers for the different classes of FXR and PPARG on the validation dataset is given in Tables S9 (ECFP4 fingerprints) and S10 (MACCS keys) and Tables S11 (ECFP4 fingerprints) and S12 (MACCS keys), respectively. The results demonstrate that the super learner attained better performance for agonists and binders of FXR with both fingerprint types. Similar performance was achieved for PPARG agonists and binders. Poor performance of the antagonist models on the validation set was obtained for all classifiers for both FXR and PPARG, owing to the small number of antagonists in the training dataset.

 

 

 

Models for GR and PR

 

Binding Class Models for GR and PR

 

The average stratified k-fold CV accuracies of classifiers based on ECFP4 and MACCS key fingerprints for the different classes of GR and PR are given in Tables 3 and 4, respectively. The results show that the SVM and super learner algorithms have higher accuracy in identifying agonists and binders for GR and PR based on k-fold CV. The performance of the different classifiers for the different classes of GR and PR on the validation dataset is given in Tables S13 (ECFP4 fingerprints) and S14 (MACCS keys) and Tables S15 (ECFP4 fingerprints) and S16 (MACCS keys), respectively. Random forest, super learner, and SVM have good performance scores for the three classes of GR and PR with both feature types.

 

Effector Models for FXR, GR, PR, PPARG, PPARD and RXR

 

Data availability for antagonists of PPARD and RXR is limited; hence we did not model their individual binding classes. We merged the datasets as described in the Materials and Methods to create an effector dataset for these receptors. Performance measures on the repeated k-fold CV for FXR, GR, PR, PPARG, PPARD, and RXR are given in Table 8 for ECFP4 and MACCS key fingerprints, and show high accuracy across these NRs for both fingerprint types. The performance measures on the validation dataset for FXR, GR, PR, PPARD, PPARG, and RXR are given in supporting information Tables S17 to S22, respectively. Table 8 shows that the super learner and SVM both attained accuracies of approximately 90% or higher for the effector datasets of these receptors. Supporting information Tables S17 and S19 for FXR and PR show that most of the classifiers attained high accuracy for both fingerprint types. Random forest, k-nearest neighbors, and support vector machine with ECFP4 fingerprints showed similar sensitivities/specificities of 94-95%/94-95%, with MCC values of 0.75-0.76, on the validation dataset. The results for the ligand binding predictions for GR, PPARD, PPARG, and RXR (see supporting information Tables S18, S20, S21, and S22) show that the support vector machine-based models achieved slightly higher accuracy and MCC scores than the other evaluated algorithms.

 

 

Applicability Domain on the Validation Set

 

The results on the validation dataset after filtering through the applicability domain for prediction reliability are given as CSV files in the supporting information (S23 to S81). The results show that applying the applicability domain to the SVM and super learner models with ECFP4 fingerprints improves model performance. The stringent setting of Scutoff = 0.6 and Nmin >= 5 reduces the number of chemicals within the applicability domain and gives the best prediction outcomes. A significant improvement in the performance of the FXR antagonist models was obtained using these strict applicability domain parameters.

 

Implementation of Web Server

 

Based on our trained and validated best-performing models, we developed a web-based application named NR-ToxPred with a user-friendly interface to assist the scientific community (Figure 1). The NR-ToxPred interface accepts small molecules in several formats: users can sketch the structure in a simple drawing interface, enter a SMILES string as text in the drawing interface, or search by CAS ID. For multiple-ligand predictions, users can upload a two-column comma-separated (CSV) file with SMILES strings and the corresponding names. We implemented the best-performing support vector machine-based model for all nine NRs on the web server. For single-structure input, in addition to the tabulated results for each receptor, if the chemical is a predicted ligand it is subsequently docked to the matching receptor(s). Users can also select the applicability domain criteria (Scutoff and Nmin). The NR-ToxPred web service can be accessed at http://nr-toxpred.cchem.berkeley.edu/.

 

Limitations of the models

 

In this study, we developed machine learning models for predicting agonists, antagonists, and binders (each binding class as a binary task: active vs inactive), as well as effectors (binding vs non-binding). As needed, we constrained these models to the applicability domain within each receptor, according to the number of chemicals available in each class of the dataset. We initially found poor predictive power for the FXR antagonist models, but this was overcome by setting stricter applicability domain criteria. For the PPARG, PPARD, and RXR models, we collapsed the agonist and antagonist classes into a single effector category because of the limited number of antagonists in the dataset. These models are thus limited to predicting only the binding of small molecules to these NRs; they cannot distinguish agonists from antagonists. However, this distinction is easily determined experimentally once binding candidates are identified, and such testing is much more tractable with a computationally shortlisted set than with the whole set of chemicals. For the other NRs, which have more abundant data, our predictions are well validated and robust.

 

Table S3: Comparison of the performance of the different classifiers on the validation set for the binding types of ERA, using ECFP4 fingerprints as input features.

| Class | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| Agonist | Super learner | 0.75 | 0.57 | 0.93 | 0.52 | 0.66 | 106 | 948 | 80 | 66 |
| Agonist | Support vector machine | 0.77 | 0.65 | 0.89 | 0.49 | 0.64 | 120 | 903 | 66 | 111 |
| Antagonist | Super learner | 0.75 | 0.58 | 0.92 | 0.47 | 0.63 | 81 | 948 | 59 | 81 |
| Antagonist | Support vector machine | 0.74 | 0.57 | 0.91 | 0.44 | 0.60 | 80 | 937 | 60 | 92 |
| Binder | Super learner | 0.94 | 0.91 | 0.97 | 0.86 | 0.95 | 254 | 951 | 26 | 34 |
| Binder | Support vector machine | 0.93 | 0.90 | 0.96 | 0.85 | 0.95 | 252 | 945 | 28 | 40 |

 

 

Table S4: Comparison of the performance of the different classifiers on the validation set for the binding types of ERB, using ECFP4 fingerprints as input features.

| Class | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| Agonist | Support vector machine | 0.94 | 0.91 | 0.97 | 0.71 | 0.84 | 58 | 1112 | 6 | 40 |
| Agonist | Super learner | 0.95 | 0.92 | 0.97 | 0.77 | 0.92 | 59 | 1123 | 5 | 29 |
| Antagonist | Support vector machine | 0.77 | 0.66 | 0.89 | 0.42 | 0.58 | 62 | 911 | 32 | 113 |
| Antagonist | Super learner | 0.77 | 0.62 | 0.92 | 0.45 | 0.61 | 58 | 940 | 36 | 84 |
| Binder | Super learner | 0.99 | 0.99 | 0.99 | 0.95 | 1.00 | 224 | 1114 | 3 | 15 |
| Binder | Support vector machine | 0.99 | 0.99 | 0.98 | 0.95 | 0.99 | 225 | 1111 | 2 | 18 |

Table S5: Comparison of the performance of the different classifiers on the validation set for the binding types of ERA, using MACCS key fingerprints as input features.

| Class | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| Agonist | Super learner | 0.74 | 0.59 | 0.90 | 0.47 | 0.62 | 109 | 915 | 77 | 99 |
| Agonist | Support vector machine | 0.74 | 0.61 | 0.87 | 0.43 | 0.60 | 114 | 878 | 72 | 136 |
| Antagonist | Super learner | 0.75 | 0.60 | 0.91 | 0.45 | 0.62 | 84 | 932 | 56 | 97 |
| Antagonist | Support vector machine | 0.77 | 0.66 | 0.88 | 0.46 | 0.60 | 93 | 910 | 47 | 119 |
| Binder | Super learner | 0.93 | 0.91 | 0.96 | 0.85 | 0.94 | 255 | 942 | 25 | 43 |
| Binder | Support vector machine | 0.93 | 0.91 | 0.95 | 0.84 | 0.95 | 256 | 936 | 24 | 49 |

 

 

Table S6: Comparison of the performance of the different classifiers on the validation set for the binding types of ERB, using MACCS key fingerprints as input features.

| Class | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| Agonist | Support vector machine | 0.93 | 0.91 | 0.96 | 0.68 | 0.78 | 58 | 1103 | 6 | 49 |
| Agonist | Super learner | 0.94 | 0.92 | 0.96 | 0.69 | 0.90 | 59 | 1102 | 5 | 50 |
| Antagonist | Super learner | 0.80 | 0.68 | 0.91 | 0.48 | 0.66 | 64 | 936 | 30 | 88 |
| Antagonist | Support vector machine | 0.85 | 0.85 | 0.86 | 0.49 | 0.54 | 80 | 879 | 14 | 145 |
| Binder | Support vector machine | 0.98 | 0.98 | 0.98 | 0.93 | 0.99 | 223 | 1104 | 4 | 25 |
| Binder | Super learner | 0.98 | 0.98 | 0.98 | 0.93 | 0.98 | 223 | 1105 | 4 | 24 |

 

 

 

Table S7: Comparison of the performance of the different classifiers on the validation set for ERA.

| Fingerprint | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| ECFP4 | Super learner | 0.80 | 0.65 | 0.94 | 0.64 | 0.83 | 300 | 937 | 164 | 56 |
| ECFP4 | Support vector machine | 0.80 | 0.63 | 0.96 | 0.66 | 0.83 | 293 | 954 | 171 | 39 |
| MACCS | Super learner | 0.79 | 0.68 | 0.91 | 0.61 | 0.73 | 314 | 904 | 150 | 89 |
| MACCS | Support vector machine | 0.80 | 0.69 | 0.91 | 0.62 | 0.82 | 321 | 903 | 143 | 90 |

 

 

Table S8: Comparison of the performance of the different classifiers on the validation set for ERB.

| Fingerprint | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| ECFP4 | Super learner | 0.88 | 0.79 | 0.98 | 0.81 | 0.89 | 256 | 1074 | 68 | 25 |
| ECFP4 | Support vector machine | 0.89 | 0.79 | 0.98 | 0.82 | 0.90 | 256 | 1082 | 68 | 17 |
| MACCS | Super learner | 0.88 | 0.81 | 0.96 | 0.79 | 0.89 | 261 | 1059 | 63 | 40 |
| MACCS | Support vector machine | 0.88 | 0.79 | 0.97 | 0.78 | 0.88 | 255 | 1061 | 69 | 38 |

 

 

 

Table S9: Comparison of the performance of the different classifiers on the validation set for the binding types of FXR, using ECFP4 fingerprints as input features.

| Class | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| Agonist | Super learner | 0.92 | 0.86 | 0.99 | 0.84 | 0.91 | 95 | 1036 | 16 | 15 |
| Agonist | Support vector machine | 0.90 | 0.82 | 0.99 | 0.82 | 0.89 | 91 | 1036 | 20 | 15 |
| Antagonist | Super learner | 0.75 | 0.58 | 0.92 | 0.36 | 0.49 | 28 | 899 | 20 | 73 |
| Antagonist | Support vector machine | 0.74 | 0.56 | 0.91 | 0.31 | 0.39 | 27 | 883 | 21 | 89 |
| Binder | Support vector machine | 0.94 | 0.90 | 0.99 | 0.89 | 0.91 | 115 | 1047 | 13 | 11 |
| Binder | Super learner | 0.94 | 0.89 | 0.99 | 0.89 | 0.92 | 114 | 1046 | 14 | 12 |

 

 

 

Table S10: Comparison of the performance of the different classifiers on the validation set for the binding types of FXR, using MACCS key fingerprints as input features.

| Class | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| Agonist | Super learner | 0.91 | 0.84 | 0.98 | 0.80 | 0.88 | 93 | 1027 | 18 | 24 |
| Agonist | Support vector machine | 0.89 | 0.80 | 0.98 | 0.80 | 0.83 | 89 | 1034 | 22 | 17 |
| Antagonist | Support vector machine | 0.79 | 0.71 | 0.88 | 0.35 | 0.43 | 34 | 856 | 14 | 116 |
| Antagonist | Super learner | 0.76 | 0.58 | 0.95 | 0.42 | 0.51 | 28 | 920 | 20 | 52 |
| Binder | Super learner | 0.94 | 0.89 | 0.98 | 0.86 | 0.87 | 114 | 1040 | 14 | 18 |
| Binder | Support vector machine | 0.94 | 0.89 | 0.98 | 0.85 | 0.91 | 114 | 1037 | 14 | 21 |

 

 

Table S11: Comparison of the performance of the different classifiers on the validation set for the binding types of PPARG, using ECFP4 fingerprints as input features.

| Class | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| Agonist | Super learner | 0.94 | 0.89 | 0.98 | 0.89 | 0.95 | 276 | 1021 | 34 | 16 |
| Agonist | Support vector machine | 0.94 | 0.88 | 0.99 | 0.90 | 0.95 | 273 | 1026 | 37 | 11 |
| Antagonist | Super learner | 0.67 | 0.50 | 0.84 | 0.16 | 0.17 | 19 | 889 | 19 | 171 |
| Antagonist | Support vector machine | 0.65 | 0.55 | 0.74 | 0.12 | 0.14 | 21 | 783 | 17 | 277 |
| Binder | Super learner | 0.93 | 0.89 | 0.98 | 0.88 | 0.95 | 332 | 1077 | 43 | 21 |
| Binder | Support vector machine | 0.93 | 0.87 | 0.99 | 0.89 | 0.95 | 328 | 1083 | 47 | 15 |

 

 

 

 

Table S12: Comparison of the performance of the different classifiers on the validation set for the binding types of PPARG, using MACCS key fingerprints as input features.

| Class | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| Agonist | Super learner | 0.93 | 0.88 | 0.98 | 0.87 | 0.94 | 272 | 1012 | 38 | 25 |
| Agonist | Support vector machine | 0.92 | 0.86 | 0.98 | 0.86 | 0.94 | 266 | 1017 | 44 | 20 |
| Antagonist | Super learner | 0.69 | 0.55 | 0.83 | 0.18 | 0.19 | 21 | 882 | 17 | 178 |
| Antagonist | Support vector machine | 0.70 | 0.61 | 0.80 | 0.18 | 0.11 | 23 | 843 | 15 | 217 |
| Binder | Super learner | 0.93 | 0.88 | 0.98 | 0.88 | 0.94 | 330 | 1077 | 45 | 21 |
| Binder | Support vector machine | 0.93 | 0.88 | 0.99 | 0.89 | 0.95 | 329 | 1083 | 46 | 15 |

 

Table S13: Comparison of the performance of the different classifiers on the validation set for the binding types of GR, using ECFP4 fingerprints as input features.

| Class | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| Agonist | Super learner | 0.95 | 0.91 | 0.99 | 0.92 | 0.95 | 150 | 1061 | 15 | 7 |
| Agonist | Support vector machine | 0.94 | 0.90 | 0.99 | 0.89 | 0.94 | 149 | 1054 | 16 | 14 |
| Antagonist | Super learner | 0.89 | 0.81 | 0.96 | 0.77 | 0.89 | 146 | 871 | 35 | 33 |
| Antagonist | Support vector machine | 0.88 | 0.80 | 0.95 | 0.74 | 0.87 | 144 | 863 | 37 | 41 |
| Binder | Super learner | 0.97 | 0.94 | 0.99 | 0.95 | 0.98 | 342 | 1055 | 20 | 9 |
| Binder | Support vector machine | 0.97 | 0.94 | 1.00 | 0.96 | 0.98 | 342 | 1061 | 20 | 3 |

 

Table S14: Comparison of the performance of the different classifiers on the validation set for the binding types of GR, using MACCS key fingerprints as input features.

| Class | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| Agonist | Super learner | 0.95 | 0.92 | 0.98 | 0.89 | 0.95 | 151 | 1050 | 14 | 18 |
| Agonist | Support vector machine | 0.95 | 0.92 | 0.99 | 0.91 | 0.94 | 151 | 1056 | 14 | 12 |
| Antagonist | Super learner | 0.89 | 0.83 | 0.95 | 0.77 | 0.89 | 151 | 862 | 30 | 42 |
| Antagonist | Support vector machine | 0.89 | 0.83 | 0.95 | 0.75 | 0.88 | 151 | 855 | 30 | 49 |
| Binder | Super learner | 0.97 | 0.95 | 0.99 | 0.94 | 0.97 | 343 | 1052 | 19 | 12 |
| Binder | Support vector machine | 0.97 | 0.94 | 0.99 | 0.95 | 0.98 | 342 | 1056 | 20 | 8 |

 

Table S15: Comparison of the performance of the different classifiers on the validation set for the binding types of PR, using ECFP4 fingerprints as input features.

| Class | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| Agonist | Super learner | 0.96 | 0.93 | 0.99 | 0.90 | 0.93 | 80 | 1114 | 6 | 10 |
| Agonist | Support vector machine | 0.96 | 0.93 | 0.99 | 0.87 | 0.93 | 80 | 1108 | 6 | 16 |
| Antagonist | Super learner | 0.83 | 0.76 | 0.91 | 0.65 | 0.83 | 199 | 795 | 63 | 81 |
| Antagonist | Support vector machine | 0.83 | 0.78 | 0.88 | 0.63 | 0.83 | 204 | 774 | 58 | 102 |
| Binder | Super learner | 0.98 | 0.96 | 0.99 | 0.95 | 0.97 | 238 | 1012 | 9 | 10 |
| Binder | Support vector machine | 0.98 | 0.97 | 0.99 | 0.96 | 0.99 | 240 | 1012 | 7 | 10 |

 

 

Table S16: Comparison of the performance of the different classifiers on the validation set for the binding types of PR, using MACCS key fingerprints as input features.

| Class | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| Agonist | Super learner | 0.96 | 0.93 | 0.98 | 0.87 | 0.92 | 80 | 1107 | 6 | 17 |
| Agonist | Support vector machine | 0.96 | 0.93 | 0.98 | 0.85 | 0.87 | 80 | 1103 | 6 | 21 |
| Antagonist | Super learner | 0.85 | 0.80 | 0.91 | 0.68 | 0.82 | 210 | 795 | 52 | 81 |
| Antagonist | Support vector machine | 0.86 | 0.84 | 0.87 | 0.66 | 0.82 | 221 | 765 | 41 | 111 |
| Binder | Super learner | 0.98 | 0.97 | 0.99 | 0.95 | 0.99 | 239 | 1010 | 8 | 12 |
| Binder | Support vector machine | 0.97 | 0.96 | 0.99 | 0.95 | 0.99 | 237 | 1011 | 10 | 11 |

 

 

Table S17: Comparison of the performance of the different classifiers on the validation set for the FXR effector dataset.

| Fingerprint | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| ECFP4 | Super learner | 0.86 | 0.74 | 0.97 | 0.75 | 0.85 | 129 | 1021 | 46 | 27 |
| ECFP4 | Support vector machine | 0.83 | 0.67 | 0.99 | 0.76 | 0.85 | 117 | 1039 | 58 | 9 |
| MACCS | Super learner | 0.85 | 0.75 | 0.95 | 0.68 | 0.82 | 131 | 995 | 44 | 53 |
| MACCS | Support vector machine | 0.84 | 0.71 | 0.97 | 0.71 | 0.83 | 125 | 1015 | 50 | 33 |

 

 

Table S18: Comparison of the performance of the different classifiers on the validation set for the GR effector dataset.

| Fingerprint | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| ECFP4 | Super learner | 0.94 | 0.90 | 0.98 | 0.90 | 0.95 | 379 | 1036 | 44 | 16 |
| ECFP4 | Support vector machine | 0.94 | 0.89 | 0.98 | 0.90 | 0.96 | 377 | 1036 | 46 | 16 |
| MACCS | Support vector machine | 0.94 | 0.91 | 0.98 | 0.90 | 0.96 | 383 | 1030 | 40 | 22 |
| MACCS | Super learner | 0.95 | 0.91 | 0.98 | 0.90 | 0.95 | 387 | 1030 | 36 | 22 |

 

Table S19: Comparison of the performance of the different classifiers on the validation set for the PR effector dataset.

| Fingerprint | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| ECFP4 | Super learner | 0.87 | 0.77 | 0.96 | 0.77 | 0.90 | 334 | 957 | 98 | 40 |
| ECFP4 | Support vector machine | 0.86 | 0.75 | 0.97 | 0.76 | 0.91 | 325 | 963 | 107 | 34 |
| MACCS | Super learner | 0.88 | 0.81 | 0.94 | 0.77 | 0.92 | 352 | 939 | 80 | 58 |
| MACCS | Support vector machine | 0.87 | 0.80 | 0.95 | 0.76 | 0.88 | 345 | 944 | 87 | 53 |

 

Table S20: Comparison of the performance of the different classifiers on the validation set for the PPARD effector dataset.

| Fingerprint | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| ECFP4 | Super learner | 0.94 | 0.88 | 0.99 | 0.91 | 0.95 | 149 | 1143 | 21 | 6 |
| ECFP4 | Support vector machine | 0.93 | 0.85 | 1.00 | 0.91 | 0.92 | 145 | 1149 | 25 | 0 |
| MACCS | Super learner | 0.93 | 0.86 | 0.99 | 0.87 | 0.93 | 147 | 1133 | 23 | 16 |
| MACCS | Support vector machine | 0.92 | 0.84 | 0.99 | 0.87 | 0.92 | 143 | 1138 | 27 | 11 |

 

Table S21: Comparison of the performance of the different classifiers on the validation set for the PPARG effector dataset.

| Fingerprint | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| ECFP4 | Super learner | 0.91 | 0.84 | 0.98 | 0.85 | 0.93 | 355 | 1071 | 70 | 22 |
| ECFP4 | Support vector machine | 0.90 | 0.80 | 0.99 | 0.85 | 0.92 | 340 | 1086 | 85 | 7 |
| MACCS | Super learner | 0.90 | 0.84 | 0.97 | 0.83 | 0.87 | 355 | 1063 | 70 | 30 |
| MACCS | Support vector machine | 0.90 | 0.83 | 0.98 | 0.83 | 0.94 | 352 | 1066 | 73 | 27 |

 

Table S22: Comparison of the performance of the different classifiers on the validation set for the RXR effector dataset.

| Fingerprint | Method | BA | Sn | Sp | MCC | PR AUC | TP | TN | FN | FP |
|---|---|---|---|---|---|---|---|---|---|---|
| ECFP4 | Super learner | 0.90 | 0.80 | 1.00 | 0.87 | 0.92 | 160 | 913 | 41 | 2 |
| ECFP4 | Support vector machine | 0.90 | 0.80 | 1.00 | 0.87 | 0.89 | 161 | 913 | 40 | 2 |
| MACCS | Super learner | 0.90 | 0.81 | 0.99 | 0.85 | 0.91 | 162 | 906 | 39 | 9 |
| MACCS | Support vector machine | 0.91 | 0.82 | 1.00 | 0.87 | 0.91 | 164 | 911 | 37 | 4 |