version 1.0 1/3/2018
This project compares the performance of different machine learning classifiers at distinguishing actual pulsars from noise.
We use an astronomical data set from the High Time Resolution Universe Survey.
The data are hosted and described at the UCI Machine Learning Repository.
Researchers have put considerable effort into gathering the observations and determining the features that best discriminate between actual pulsars and noise.
This choice of features makes it possible to classify the candidates with high accuracy.
The figure below compares the classifiers we tested:
– KNN: k-Nearest Neighbors
– NN: Neural Network with two hidden layers
– GNB: Gaussian Naive Bayes
– LDA: Linear Discriminant Analysis
– QDA: Quadratic Discriminant Analysis
– LR: Logistic Regression
– DT: Decision Tree
– ExT: Extra Tree Classifier
– RF: Random Forest
– Ada: AdaBoost Classifier
– GMMB: Gaussian Mixture Bayes Classifier (astroML)
– XGB: XGBoost Classifier (XGBoost package)
– SVM: Support Vector Machine
– LVQ: Learning Vector Quantization Classifier (Python routine)
– meta: an ensemble of several of the classifiers above
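Since LVQ is the one entry in the list above that comes from a standalone Python routine, here is a minimal sketch of how such a classifier could look. It uses the basic LVQ1 update rule with one prototype per class; the function names `lvq1_fit` and `lvq1_predict`, the learning-rate schedule, and the toy data are assumptions for illustration and are not taken from the routine used in this project.

```python
import numpy as np

def lvq1_fit(X, y, n_epochs=20, lr=0.1, seed=0):
    """Train one prototype per class with the LVQ1 rule (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    labels = np.unique(y)
    # Initialise each prototype at the mean of its class.
    protos = np.array([X[y == c].mean(axis=0) for c in labels], dtype=float)
    for epoch in range(n_epochs):
        alpha = lr * (1.0 - epoch / n_epochs)  # linearly decaying learning rate
        for i in rng.permutation(len(X)):
            # Find the prototype closest to the current sample.
            j = np.argmin(((protos - X[i]) ** 2).sum(axis=1))
            # Attract it if the labels agree, repel it otherwise.
            sign = 1.0 if labels[j] == y[i] else -1.0
            protos[j] += sign * alpha * (X[i] - protos[j])
    return protos, labels

def lvq1_predict(X, protos, labels):
    """Assign each sample the label of its nearest prototype."""
    d = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return labels[d.argmin(axis=1)]

# Toy data (assumption): two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

protos, labels = lvq1_fit(X, y)
accuracy = (lvq1_predict(X, protos, labels) == y).mean()
print(f"training accuracy: {accuracy:.3f}")
```

With a single prototype per class, the decision boundary is the perpendicular bisector between the two prototypes, which makes this variant comparable in flexibility to a linear classifier.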
We used the classifiers from the scikit-learn package unless otherwise specified.
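As a rough illustration of how such a comparison can be set up with scikit-learn, the sketch below fits a few of the listed classifiers plus a voting ensemble on a common train/test split. Everything specific here is an assumption: the data are synthetic (generated with `make_classification` as an imbalanced stand-in for the pulsar candidates), the hyperparameters are library defaults or arbitrary picks, and the members of the "meta" ensemble are chosen for illustration only. It is not the code used to produce the figure.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 8 features, ~10% positives.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

classifiers = {
    # KNN is distance based, so the features are standardised first.
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "GNB": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}
# Illustrative "meta" classifier: soft voting over the four models above.
classifiers["meta"] = VotingClassifier(
    estimators=list(classifiers.items()), voting="soft")

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # Recall and precision of the positive (pulsar-like) class.
    scores[name] = (recall_score(y_test, y_pred), precision_score(y_test, y_pred))
    print(f"{name:5s} recall={scores[name][0]:.3f} precision={scores[name][1]:.3f}")
```

Stratifying the split keeps the fraction of positives the same in the training and test sets, which matters when the positive class is rare.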
The Python code can be downloaded here.
Most of the classifiers perform quite well. Surprisingly, KNN, LVQ, and LDA outperform much more sophisticated algorithms such as XGBoost and the neural network. The completeness is close to 90% at an efficiency of 90%.
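The text does not define completeness and efficiency, so the sketch below assumes the convention common in astronomy: completeness is the fraction of real pulsars that are recovered (recall), and efficiency is the fraction of candidates flagged as pulsars that are real (precision, or one minus contamination). The function name `completeness_efficiency` and the toy labels are hypothetical.

```python
import numpy as np

def completeness_efficiency(y_true, y_pred):
    """Return (completeness, efficiency) for binary labels, 1 = pulsar."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)    # pulsars correctly flagged
    fn = np.sum(y_true & ~y_pred)   # pulsars missed
    fp = np.sum(~y_true & y_pred)   # noise flagged as pulsar
    completeness = tp / (tp + fn) if (tp + fn) else 0.0
    efficiency = tp / (tp + fp) if (tp + fp) else 0.0
    return float(completeness), float(efficiency)

# Toy example: 4 real pulsars, 3 recovered, plus 1 false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
completeness, efficiency = completeness_efficiency(y_true, y_pred)
print(completeness, efficiency)  # 0.75 0.75
```

Under this convention, a classifier operating at 90% completeness and 90% efficiency misses one real pulsar in ten, and one in ten of the candidates it flags is noise.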
R. J. Lyon, B. W. Stappers, S. Cooper, J. M. Brooke, J. D. Knowles, Fifty Years of Pulsar Candidate Selection: From simple filters to a new principled real-time classification approach, Monthly Notices of the Royal Astronomical Society 459 (1), 1104-1123, DOI: 10.1093/mnras/stw656.
R. J. Lyon, HTRU2, DOI: 10.6084/m9.figshare.3080389.v1.