Data Science

As an astrophysicist by training, I will try to post routines and tutorials that use astrophysics datasets.

Principal Component Analysis
PCA is commonly used for dimensionality reduction, but here I show that it can also improve classification accuracy and generate new data. (dimensionality reduction, classification)
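
A minimal NumPy sketch of the three PCA uses mentioned above: projecting onto the top principal axes, reconstructing from them, and drawing new samples. It assumes the component scores are independent Gaussians when generating data. The function names are hypothetical and this is not necessarily how the notebook does it.

```python
import numpy as np

def pca_fit(X, k):
    """Return the mean, the top-k principal axes and their variances for X of shape (n_samples, n_features)."""
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal axes, ordered by decreasing variance
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    variances = s**2 / (X.shape[0] - 1)
    return mean, Vt[:k], variances[:k]

def pca_transform(X, mean, axes):
    # Project the centered data onto the principal axes (dimensionality reduction)
    return (X - mean) @ axes.T

def pca_inverse(Z, mean, axes):
    # Map reduced coordinates back into the original feature space
    return Z @ axes + mean

def pca_generate(n, mean, axes, variances, rng):
    # Assumption: component scores are independent Gaussians with the fitted variances
    Z = rng.normal(size=(n, len(variances))) * np.sqrt(variances)
    return pca_inverse(Z, mean, axes)

rng = np.random.default_rng(0)
# Toy 3-D data lying close to a 2-D plane (synthetic, not an astrophysics dataset)
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 2.0, 0.5], [0.0, 1.0, -1.0]])
X = latent @ mixing + 0.01 * rng.normal(size=(200, 3))

mean, axes, variances = pca_fit(X, k=2)
Z = pca_transform(X, mean, axes)
X_rec = pca_inverse(Z, mean, axes)
print("max reconstruction error:", np.abs(X - X_rec).max())
new_samples = pca_generate(5, mean, axes, variances, rng)
print(new_samples.shape)
```

Because the toy data is nearly planar, two components reconstruct it to within the injected noise level.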

A simple Python implementation of the learning vector quantization method for classification.
Learning vector quantization (LVQ) is a prototype-based supervised classification algorithm proposed by T. Kohonen; it is not currently available in scikit-learn (as of version 0.19). The routine also has an option to run a variant of the Neural Gas algorithm. (supervised learning, classification)
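
To show how the prototype idea works, here is a minimal sketch of the basic LVQ1 update rule with one prototype per class: the closest prototype moves toward a training sample when their labels agree and away from it otherwise. The learning-rate schedule and function names are assumptions for illustration, and the Neural Gas variant is not shown.

```python
import numpy as np

def lvq1_fit(X, y, n_epochs=30, lr=0.1, rng=None):
    """Train LVQ1 with one prototype per class, initialized at a random sample of that class."""
    rng = np.random.default_rng(rng)
    classes = np.unique(y)
    protos = np.array([X[rng.choice(np.flatnonzero(y == c))] for c in classes], dtype=float)
    for epoch in range(n_epochs):
        alpha = lr * (1 - epoch / n_epochs)  # linearly decaying learning rate (assumption)
        for i in rng.permutation(len(X)):
            # Best-matching (closest) prototype
            j = np.argmin(np.sum((protos - X[i]) ** 2, axis=1))
            if classes[j] == y[i]:
                protos[j] += alpha * (X[i] - protos[j])  # attract when labels agree
            else:
                protos[j] -= alpha * (X[i] - protos[j])  # repel when they differ
    return protos, classes

def lvq_predict(X, protos, classes):
    # Assign each sample the class of its nearest prototype
    d2 = np.sum((X[:, None, :] - protos[None, :, :]) ** 2, axis=2)
    return classes[np.argmin(d2, axis=1)]

rng = np.random.default_rng(1)
# Two synthetic, well-separated classes
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(3.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
protos, classes = lvq1_fit(X, y, rng=2)
acc = (lvq_predict(X, protos, classes) == y).mean()
print("training accuracy:", acc)
```

More prototypes per class let LVQ model non-convex class regions; one per class keeps the sketch short.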

Centaurs and Trans-Neptunian Objects (TNOs):
A single family of objects? An IPython notebook analysis.

Identifying Pulsars from noise in large radio surveys
A survey of how efficiently machine learning classification algorithms distinguish real astronomical objects, such as pulsars, from noise. (imbalanced dataset, classification)
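
With an imbalanced dataset, accuracy alone is misleading, which is why precision, recall and F1 are usually reported. This sketch uses made-up labels (10 pulsars among 1000 candidates) to compare a classifier that never predicts a pulsar with one that finds most of them; scikit-learn's `sklearn.metrics.precision_recall_fscore_support` computes the same quantities.

```python
import numpy as np

def binary_scores(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = pulsar, 0 = noise)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    tn = np.sum(~y_true & ~y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return accuracy, precision, recall, f1

# Made-up labels: 10 pulsars among 1000 candidates
y_true = np.zeros(1000, dtype=int)
y_true[:10] = 1

always_noise = np.zeros(1000, dtype=int)  # never predicts a pulsar
detector = np.zeros(1000, dtype=int)
detector[:8] = 1      # finds 8 of the 10 pulsars...
detector[10:30] = 1   # ...at the cost of 20 false alarms

print(binary_scores(y_true, always_noise))  # accuracy 0.99, but recall and F1 are 0
print(binary_scores(y_true, detector))      # accuracy 0.978, recall 0.8
```

The useless classifier has the higher accuracy, while recall and F1 correctly favor the detector.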

Identifying candidate Hyades cluster members using Hipparcos satellite data.
We will survey different unsupervised clustering and outlier detection algorithms to analyze Hipparcos photometry, parallaxes, and proper motions. The candidate members will be compared with the findings of Perryman et al. (Hertzsprung-Russell diagram, clustering, outlier detection)
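
As one simple example of outlier detection, this sketch flags stars whose proper motions deviate from the group median using the modified z-score (median absolute deviation scaled by 0.6745, threshold 3.5). This is a swapped-in illustrative technique, not necessarily one of the algorithms surveyed, and the proper motions are made-up numbers, not Hipparcos measurements.

```python
import numpy as np

def robust_outliers(pm, threshold=3.5):
    """Flag rows of pm, shape (n, 2), whose modified z-score exceeds `threshold` in either component."""
    median = np.median(pm, axis=0)
    mad = np.median(np.abs(pm - median), axis=0)  # median absolute deviation per column
    # 0.6745 rescales the MAD so that z behaves like a standard score for Gaussian data
    z = 0.6745 * (pm - median) / mad
    return np.any(np.abs(z) > threshold, axis=1)

rng = np.random.default_rng(3)
# Made-up proper motions in mas/yr: a tight co-moving group plus three unrelated field stars
members = rng.normal(loc=[100.0, -25.0], scale=2.0, size=(100, 2))
field = np.array([[0.0, 0.0], [20.0, 40.0], [-30.0, 10.0]])
pm = np.vstack([members, field])
flags = robust_outliers(pm)
print(flags.sum(), "stars flagged;", flags[-3:])
```

The median and MAD are used instead of the mean and standard deviation because a few field stars would otherwise inflate the scatter and hide themselves.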

Some Star Formation Statistics.
We will analyze contingency tables.
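
A contingency table is typically tested for independence with Pearson's chi-square statistic, where each expected count is (row total × column total) / grand total. This sketch uses a hypothetical 2×2 table with made-up counts.

```python
import numpy as np

def chi2_independence(observed):
    """Pearson chi-square statistic, degrees of freedom and expected counts for a contingency table."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    # Expected counts under independence: (row total x column total) / grand total
    expected = row_totals @ col_totals / observed.sum()
    chi2 = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return chi2, dof, expected

# Hypothetical 2x2 table: rows = two object classes, columns = property present / absent
table = [[20, 30],
         [10, 40]]
chi2, dof, expected = chi2_independence(table)
print(round(chi2, 3), dof)  # 4.762 1
```

Here the statistic exceeds 3.841, the 5% critical value for one degree of freedom, so independence would be rejected at that level for these invented counts. `scipy.stats.chi2_contingency` gives the same test but applies Yates' continuity correction to 2×2 tables by default.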

Galaxy clustering in the Shapley field. (clustering)

Python classes to perform linear regression and binary and multiclass classification with the Orthogonal Distance Regression algorithm.
The classes are compatible with the scikit-learn API and can therefore be used with tools from the scikit-learn package.
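
To illustrate the idea behind orthogonal distance regression, here is a minimal sketch for a straight line with equal errors in x and y. In that case ODR reduces to total least squares, whose direction is the first singular vector of the centered points. The class name is hypothetical; it only follows scikit-learn's `fit`/`predict` naming (inheriting from `sklearn.base.BaseEstimator` and `RegressorMixin` would add `get_params` and `score`). SciPy's `scipy.odr` module, a wrapper around ODRPACK, handles weights and nonlinear models.

```python
import numpy as np

class ODRLineRegressor:
    """Hypothetical minimal estimator: fits y = coef_ * x + intercept_ by minimizing the
    perpendicular distances of the points to the line, assuming equal errors in x and y."""

    def fit(self, X, y):
        x = np.asarray(X, dtype=float).ravel()  # single feature only
        pts = np.column_stack([x, np.asarray(y, dtype=float)])
        center = pts.mean(axis=0)
        # The best-fit direction is the first right-singular vector of the centered points
        _, _, Vt = np.linalg.svd(pts - center, full_matrices=False)
        dx, dy = Vt[0]
        self.coef_ = dy / dx
        self.intercept_ = center[1] - self.coef_ * center[0]
        return self

    def predict(self, X):
        return self.coef_ * np.asarray(X, dtype=float).ravel() + self.intercept_

rng = np.random.default_rng(4)
x_true = np.linspace(0.0, 10.0, 200)
# Synthetic data with equal Gaussian noise in both coordinates
x_obs = x_true + rng.normal(scale=0.5, size=200)
y_obs = 2.0 * x_true + 1.0 + rng.normal(scale=0.5, size=200)

odr_fit = ODRLineRegressor().fit(x_obs.reshape(-1, 1), y_obs)
ols_slope = np.polyfit(x_obs, y_obs, 1)[0]
print("ODR slope:", odr_fit.coef_, "OLS slope:", ols_slope)
```

Ordinary least squares ignores the noise in x and so tends to underestimate the slope; the orthogonal fit does not have this bias when the x and y errors are equal.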

Exploring exoplanet catalogues.
Can we teach machine learning algorithms to recognise habitable exoplanets? (Exploratory Data Analysis, classification, imbalanced dataset)
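
One common remedy for an imbalanced training set, which is assumed here for illustration and not necessarily the one used in the notebook, is random oversampling: rows of the rarer classes are duplicated until every class is equally represented. The labels below are a made-up toy set, not catalogue data. The `imbalanced-learn` package provides the same idea as `RandomOverSampler`.

```python
import numpy as np

def random_oversample(X, y, rng=None):
    """Duplicate randomly chosen rows of the smaller classes until every class
    has as many rows as the largest one."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    keep = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        if n < n_max:
            # Sample with replacement from this class to make up the shortfall
            members = np.concatenate([members, rng.choice(members, size=n_max - n, replace=True)])
        keep.append(members)
    idx = rng.permutation(np.concatenate(keep))
    return X[idx], y[idx]

# Toy training set: 2 hypothetical "habitable" planets (label 1) among 10
X = np.arange(20).reshape(10, 2)
y = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
X_res, y_res = random_oversample(X, y, rng=5)
print(np.bincount(y_res))  # [8 8]
```

Oversampling should be applied only to the training split, after the train/test split, so that duplicated rows do not leak into the evaluation set.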

Photometric redshift determination.
Machine learning regressors can be used to estimate galaxy photometric redshifts. (Linear regression, K-Nearest Neighbors, XGBoost, Neural Networks, error propagation, Ensemble regression)
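
As an example of one of the listed regressors, this sketch shows k-nearest-neighbors regression: a galaxy's redshift is estimated as the mean redshift of the k training galaxies closest to it in color space. The four "colors" and their linear dependence on redshift are made up for illustration, and error propagation is not shown.

```python
import numpy as np

def knn_regress(train_X, train_z, query_X, k=5):
    """Predict each query redshift as the mean redshift of its k nearest
    training galaxies (Euclidean distance in feature space)."""
    d2 = ((query_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return train_z[nearest].mean(axis=1)

rng = np.random.default_rng(6)

def fake_colors(z):
    # Hypothetical colors varying linearly with redshift, plus Gaussian noise
    return np.outer(z, [1.0, 0.5, -0.3, 0.8]) + rng.normal(scale=0.05, size=(len(z), 4))

z_train = rng.uniform(0.0, 1.0, 500)
z_test = rng.uniform(0.0, 1.0, 100)
z_pred = knn_regress(fake_colors(z_train), z_train, fake_colors(z_test), k=5)
# A common photo-z quality measure: scatter of (z_pred - z_true) / (1 + z_true)
scatter = np.std((z_pred - z_test) / (1 + z_test))
print(round(scatter, 3))
```

The brute-force distance matrix is fine for a demo; for large catalogues a tree-based search such as `sklearn.neighbors.KNeighborsRegressor` scales better.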

COMBO-17 survey. Colors are used to find groups of objects. (Correlation matrix, clustering)
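
A minimal sketch of the workflow: form colors from adjacent-band magnitudes, inspect their correlation matrix, and group objects with k-means. The magnitudes describe two synthetic populations, not COMBO-17 data. The k-means below is plain Lloyd's algorithm with a farthest-first initialization, an assumption made to keep the demo deterministic.

```python
import numpy as np

def kmeans(X, k, n_iter=100, rng=None):
    """Plain Lloyd's k-means with a farthest-first initialization."""
    rng = np.random.default_rng(rng)
    # Start from one random point, then repeatedly add the point farthest from all chosen centers
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(7)
# Synthetic magnitudes in five bands for two made-up populations
base = rng.uniform(18.0, 24.0, size=(200, 1))
offsets = np.where(np.arange(200)[:, None] < 100, [0.0, 0.2, 0.4, 0.6, 0.8], [0.0, 0.8, 1.4, 1.8, 2.0])
mags = base + offsets + rng.normal(scale=0.05, size=(200, 5))
colors = mags[:, :-1] - mags[:, 1:]  # adjacent-band colors cancel the overall brightness
print(np.round(np.corrcoef(colors, rowvar=False), 2))
labels, centers = kmeans(colors, k=2, rng=8)
```

Working in colors rather than magnitudes removes the dependence on distance and luminosity, so the groups reflect spectral shape.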

Short-term temperature forecast at the summit of Mauna Kea, where many world-class telescopes are located. (Weather forecast, time series, Facebook Prophet package)
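
Before fitting a model such as Prophet (which expects a DataFrame with a `ds` timestamp column and a `y` value column), it helps to have a baseline to beat. This sketch shows a seasonal-naive forecast, a swapped-in baseline rather than the Prophet model itself: each future hour repeats the temperature observed at the same hour on the last observed day. The hourly series is synthetic, not Mauna Kea data.

```python
import math

def seasonal_naive(history, horizon, season=24):
    """Forecast each future step by repeating the value observed one season earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[h % season] for h in range(horizon)]

def mae(actual, forecast):
    # Mean absolute error between two equal-length sequences
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

# Synthetic hourly temperatures (deg C) with a pure daily cycle; made-up numbers
temps = [2.0 + 4.0 * math.sin(2 * math.pi * t / 24) for t in range(24 * 7)]
train, test = temps[:-24], temps[-24:]
forecast = seasonal_naive(train, horizon=24)
print(round(mae(test, forecast), 6))
```

On this perfectly periodic toy series the baseline is exact; on real summit data, a forecasting model is only worth its extra complexity if its error is clearly below this baseline's.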

Comparison of Astropy table and Pandas, part 1