Astronomy is probably the oldest data science. As in many other fields, the amount of data in astronomy has grown exponentially over the past decade. To extract scientific information efficiently from this ever-growing volume of unstructured data, I have been developing new machine-learning techniques for dimensionality reduction, matrix factorization, classification, and clustering analysis, among others. I love playing with data and enjoy developing new algorithms. Here are a few of the projects I have been working on most recently.
I'm slowly migrating to git (from SVN/Mercurial): https://github.com/guangtunbenzhu/
My older code was mostly written in IDL and Fortran/C, but I have since transitioned to Python. Here are a few Python packages I have just released on PyPI (you can install them with pip):
Nonnegative Matrix Factorization (NonnegMFPy): https://github.com/guangtunbenzhu/NonnegMFPy
Set Cover Problem Solver (SetCoverPy): https://github.com/guangtunbenzhu/SetCoverPy
Set Cover Problem (P =?= NP)
The set cover problem (SCP) is a classic problem in operations research. It is one of the well-known NP-complete problems, so whether it can be solved efficiently is tied to the open P vs. NP question. It has many real-life applications, such as crew assignment for trains and airlines, location selection for fire stations and schools, and nurse scheduling. As an astronomical application of SCP, I have developed a new Archetype technique for classification purposes.
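To make the problem concrete, here is a minimal sketch of the textbook greedy heuristic for the unweighted set cover problem. It is not the algorithm implemented in SetCoverPy and is only meant to illustrate what a "cover" is; the function name `greedy_set_cover` and the toy data are assumptions made up for this example.

```python
def greedy_set_cover(universe, subsets):
    """Return the indices of the subsets picked by the greedy heuristic.

    universe: iterable of elements that must all be covered.
    subsets:  list of Python sets; their union should contain the universe.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements.
        best = max(range(len(subsets)),
                   key=lambda i: len(uncovered & subsets[i]))
        if not uncovered & subsets[best]:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Toy instance (hypothetical data): cover the integers 1..7.
universe = range(1, 8)
subsets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 5}, {2, 6, 7}]
cover = greedy_set_cover(universe, subsets)
print(cover)  # indices into `subsets`
```

The greedy choice is fast but not guaranteed to be optimal; it is a common baseline against which more careful solvers are compared.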
Nonnegative Matrix Factorization (and PCA)
Dimensionality reduction and matrix factorization techniques have many applications in physics and astronomy. In astronomy, a particularly useful technique is nonnegative matrix factorization (NMF), because the flux of an astronomical source cannot be negative.
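As a rough illustration of the idea, the sketch below factorizes a nonnegative matrix V into nonnegative factors W and H using the standard multiplicative update rules for the Frobenius-norm objective. It is a minimal NumPy example, not the NonnegMFPy API; the function name `nmf_multiplicative`, its parameters, and the synthetic data are all assumptions for this example, and it ignores measurement uncertainties and missing data.

```python
import numpy as np


def nmf_multiplicative(V, n_components, n_iter=500, eps=1e-12, seed=0):
    """Approximate V (n x m, nonnegative) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Start from strictly positive random factors.
    W = rng.random((n, n_components)) + eps
    H = rng.random((n_components, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative;
        # eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic nonnegative data of rank 3 (hypothetical, for illustration only).
rng = np.random.default_rng(42)
V = rng.random((20, 3)) @ rng.random((3, 30))

W, H = nmf_multiplicative(V, n_components=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(W.shape, H.shape, rel_err)
```

Unlike PCA, neither the basis vectors nor the coefficients can go negative here, which is what makes the components easier to interpret as physical spectra or fluxes.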
One of the most exciting developments in computer science/machine learning recently has been the new algorithms that made deep learning possible. I have started to explore the potential of deep learning in astronomy. This section will be updated soon.
Frequentist vs. Bayesian
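The frequentist approach focuses on the likelihood P(D|M) and aims to find the model that best describes the data, while the Bayesian approach focuses on the posterior distribution P(M|D) and weighs all possible models by their probability given the data. The two are linked by Bayes' theorem: P(M|D) is proportional to P(D|M) P(M), where P(M) is the prior.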
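A toy example may help show the difference. The sketch below estimates the bias of a coin from k heads in n flips in two ways: the maximum-likelihood estimate, and the posterior computed on a grid under a flat prior. The numbers (7 heads in 10 flips), the flat prior, and the function names are assumptions chosen purely for illustration.

```python
def max_likelihood(k, n):
    """Frequentist point estimate: the theta that maximizes P(D | theta)."""
    return k / n


def grid_posterior(k, n, n_grid=1001):
    """Bayesian posterior P(theta | D) on a grid, assuming a flat prior."""
    thetas = [i / (n_grid - 1) for i in range(n_grid)]
    # Likelihood times a constant prior; the binomial coefficient cancels
    # in the normalization, so it is omitted.
    unnorm = [t**k * (1 - t)**(n - k) for t in thetas]
    total = sum(unnorm)
    return thetas, [u / total for u in unnorm]


k, n = 7, 10  # hypothetical data: 7 heads in 10 flips
theta_mle = max_likelihood(k, n)
thetas, post = grid_posterior(k, n)
post_mean = sum(t * p for t, p in zip(thetas, post))
print(theta_mle, post_mean)
```

The maximum-likelihood answer is a single best value, while the posterior is a full distribution over all values of theta; with a flat prior its mean is pulled slightly toward 0.5 relative to k/n.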