The automatic detection and classification of celestial data on ingest is of growing importance as the volume and velocity of survey images increase. Here we present a multistage data pipeline with an integrated machine learning classifier based on the Kohonen self-organizing map (SOM). SOMs are simple to implement and learn in an unsupervised manner. This allows a SOM artificial neural network (ANN) to be trained without pre-classifying the training data set, so its results are free from the human bias that manual labeling would introduce. In turn, the open source data pipeline is built for speed and efficiency: data is collected, coalesced and classified in real time by a stream-processing framework built on Apache Flink and Apache Spark. Interesting radio sources are collected in a deep object store for further analysis and review, while data labeled as interference is discarded.
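The abstract does not describe the SOM's configuration, so the following is only an illustrative sketch of the technique, not the presented pipeline's implementation (which runs on Flink and Spark). It is a minimal pure-Python Kohonen SOM. The grid size, Gaussian neighborhood, linearly decaying learning rate and radius, and the hypothetical `route` helper and `node_labels` mapping are all assumptions. Training uses no labels at all. The sketch assumes that labels such as `"interference"` are attached to map nodes only after training, for example by inspecting the map, and that incoming samples are then kept or discarded according to their best-matching node.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


class SelfOrganizingMap:
    """Minimal Kohonen SOM on a rows x cols grid (illustrative sketch only)."""

    def __init__(self, rows: int, cols: int, dim: int, seed: int = 0) -> None:
        self.rows, self.cols, self.dim = rows, cols, dim
        self._rng = random.Random(seed)
        # One weight vector per grid node, randomly initialized in [0, 1).
        self.weights: List[Vector] = [
            [self._rng.random() for _ in range(dim)] for _ in range(rows * cols)
        ]

    def _coords(self, idx: int) -> Tuple[int, int]:
        # Grid (row, col) position of a node index.
        return divmod(idx, self.cols)

    def best_matching_unit(self, x: Sequence[float]) -> int:
        # Index of the node whose weight vector is closest (Euclidean) to x.
        return min(
            range(len(self.weights)),
            key=lambda i: sum((w - v) ** 2 for w, v in zip(self.weights[i], x)),
        )

    def train(self, data: Sequence[Sequence[float]], epochs: int = 50,
              lr0: float = 0.5, sigma0: Optional[float] = None) -> None:
        # Unsupervised: the map organizes itself from the inputs alone.
        if sigma0 is None:
            sigma0 = max(self.rows, self.cols) / 2.0
        total = epochs * len(data)
        order = list(range(len(data)))
        step = 0
        for _ in range(epochs):
            self._rng.shuffle(order)
            for j in order:
                x = data[j]
                frac = step / total
                lr = lr0 * (1.0 - frac)                # decaying learning rate
                sigma = sigma0 * (1.0 - frac) + 1e-3   # shrinking neighborhood
                br, bc = self._coords(self.best_matching_unit(x))
                for i, w in enumerate(self.weights):
                    r, c = self._coords(i)
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian kernel
                    for k in range(self.dim):
                        # Pull the node (and its grid neighbors) toward x.
                        w[k] += lr * h * (x[k] - w[k])
                step += 1


def route(som: SelfOrganizingMap, samples: Sequence[Sequence[float]],
          node_labels: Dict[int, str]) -> Tuple[list, list]:
    """Hypothetical routing step: discard samples whose best-matching node
    was labeled "interference" after training; keep everything else."""
    kept, discarded = [], []
    for s in samples:
        label = node_labels.get(som.best_matching_unit(s), "unknown")
        (discarded if label == "interference" else kept).append(s)
    return kept, discarded
```

In this sketch, samples mapped to unlabeled nodes are kept rather than dropped. That mirrors the abstract's rule that only data labeled as interference is discarded.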