Big Data applications will undoubtedly be an important part of future technological development, and machine learning and artificial intelligence both play an important role in unleashing the value of that data. The relationship between the three can be summarized briefly: Big Data is the material, machine learning is the method, and artificial intelligence is the result. Machine learning means giving machines (computers) a learning capability similar to that of humans, by letting them learn from data. It is already widely used in everyday life, for example in self-driving cars and automated transportation, streamlined logistics and distribution, and improved elder care, all of which make our lives more convenient.
Machine learning works in a way similar to how humans learn. For a machine (computer) to gain a human-like learning ability, it usually starts with “classification”; only then can it analyze, judge, and finally take action. Machine learning algorithms are mainly divided into four categories: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
- Supervised learning: All data is “labeled”, so the machine is told the correct value for each example and learns to predict it. The labeling is mostly done manually, which makes this method the easiest for the computer and the most laborious for humans. It is like giving the machine (computer) the standard answers: when the machine is later tested, it answers according to what it learned from them, so its predictions are more reliable. For example, to train a machine to distinguish elephants from giraffes, you can provide 100 labeled photos of elephants and giraffes. The machine detects the characteristics of each animal from the “labeled” photos and then identifies elephants and giraffes in new photos according to those characteristics, usually with high accuracy.
- Unsupervised learning: No data is labeled, and the machine groups the data itself by detecting its characteristics. No manual classification is involved, so this method is the simplest for humans, but it is the hardest for the computer and tends to produce more errors. If unsupervised learning is used to identify elephants and giraffes, the machine must discover the distinguishing characteristics and sort the 100 photos into two groups on its own, without being told which group is which. In future predictions, the machine identifies an animal according to the characteristics and groups it found. However, the results are not necessarily correct.
- Semi-supervised learning: Only a small portion of the data is labeled. The computer finds features from the labeled data and then classifies the remaining data accordingly. This usually makes predictions more accurate than with no labels at all, at a much lower labeling cost than full supervision, which is why the method is commonly used. Suppose there are 100 photos and only 10 of them are labeled as elephants or giraffes. Using the characteristics of these 10 photos, the machine identifies and classifies the remaining 90. Because there is already a basis for identification, the predicted results are usually more accurate than those of unsupervised learning.
- Reinforcement learning: The machine uses observations gathered from interacting with its environment to take actions that maximize the reward or minimize the risk. There is no labeled data; instead, the machine receives feedback telling it which steps were right and which were wrong. Based on this feedback, it gradually adjusts its behavior until it reaches the correct result. Feedback of this kind can also be used to raise the accuracy of a model that learned without labels. For example, if the machine identifies features and groups on its own and predicts that an image of an elephant is a giraffe, a human signals that the answer is wrong, and the machine re-examines its features and classification. Through repeated rounds of right and wrong answers, the final predictions become more and more accurate.
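
To make the difference between the first three categories concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: each photo is reduced to two made-up features (shoulder height and neck length in meters), the numbers are invented, and the helper names (`fit_supervised`, `fit_unsupervised`, `fit_semi_supervised`, `nearest`) are hypothetical. A nearest-centroid rule stands in for a real image model, and a small two-cluster k-means stands in for unsupervised grouping.

```python
from math import dist

# Hypothetical toy features per photo: (shoulder height in m, neck length in m).
elephants = [(3.0, 0.5), (3.2, 0.6), (2.8, 0.4)]
giraffes = [(5.0, 2.0), (5.3, 2.2), (4.8, 1.9)]


def centroid(points):
    """Mean point of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def nearest(point, centroids):
    """Return the key (label or cluster id) of the closest centroid."""
    return min(centroids, key=lambda k: dist(point, centroids[k]))


def fit_supervised(labeled):
    """Supervised: every sample has a label; learn one centroid per label."""
    groups = {}
    for point, label in labeled:
        groups.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in groups.items()}


def fit_unsupervised(points, rounds=10):
    """Unsupervised: two-cluster k-means; clusters get ids 0 and 1, not names."""
    # Deterministic start for the sketch: seed with the first and last points.
    centroids = {0: points[0], 1: points[-1]}
    for _ in range(rounds):
        groups = {0: [], 1: []}
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = {k: centroid(g) if g else centroids[k]
                     for k, g in groups.items()}
    return centroids


def fit_semi_supervised(labeled, unlabeled):
    """Semi-supervised: fit on the few labels, pseudo-label the rest, refit."""
    model = fit_supervised(labeled)
    pseudo = [(p, nearest(p, model)) for p in unlabeled]
    return fit_supervised(labeled + pseudo)


labeled = [(p, "elephant") for p in elephants] + [(p, "giraffe") for p in giraffes]
new_animal = (4.9, 2.1)  # tall with a long neck

supervised = fit_supervised(labeled)
clusters = fit_unsupervised(elephants + giraffes)
few_labels = [labeled[0], labeled[3]]  # one labeled photo of each animal
semi = fit_semi_supervised(few_labels, elephants[1:] + giraffes[1:])

print(nearest(new_animal, supervised))  # an animal name
print(nearest(new_animal, clusters))    # only a cluster id
print(nearest(new_animal, semi))        # an animal name again
```

The supervised and semi-supervised models answer with an animal name, while the unsupervised model can only answer with a cluster id: it can separate the two groups, but a human still has to decide which group is the elephants. Reinforcement learning is left out of the sketch because it needs an environment and a reward signal rather than a fixed set of photos.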
Among the four types mentioned above, supervised learning is generally the most accurate but also the most costly, because every example must be labeled. But what if you want high accuracy with limited labor and budget? Amazon provides Amazon SageMaker Ground Truth to reduce labor costs by accelerating the building of highly accurate data sets. You can refer to the following blog post, whose hands-on lab makes the features of SageMaker easy to understand: SageMaker Ground Truth builds a highly accurate data set.