
Basic Concepts of Artificial Intelligence (AI), Machine Learning, and Deep Learning

Gaudhiwaa Hendrasto

Behind advanced products such as Tesla's autonomous cars, ChatGPT, and Bing Image Creator, artificial intelligence plays a central role in their development.

Artificial intelligence, often referred to as AI, is the ability of machines to mimic or simulate human cognitive intelligence, including capabilities such as perception, reasoning, learning, interaction, problem-solving, and even creativity (McKinsey, 2023).

In the past decade, AI breakthroughs have become increasingly prominent and a topic of discussion worldwide. "From Chatbot to Cyborg" is a fitting phrase to capture both the potential benefits and the pitfalls of AI. Questions keep arising, such as: "Is AI harmful to humans? Can AI really replace human jobs? Seriously?" 🧐

To understand this discourse, we need to comprehend how AI works and the scope of its applications. In this article, I will assist you in delving into the knowledge about AI by addressing questions like, "What is AI? How does it work? How is it created?" 🤔

Get into a comfortable sitting position—lying down is fine too 😉—because we are going to explore AI comprehensively! 🥂

Author's Note:

If you are interested in learning AI and are still at an early stage, I recommend understanding the following core topics first:

  1. Differences between AI, Machine Learning (ML), and Deep Learning (DL).
  2. Machine Learning Tasks: Regression, Classification, and Clustering.
  3. Machine Learning Approaches: Supervised, Unsupervised, and Reinforcement Learning.

Other topics such as algorithms, evaluation, development stages, etc., can be learned gradually.


Scope of AI

[Image: the scope of AI, with machine learning and deep learning nested inside it]

The image above represents the scope of AI, within which sit machine learning and deep learning. Often, what we call AI is specifically machine learning or deep learning. The three are interconnected but distinct; the breakdown below explains the differences between them:

Artificial Intelligence (AI):

Definition: A branch of computer science that aims to provide intelligence to machines to perform tasks that require human expertise.

Instructions: Requires explicit instructions to perform specific tasks.

Algorithms: Fuzzy Logic, Decision Trees, Genetic Algorithms.

Uses: Game bots, robotics.

Machine Learning (ML):

Definition: A sub-discipline of AI that uses algorithms to enable machines to learn from data without the need for explicit programming.

Instructions: Learns from data without explicit instructions.

Algorithms: Linear Regression, Logistic Regression, Polynomial Regression, Random Forest, Support Vector Machines (SVM), K-Nearest Neighbors (KNN).

Uses: Prediction, classification, clustering, recommendations.

Deep Learning (DL):

Definition: A sub-discipline of ML that uses artificial neural network (ANN) architectures to model and understand complex data representations.

Instructions: Learns complex features from data automatically through layered neural networks, typically with the help of specialized hardware (GPUs).

Algorithms: Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Generative Adversarial Networks (GAN).

Uses: Image recognition, speech recognition, machine translation, Natural Language Processing (NLP).

In other words, we can say that there is AI that is not machine learning, and there is machine learning that is not deep learning.

With a bit of understanding of the differences between AI, machine learning, and deep learning, we already know enough about the scope of AI. It's time to dive into the somewhat technical part! 🥂

A. Artificial Intelligence

The way AI works can be summarized with the agent-environment paradigm, explained as follows:

| Concept | Description |
| --- | --- |
| Agent | The AI object |
| Environment | The agent's surroundings |
| Action | An action taken on the environment |
| State | The current condition or situation |
| Reward | Positive feedback for the action taken by the agent |

By understanding these concepts, we can more easily create AI programs. The working process is as follows:

The agent will perform actions with the greatest reward based on the current state.

To better understand this, I will use a real-world application: a chess bot. The main goal of this chess bot is to win the game by checkmating the opponent's king, that is, attacking the king in a way it cannot escape.

The concept of this chess bot is described as follows:

| Concept | Description |
| --- | --- |
| Agent | The white chess pieces |
| Environment | The chessboard (8x8) and the positions of the white and black pieces |
| Action | Moving one of the white pieces (pawn, rook, knight, queen, etc.) |
| State | The positions of the white and black pieces |
| Reward | A total score combining: material advantage + position quality + threats to the king + king safety |

If implemented with a single algorithm, a decision tree, it looks like the image below. The tree represents possible actions based on the current state: white nodes indicate moves made by the white pieces, black nodes indicate moves made by the black pieces. At the leaves of the tree is a numeric reward, which is the basis for choosing a move; the agent selects the action that leads to the highest reward.

At a glance, this resembles the IF-ELSE concept, right? Yes: in AI (excluding ML), the program logic still has to be written explicitly, because the computer cannot yet learn automatically from the provided data.

Additional Note: If we think further, the number of possible move sequences the chess bot could consider is enormous. Therefore, in many cases the creator uses a "depth" parameter (the depth of the decision tree) to keep the search from going too deep and never ending. That is why, when you play chess against a bot on a "hard" difficulty level, the bot takes longer to determine the best move.
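
To make this concrete, here is a minimal, hypothetical sketch in Python of a depth-limited decision-tree (minimax-style) search. The functions `get_legal_actions`, `apply_action`, and `evaluate` are assumed stand-ins for real chess rules and a real reward function, not part of any actual chess engine; the point is only to show the agent picking the highest-reward action under a depth cap.

```python
# Minimal sketch of a depth-limited game-tree search (minimax style).
# `evaluate`, `get_legal_actions`, and `apply_action` are hypothetical
# stand-ins for real chess rules and a real reward function.

def evaluate(state):
    """Reward: material advantage + position quality + king threats (assumed)."""
    return state.get("score", 0)

def get_legal_actions(state):
    """Return the moves available in this state (assumed)."""
    return state.get("actions", [])

def apply_action(state, action):
    """Return the new state after playing `action` (assumed)."""
    return action["next_state"]

def best_action(state, depth, maximizing=True):
    """Pick the action whose subtree yields the highest reward, up to `depth` plies."""
    if depth == 0 or not get_legal_actions(state):
        return None, evaluate(state)

    best_move = None
    best_value = float("-inf") if maximizing else float("inf")
    for action in get_legal_actions(state):
        _, value = best_action(apply_action(state, action), depth - 1, not maximizing)
        if (maximizing and value > best_value) or (not maximizing and value < best_value):
            best_move, best_value = action, value
    return best_move, best_value

# Usage: move, value = best_action(current_state, depth=3)
```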

B. Machine Learning

Machine learning was defined in 1959 by AI pioneer Arthur Samuel as the field of study that gives computers the ability to learn without being explicitly programmed. In this context, unlike classical AI, the computer can learn from the provided data using an algorithm.

Machine Learning: Tasks

Generally, machine learning tasks are divided into three: regression, classification, and clustering. This concept is essential as a representation of data and the main tasks of machine learning.

What's the difference between them? Check out the following explanations:

1. Regression

Definition: Finding relationships or patterns between independent and dependent variables to make predictions of continuous values.

Algorithm: Linear Regression, Ridge Regression, Lasso Regression, Polynomial Regression

Application: Predicting house prices based on features such as the number of rooms and land area.

Evaluation: Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), R-squared (R2).
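
As an illustration, here is a minimal regression sketch, assuming scikit-learn and NumPy are available; the data is synthetic and only stands in for real features such as land area.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Synthetic data: one feature (e.g. land area) and a continuous target (price).
rng = np.random.default_rng(42)
X = rng.uniform(50, 300, size=(100, 1))          # feature matrix
y = 4.0 * X[:, 0] + rng.normal(0, 25, size=100)  # continuous target with noise

model = LinearRegression().fit(X, y)
pred = model.predict(X)

print("MAE :", mean_absolute_error(y, pred))
print("RMSE:", mean_squared_error(y, pred) ** 0.5)
```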

2. Classification

Definition: Sorting or categorizing data into specific categories or classes based on specific features.

Algorithm: Decision Trees, Random Forest, Support Vector Machines (SVM), K-Nearest Neighbors (KNN)

Application: Identifying gender based on features such as height and weight.

Evaluation: Accuracy, precision, recall, F1-Score, Area Under the ROC Curve (AUC-ROC).
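
A minimal classification sketch, again assuming scikit-learn; the bundled iris dataset stands in for any labeled dataset.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Toy labeled data: each row is a flower, each label is its species.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```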

3. Clustering

Definition: Grouping data into clusters (groups) whose members are similar to one another.

Algorithm: K-Means Clustering, Hierarchical Clustering (Agglomerative), DBSCAN, Gaussian Mixture Model (GMM)

Application: Customer segmentation based on purchasing behavior.

Evaluation: Silhouette Score, Davies-Bouldin Index, Calinski-Harabasz Index, Adjusted Rand Index (ARI), Normalized Mutual Information (NMI).
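
A minimal clustering sketch under the same scikit-learn assumption; the blobs here are synthetic, unlabeled 2D points.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy unlabeled data: three blobs of points in 2D.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Silhouette score:", silhouette_score(X, kmeans.labels_))
```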

Machine Learning: Learning Methods

Based on the learning method, machine learning can be categorized into three types: supervised learning, unsupervised learning, and reinforcement learning.

1. Supervised Learning (Labeled Dataset)

Supervised learning uses a labeled dataset; the labels are often referred to as targets.

| Land Area (square meters) | Number of Bedrooms | Distance to City Center (km) | House Price (Million Rupiah) |
| --- | --- | --- | --- |
| 150 | 3 | 5 | 700 |
| 200 | 4 | 10 | 900 |
| 120 | 2 | 3 | 600 |
| 250 | 5 | 15 | 1200 |
| 180 | 3 | 8 | 800 |

Above is an example of a supervised learning dataset. House Price (Million Rupiah) can be considered as the label. Using this dataset, a machine learning model can learn and predict House Price (Million Rupiah) based on Land Area (square meters), Number of Bedrooms, and Distance to City Center (km). The model can learn patterns from input and output data using a specific algorithm.
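
As a minimal sketch (assuming scikit-learn), the three feature columns of this table become the input X and the House Price column becomes the label y; the fitted model can then predict the price of an unseen house. The new house's values below are chosen purely for illustration.

```python
from sklearn.linear_model import LinearRegression

# Rows from the table above: [land area (sq m), bedrooms, distance to city center (km)]
X = [[150, 3, 5], [200, 4, 10], [120, 2, 3], [250, 5, 15], [180, 3, 8]]
y = [700, 900, 600, 1200, 800]   # label: house price (million rupiah)

model = LinearRegression().fit(X, y)

# Predict the price of an unseen house (values chosen only for illustration).
print(model.predict([[160, 3, 6]]))
```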

Applications of supervised learning can also be observed in: detecting spam emails, face recognition, stock price prediction, image recognition, and other cases with classification or regression datasets.

2. Unsupervised Learning (Unlabeled Dataset)

Machine learning using unlabeled datasets falls under the category of unsupervised learning.

| Id | Product A Purchase (Thousand Rupiah) | Product B Purchase (Thousand Rupiah) | Product C Purchase (Thousand Rupiah) |
| --- | --- | --- | --- |
| 1 | 50 | 30 | 20 |
| 2 | 20 | 10 | 30 |
| 3 | 40 | 25 | 15 |
| 4 | 30 | 20 | 10 |
| 5 | 10 | 15 | 25 |

The above is an example of an unsupervised learning dataset, with columns Id, Product A Purchase (Thousand Rupiah), Product B Purchase (Thousand Rupiah), and Product C Purchase (Thousand Rupiah). Notice that there is no explicitly defined target for the machine learning model. Data in this form can be used for customer segmentation (clustering) based on purchasing patterns, which helps companies offer their products to customers with different characteristics.
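
A minimal segmentation sketch under the same scikit-learn assumption: only the purchase columns feed the model, and there is no label column. The choice of two clusters here is arbitrary.

```python
from sklearn.cluster import KMeans

# Purchase columns from the table above (Products A, B, C in thousand rupiah); no label.
X = [[50, 30, 20], [20, 10, 30], [40, 25, 15], [30, 20, 10], [10, 15, 25]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Customer segments:", kmeans.labels_)  # cluster index assigned to each customer
```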

Other applications of unsupervised learning include: product recommendations (based on purchase and browsing history), document clustering by topic, network security anomaly detection (identifying unusual network behavior), and other datasets that do not have labels.

3. Reinforcement Learning

Reinforcement learning is a machine learning training approach that uses a reward system for desired behavior and punishment for undesired behavior (Hashemi-Pour, 2023). Its working principle is similar to the chess AI bot explained above. The difference is that reinforcement learning can learn through experience or data without explicit programming.
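
To illustrate the reward-and-punishment idea, here is a minimal, hypothetical Q-learning sketch: an agent on a line of five cells earns +1 for reaching the rightmost cell and a small penalty per step, and it improves purely from this feedback rather than from explicit rules.

```python
import random

# Tiny Q-learning sketch: an agent on positions 0..4 tries to reach position 4.
# Reward: +1 at the goal, -0.01 per step (the "punishment" for wandering).
N_STATES, ACTIONS = 5, [-1, +1]          # move left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.2    # learning rate, discount, exploration rate

for _ in range(500):                      # training episodes
    state = 0
    while state != N_STATES - 1:
        # Explore sometimes, otherwise pick the action with the highest Q-value.
        action = random.choice(ACTIONS) if random.random() < epsilon \
                 else max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else -0.01
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

print("Best first move from state 0:", max(ACTIONS, key=lambda a: Q[(0, a)]))
```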

Stages of Machine Learning Creation

The following flow illustrates how machine learning models are generally built; a minimal end-to-end code sketch follows the list. The stages are as follows:

  1. The dataset to be used goes through the preprocessing stage, which involves cleaning, transforming, and organizing raw data. The goal of preprocessing is to improve data quality and make it more suitable for processing by machine learning algorithms or for data analysis.
  2. Next is the Exploratory Data Analysis (EDA), a process of summarizing the main characteristics of the data, usually using visualization methods. The main goal of EDA is to understand the structure and patterns in the data, identify relationships between variables, and gain insights from the dataset. EDA steps include creating graphs, statistical summaries, distribution analysis, and exploring correlations between variables.
  3. After the data is analyzed, it undergoes the feature engineering process. Feature engineering is the process of creating or modifying features in the dataset so that machine learning algorithms can work more effectively. Some feature engineering techniques involve feature selection, handling missing values, scaling, etc.
  4. The dataset that has gone through the preprocessing and feature engineering stages is divided into training and testing datasets. This division is often done with an 80% training dataset and a 20% testing dataset ratio. The training dataset is used for training the machine learning model, while the testing dataset is used to evaluate the model's performance.
  5. The prepared training data then goes through the model training process. Model training is the process in which the machine learning model learns from the dataset to make predictions. The model training process involves using algorithms and adjusting hyperparameters.
  6. Once the model is built, we need to test whether its performance is good or bad. In supervised learning, the testing data (which has ground-truth labels) serves as the input for testing the model. Evaluation is often done with accuracy (classification datasets) or Mean Absolute Error (regression datasets), comparing the true labels in the testing data with the model's predictions. In unsupervised learning, evaluation often uses the silhouette score, which measures how similar a data point is to its own cluster (cohesion) compared with other clusters (separation).
  7. If the testing results in good performance, the model is ready to be used. To use it, we only need to input the required data, and the model will produce output. However, if testing results in poor performance, we need to evaluate: the model algorithm used, hyperparameters, or even the preprocessing and feature engineering that has been done.
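
Below is a minimal end-to-end sketch of steps 1-7, assuming scikit-learn and pandas; the file name dataset.csv and the column name target are hypothetical, and a real project would add proper EDA and richer feature engineering.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 1-3. Load, clean, and lightly engineer a generic tabular dataset
#      (the file and the "target" column are hypothetical).
df = pd.read_csv("dataset.csv")
df = df.dropna()                      # preprocessing: drop rows with missing values
X = df.drop(columns=["target"])       # features
y = df["target"]                      # label

# 4. Split into 80% training / 20% testing data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Feature engineering example: scale numeric features (fit on training data only).
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 5. Train the model (hyperparameters kept at simple defaults here).
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# 6. Evaluate on the held-out testing data.
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 7. If performance is acceptable, use the model on new data: model.predict(new_X)
```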

C. Deep Learning

Quoting an article from the Amazon site, deep learning algorithms adopt a neural network structure that mimics the workings of the human brain. The human brain consists of millions of interconnected neurons that work together to understand and process information. Similarly, deep learning neural networks consist of artificial neurons organized in layers that work together inside a computer system. What differentiates deep learning from classical machine learning is the architecture of its algorithms.

The main components in deep learning network architecture are as follows:

  1. Input Layer: The input layer consists of the nodes through which data enters the network.

  2. Hidden Layer (ReLU): Hidden layers receive data from the input layer and process it at progressively deeper levels of the network. Deep learning networks can have hundreds of hidden layers that analyze a problem from various perspectives. In the context of classifying flower images, hidden layers can compare features such as petal shape, color, stem structure, and leaf patterns. Each layer tries to identify specific patterns, for example whether the flower belongs to a certain group based on characteristics such as distinctive petals or striking colors.

  3. Output Layer (softmax): The output layer consists of the nodes that produce the final output. A deep learning model that classifies flowers into "rose," "jasmine," and "orchid" would have three output nodes, while a model providing more complex answers would have more nodes to cover the variety of responses.

Below is a visualization of the architecture of a deep learning model classifying image objects into numbers.
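
As a minimal sketch of such an architecture, here is a small dense network written with Keras (an assumption; any deep learning framework would work): 28x28 digit images are flattened into the input layer, passed through ReLU hidden layers, and classified by a 10-node softmax output layer, one node per digit.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Minimal dense network for classifying 28x28 digit images into 10 classes.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),            # input: a 28x28 grayscale image
    layers.Flatten(),                          # input layer: 784 pixel values
    layers.Dense(128, activation="relu"),      # hidden layer
    layers.Dense(64, activation="relu"),       # hidden layer
    layers.Dense(10, activation="softmax"),    # output layer: one probability per digit
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

# Training would then be: model.fit(train_images, train_labels, epochs=5)
```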

Other examples of deep learning applications include: Natural Language Processing (NLP), speech recognition, autonomous cars, stock price prediction, and chatbots.

Isn't it interesting to learn about AI? 😁

Yep, these concepts can be quite challenging to understand. But it's OK, learning takes time! ⌛😉

References:

Written by Gaudhiwaa Hendrasto