Exploring AI Fundamentals: A Guide to Essential Terms and Definitions
Generative AI has taken the world by storm, and many people are discussing how it will impact and reshape industries across the globe. In this article, I will cover a few key terms and definitions that will help you better understand generative AI and the concepts behind it.
Before delving into generative AI, let us first understand what AI is. In very simple terms, AI (Artificial Intelligence) is the ability of a machine or a computer program to perform tasks, think, and learn like a human. In other words, the goal is to develop agents that can act, think, and perform tasks the way humans do.
Below are a few essential terms and their definitions to help you familiarize yourself with AI terminology.
Natural Language Processing (NLP): NLP is a component of artificial intelligence (AI) that bridges human and machine language to enable more natural human-to-machine conversations. Natural Language Understanding (NLU) and Natural Language Generation (NLG) are two subsets of NLP that enable systems to have more human-like conversations. Broadly speaking, NLP looks at how computers can understand and communicate with humans and carry out tasks such as searching, information retrieval, and answering questions.
Natural Language Understanding (NLU): NLU is the ability of a machine to understand and process the meaning of speech or text presented in natural language. NLU includes tasks like extracting meaning from text, recognizing entities in a text, and extracting information regarding those entities.
Natural Language Generation (NLG): NLG constructs sentences in human-understandable language from the provided semantics. It converts structured data into meaningful sentences that humans can understand.
NLG plays a crucial role in the automatic generation of content and delivers the data in the expected format.
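To make the idea of turning structured data into sentences concrete, here is a minimal Python sketch. It uses a fixed template, which is the simplest possible form of NLG; real NLG systems usually rely on learned models. The function name, the record fields, and the sample values are all hypothetical and chosen only for illustration.

```python
def describe_weather(record: dict) -> str:
    """Turn a structured weather record into a readable sentence."""
    return (
        f"In {record['city']}, it is {record['temp_c']} degrees Celsius "
        f"with {record['condition']} skies."
    )

# Hypothetical structured input (for illustration only).
record = {"city": "Pune", "temp_c": 31, "condition": "clear"}
sentence = describe_weather(record)
print(sentence)
```

The structured record plays the role of the "provided semantics", and the returned string is the human-readable output.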
Machine Learning: Machine learning is a field of artificial intelligence that focuses on enabling systems or computer applications to learn and improve from experience without being explicitly programmed.
Machine learning models can be further classified into supervised, unsupervised, and reinforcement learning models.
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
Supervised Learning: It is a type of machine learning in which machines are trained using “labeled” datasets. That is, you provide both the input data and the correct output data to the machine learning model, and the model learns the mapping between them over time. The algorithm measures its error through a loss function and adjusts its parameters until the error has been sufficiently minimized.
Supervised learning models are very useful for use cases such as risk assessment, image classification, fraud detection, and spam filtering.
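The loop of "predict, measure the loss, adjust" can be sketched in a few lines of plain Python. This is only an illustrative example, assuming a tiny made-up dataset that follows y = 2x + 1, a straight-line model, mean squared error as the loss function, and gradient descent as the adjustment rule. The learning rate and number of steps are arbitrary choices.

```python
# Labeled dataset: inputs xs with the correct outputs ys (made up, y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0        # model parameters, starting from zero
learning_rate = 0.05   # arbitrary step size chosen for this sketch

def mse(w, b):
    """Mean squared error: the loss function being minimized."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

initial_loss = mse(w, b)
for _ in range(2000):
    # Gradients of the loss with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Adjust the parameters in the direction that reduces the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse(w, b)
print(round(w, 2), round(b, 2), final_loss < initial_loss)
```

After training, w and b end up close to 2 and 1, the values used to create the labels.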
Unsupervised Learning: Unlike supervised learning, unsupervised learning works with unlabeled datasets. The machine analyzes the dataset on its own, tries to find patterns and groupings in it, and derives meaningful insights without being told the correct output.
Unsupervised learning is very useful for use cases such as recommendation systems, customer segmentation, and anomaly detection.
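As a small illustration of finding patterns in unlabeled data, the sketch below groups one-dimensional points with k-means clustering. K-means is just one of many unsupervised techniques, picked here as an assumption for the example. The data points and the choice of starting centroids are made up.

```python
def kmeans_1d(points, initial_centroids, iterations=10):
    """Cluster 1-D points by alternating assignment and update steps."""
    centroids = list(initial_centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids

# Unlabeled data (made up): no one tells the algorithm which group is which.
points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
labels, centroids = kmeans_1d(points, [min(points), max(points)])
print(labels, centroids)
```

The algorithm separates the small values from the large ones without ever seeing a label, which is the same idea behind use cases like customer segmentation.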
Reinforcement Learning: It is a type of machine learning technique that enables an agent to learn in an interactive environment by trial and error, using feedback (rewards) from its own actions and experiences. In this approach, we model an environment after the problem statement. The agent interacts with this environment and comes up with solutions on its own, without human intervention. To push it in the right direction, we give it a positive reward when it performs an action that brings it closer to its goal and a negative reward when it moves away from its goal.
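The reward idea above can be sketched with tabular Q-learning, one common reinforcement learning algorithm chosen here as an assumption. The environment is a made-up corridor of five cells with the goal in the last cell. The agent gets +1 for a step that brings it closer to the goal and -1 otherwise. All constants (learning rate, discount, exploration rate, episode count) are arbitrary illustrative values.

```python
import random

random.seed(0)                      # fixed seed so runs are repeatable
N_STATES, GOAL = 5, 4               # corridor cells 0..4, goal at cell 4
ACTIONS = [-1, +1]                  # action 0 = step left, action 1 = step right
alpha, gamma, epsilon = 0.5, 0.9, 0.2
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action] value table

for episode in range(200):
    state = 0
    for _ in range(100):            # cap on steps per episode
        # Explore with probability epsilon, otherwise act greedily.
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = max(range(2), key=lambda i: Q[state][i])
        next_state = min(max(state + ACTIONS[a], 0), GOAL)
        # Positive reward for moving closer to the goal, negative otherwise.
        reward = 1.0 if next_state > state else -1.0
        done = next_state == GOAL
        target = reward if done else reward + gamma * max(Q[next_state])
        Q[state][a] += alpha * (target - Q[state][a])
        state = next_state
        if done:
            break

# Read off the learned behavior for every non-goal cell.
policy = ["left" if q[0] > q[1] else "right" for q in Q[:GOAL]]
print(policy)
```

No one tells the agent which direction is correct. It discovers the preferred action in each cell purely from the rewards it collects.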
Deep Learning: Deep learning is a method in artificial intelligence (AI) that teaches computers to process data in a way inspired by the human brain. Deep learning models can recognize complex patterns in pictures, text, sounds, and other data to produce accurate insights and predictions. You can use deep learning methods to automate tasks that typically require human intelligence, such as describing images or transcribing a sound file into text.
Deep learning is very useful for use cases such as digital assistants and fraud detection.
Deep learning uses artificial neural networks, which allow it to process more complex patterns than traditional machine learning. Artificial neural networks are inspired by the human brain. They are made up of many interconnected nodes, or neurons, that learn to perform tasks by processing data and making predictions. Neural networks can learn from both labeled and unlabeled data.
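To show what "interconnected nodes" means in practice, here is a tiny neural network written in plain Python. It has two hidden neurons and one output neuron and computes the XOR of two inputs. The weights are hand-picked for this illustration, which is an assumption of the sketch. In a real network they would be learned from data.

```python
def relu(v: float) -> float:
    """A common activation function: pass positives, zero out negatives."""
    return max(0.0, v)

def forward(x1: float, x2: float) -> float:
    """Forward pass through a 2-input, 2-hidden-neuron, 1-output network."""
    # Hidden layer: each neuron is a weighted sum of the inputs plus a bias.
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)
    # Output layer: a weighted sum of the hidden neurons.
    return 1.0 * h1 - 2.0 * h2

outputs = [forward(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)
```

XOR cannot be computed by a single weighted sum of the inputs, so even this small example needs a hidden layer, which hints at why stacked layers can capture more complex patterns.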
Deep Learning models are further classified into discriminative and generative types.
Generative/Gen AI: Gen AI is a subset of deep learning that is capable of generating new and original content (text, image, video, and audio) by identifying the patterns and structures within existing data. These systems have been trained on massive amounts of data to generate content based on the queries/prompts.
Generative Pre-trained Transformer (GPT): Generative Pre-trained Transformers, commonly known as GPT, are a family of neural network models that use the transformer architecture and deep learning techniques to generate human-like text and answer queries in a conversational manner. Some newer models in the family can also accept other content types, such as images, as input.
Large Language Models (LLM): Large language models are a subset of Deep Learning. LLM refers to a large, general-purpose language model that can be pre-trained and fine-tuned for specific purposes. LLMs are foundational ML models trained on massive amounts of data to learn patterns and generate new content.
Foundation Models: A foundation model is a large machine learning model (neural network) trained on massive amounts of data, using supervised, unsupervised, and semi-supervised learning techniques, so that it can be adapted to a wide range of use cases. These models are typically trained largely on publicly available data sources on the internet.
Diffusion Models: Diffusion models are deep generative models that work by adding noise to the available training data (also known as the forward diffusion process) and then reversing the process (known as denoising or the reverse diffusion process) to recover the data. The model gradually learns to remove the noise.
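The forward (noising) half of the process can be sketched in plain Python. This assumes one common formulation, where a noisy sample is a weighted mix of the clean data and Gaussian noise: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise. The function name, the sample values, and the noise schedule below are hypothetical. The reverse (denoising) half requires a trained model and is not shown.

```python
import math
import random

def forward_diffuse(x0, alpha_bar, rng):
    """Noise a clean sample x0 in one shot.

    alpha_bar in [0, 1] controls how much of the original signal is kept:
    1.0 returns x0 unchanged, 0.0 returns pure Gaussian noise.
    """
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]

rng = random.Random(0)                  # fixed seed so runs are repeatable
x0 = [1.0, -1.0, 0.5]                   # hypothetical "clean" data point
for alpha_bar in [1.0, 0.9, 0.5, 0.1]:  # the signal fades as noise is added
    print(alpha_bar, forward_diffuse(x0, alpha_bar, rng))
```

During training, a diffusion model sees noisy samples like these and learns to predict how to remove the noise.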
Fine-tuning: Fine-tuning refers to bringing your own dataset and further training a pre-trained model on it. In full fine-tuning, every weight in the LLM is updated, which requires hosting your own customized model and a significant amount of training, making it a costly task.
Prompt: Prompts are the queries or inputs that a user or program provides to an AI model to get the desired response from the model. Prompts are what guide the AI model’s output and influence its tone, style, and quality. Prompts can include instructions, questions, or any other type of input, depending on the intended use of the model.
Prompt Design: It involves passing the instructions and context to the model to generate the desired output.
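As a rough illustration of prompt design, the Python sketch below assembles an instruction, some context, and a question into one prompt string. The function name, the field labels, and the layout are hypothetical choices, not a required format. The resulting string would be sent to whichever model you are using.

```python
def build_prompt(instruction: str, context: str, question: str) -> str:
    """Combine instruction, context, and question into a single prompt."""
    return (
        f"Instruction: {instruction}\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical inputs (for illustration only).
prompt = build_prompt(
    instruction="Answer using only the context. Reply in one sentence.",
    context="NLG converts structured data into human-readable sentences.",
    question="What does NLG do?",
)
print(prompt)
```

Keeping the instruction and the context in separate, clearly labeled parts makes it easier to change one without touching the other.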
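Prompt Tuning: In this article, prompt tuning means making small adjustments to the original input prompt given to the model at inference time. By changing keywords or phrases within the prompt, you can guide the model toward generating more accurate responses on similar topics. Note that the term is also used for a training technique that learns a small set of prompt parameters while the model's weights stay frozen.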
Prompt Engineering: It is the process of defining, developing, and optimizing the prompts to generate the desired output from the pre-trained model. It modifies not only the prompts but also the contextual environment surrounding them. You might provide additional background knowledge, change the syntax or length of queries, or restructure the entire conversation flow.
Hallucinations: A hallucination occurs when an AI model generates output that sounds plausible but is factually incorrect, nonsensical, or not grounded in the input it was given.
Hallucinations occur due to various factors, such as incorrect assumptions about the problem space, poorly designed algorithms, poor dataset quality, and insufficient guardrails.
Zero-shot learning: Zero-shot learning is a machine learning technique that enables models to recognize and classify objects they have not been explicitly trained on or encountered before. It does this by leveraging attribute-based information and semantic relationships to bridge the gap between known and unknown classes. This approach makes learning systems more flexible and adaptable.
Ex: A model trained on images of cats and dogs, along with descriptions of their attributes, can identify images of birds through zero-shot learning when it is given a description of a bird's attributes (for example, wings, a beak, and feathers) instead of labeled bird images.
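The attribute-based idea can be sketched as follows. The sketch assumes an attribute predictor that has already been trained on cats and dogs and outputs a score for each attribute. The predictor itself is not shown, and the attribute names, class descriptions, and scores are all hypothetical. The unseen class "bird" is recognized only from its attribute description.

```python
# Attribute order: [has_fur, has_wings, has_beak, barks]
CLASS_ATTRIBUTES = {
    "cat":  [1, 0, 0, 0],
    "dog":  [1, 0, 0, 1],
    "bird": [0, 1, 1, 0],   # never seen in training, described by attributes only
}

def classify(predicted_attributes):
    """Return the class whose attribute description is closest to the prediction."""
    def sq_dist(name):
        return sum(
            (a - b) ** 2
            for a, b in zip(predicted_attributes, CLASS_ATTRIBUTES[name])
        )
    return min(CLASS_ATTRIBUTES, key=sq_dist)

# Hypothetical attribute scores produced for a new image.
print(classify([0.1, 0.9, 0.8, 0.0]))
```

Because classes are matched through shared attributes rather than through labeled examples, a new class can be added by writing one more attribute description.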
Few-shot learning: Few-shot learning is a machine learning technique that addresses the challenge of training models with limited labeled data. Few-shot learning focuses on training models that can generalize from a small number of labeled instances. This allows the models to quickly adapt and classify new classes with only a few examples, making them more versatile and efficient in real-world scenarios with limited labeled data availability.
Ex: Let us take bird species identification as an example. By training a model on a small dataset containing five bird species, such as the eagle, sparrow, hummingbird, pigeon, and ostrich, the model learns to recognize key bird features. When presented with a new bird image, the model compares its features to those it learned during training to predict the species. This empowers users to identify various bird species, even with limited examples.
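One simple way to classify from only a few labeled examples is to average the examples of each class into a "prototype" and assign a new item to the nearest prototype. The Python sketch below assumes that approach, which is only one of several few-shot methods. The two features (wingspan and weight) and all the numbers are made up for illustration.

```python
def prototype(examples):
    """Average a few example feature vectors into one class prototype."""
    n = len(examples)
    return [sum(column) / n for column in zip(*examples)]

def classify(query, support_set):
    """Return the class whose prototype is closest to the query."""
    def sq_dist(name):
        proto = prototype(support_set[name])
        return sum((q - p) ** 2 for q, p in zip(query, proto))
    return min(support_set, key=sq_dist)

# Hypothetical features: [wingspan in meters, weight in kilograms].
# Only two labeled examples per species.
support_set = {
    "eagle":   [[2.0, 5.0], [2.2, 6.0]],
    "sparrow": [[0.20, 0.03], [0.25, 0.04]],
    "ostrich": [[2.0, 100.0], [1.9, 120.0]],
}
print(classify([2.1, 5.5], support_set))
```

Adding a new species only requires adding a couple of labeled examples to the support set. No retraining step is involved in this sketch.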
Chain-of-Thought Prompting: Chain-of-thought prompting is a technique used with language generation models to guide the generation process in a coherent and contextually relevant manner. It improves the reasoning ability of an LLM by prompting it with a series of steps or instructions connected in a logical sequence, similar to a chain of thoughts. This helps lead the model toward the final answer to a multi-step problem.
Example:
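One way to illustrate chain-of-thought prompting is the minimal Python sketch below. It builds a prompt that contains one worked example with explicit reasoning steps, followed by a new question and a cue to reason step by step. The function name, the worked example, and the wording of the cue are hypothetical. The prompt would then be sent to the language model of your choice.

```python
def build_cot_prompt(question: str) -> str:
    """Build a prompt that shows step-by-step reasoning before a new question."""
    worked_example = (
        "Q: A shop has 5 apples and buys 3 boxes of 4 apples each. "
        "How many apples does it have now?\n"
        "A: Let's think step by step. "
        "3 boxes of 4 apples is 3 * 4 = 12 apples. "
        "5 + 12 = 17. The answer is 17.\n"
    )
    return f"{worked_example}\nQ: {question}\nA: Let's think step by step."

# Hypothetical new question (for illustration only).
cot_prompt = build_cot_prompt(
    "A train travels 60 km per hour for 2 hours, then 40 km per hour "
    "for 1 hour. How far does it travel in total?"
)
print(cot_prompt)
```

The worked example shows the model the kind of intermediate steps expected, so its answer to the new question is more likely to lay out the reasoning before giving the final result.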
Popular Generative AI Tools
- ChatGPT is an advanced chatbot developed by OpenAI. It is built on OpenAI's GPT family of language models, such as GPT-3.5 and GPT-4, and it can be used to have engaging and interactive conversations with people, answer questions, and generate creative text formats.
- Bard is a conversational AI chatbot developed by Google. It is similar to ChatGPT, and it can be used to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way. Bard was initially powered by LaMDA (Language Model for Dialogue Applications).
- PaLM 2 (Pathways Language Model 2) is a Google LLM with strong capabilities in advanced reasoning, multilingual tasks, classification, and coding.
- DALL-E 2 is an image-generation tool developed by OpenAI. It can be used to generate realistic images from text descriptions.
- GitHub Copilot is an advanced coding tool developed by GitHub that harnesses the power of the OpenAI Codex language model. It is designed to assist developers by generating code, providing code snippets, and offering suggestions to improve their code.
- AlphaCode is a code-generation system developed by DeepMind. It is based on transformer-based language models, and it can be used to generate code and solve coding problems, particularly competitive programming problems.
- Synthesia is an innovative video-generation tool that empowers users to create lifelike videos featuring people who can speak, lip-sync, and move with remarkable realism.
Building AI-enabled applications is becoming one of the most valuable skills in the current era. If you would like to start exploring, I suggest starting with Python, the OpenAI APIs, prompt engineering, and LangChain.
Thank you for taking the time to read this article. If you enjoyed it and would like to stay updated on various technology topics, please consider subscribing for more insightful content.