First-Order Logic (FOL)
A formal language for representing objects, relations, and functions in the world.
| Term | Definition |
|---|---|
First-Order Logic (FOL) | A formal language for representing objects, relations, and functions in the world. |
Constant Symbols | Symbols that refer to specific objects (e.g., Cat, Dog, John). |
Predicate Symbols | Symbols representing relations or properties that are true or false (e.g., BiggerThan, BrotherOf). |
Function Symbols | Symbols representing functions that map objects to other objects (e.g., LeftLeg(John)). |
Logical Variables | Symbols like x or y that can refer to any object in the domain. |
Atomic Sentence | A predicate applied to terms, representing a basic true/false fact. |
Equality Sentence | A statement asserting two terms refer to the same object. |
Complex Sentence | A sentence built using logical connectives or quantifiers. |
Universal Quantifier (∀) | States that a property holds for all objects in the domain. |
Existential Quantifier (∃) | States that there exists at least one object for which a property holds. |
Knowledge Base (KB) | A structured collection of facts and rules represented in FOL. |
Ontological Engineering | Designing structured representations of concepts and relationships in a domain. |
Upper Ontology | A high-level conceptual framework organizing general categories like objects, events, and processes. |
Category (in FOL) | A group of objects sharing properties, representable as predicates or reified objects. |
Taxonomic Hierarchy | A structure where categories inherit properties from more general categories. |
Inheritance (Categories) | The idea that members of a category inherit properties of that category. |
PartOf Relationship | A relation indicating that one object is a component of another. |
Measurement Representation | Using FOL to express quantities, units, and conversions. |
Fluent | A property whose truth value can change over time (e.g., HaveArrow). |
Action | An event that changes the truth value of fluents. |
Event Calculus | A formalism for representing events, their effects, and how fluents change over time. |
T(f, t1, t2) | A predicate stating fluent f is true between times t1 and t2. |
Happens(e, t1, t2) | A predicate stating event e occurs from time t1 to t2. |
Initiates(e, f, t) | Event e causes fluent f to become true at time t. |
Terminates(e, f, t) | Event e causes fluent f to become false at time t. |
Modal Logic | A logic that represents knowledge across different possible worlds. |
Referential Transparency | The property that substituting equal terms preserves truth (not always desired in knowledge reasoning). |
Semantic Network | A graphical representation of objects, categories, and their relationships. |
Default Information | Assumptions treated as true unless contradicted by evidence. |
Nonmonotonic Logic | A logic where adding new information can invalidate previous conclusions. |
Circumscription | A form of nonmonotonic reasoning that minimizes the extension of certain predicates. |
Abnormal Predicate | A predicate used in circumscription to represent exceptions to general rules. |
FOL for Knowledge Representation | Using First-Order Logic to encode objects, relations, and rules about the world. |
Universal Properties in FOL | Statements that apply a property to all objects in a domain. |
Existential Properties in FOL | Statements asserting that at least one object has a specific property. |
FOL Complexity | The idea that fully using FOL requires extensive background knowledge not covered in this survey course. |
Survey Course Limitation | The course introduces concepts at a high level without covering all formal details. |
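
The quantifier and event-calculus entries above can be made concrete with a few schematic FOL sentences (notation follows the table's own examples; the sentences themselves are illustrative, not from the course):

```
∀x King(x) ⇒ Person(x)                  -- universal: every king is a person
∃x Crown(x) ∧ OnHead(x, John)           -- existential: some crown is on John's head
∀t Happens(Shoot, t, t) ⇒
   Terminates(Shoot, HaveArrow, t)      -- event calculus: shooting ends the fluent HaveArrow
```

| Term | Definition |
|---|---|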
Uncertainty in AI | The idea that real-world environments involve incomplete, noisy, or unpredictable information. |
Partial Observability | When an agent cannot directly observe all relevant aspects of the environment. |
Noisy Sensors | Sensors that provide imperfect or unreliable measurements. |
World Dynamics Uncertainty | Unknown or unpredictable changes in the environment. |
Decision Theory | Framework combining probability and utility to choose optimal actions. |
Expected Utility | The value of an action computed as the sum of utilities weighted by probabilities. |
Possible Worlds | All distinct ways the world could be, used to define probability models. |
Probability Model | A function assigning probabilities to each possible world. |
Event | A subset of possible worlds representing an outcome of interest. |
Random Variable | A function mapping each possible world to a value of interest. |
Range of a Random Variable | The set of all possible values a random variable can take. |
Probability Distribution | A function giving the probability of each value of a random variable. |
Joint Distribution | A table giving probabilities for all combinations of values of multiple variables. |
Marginal Distribution | A distribution obtained by summing out (eliminating) other variables. |
Marginalization | The process of summing over variables to compute a marginal distribution. |
Conditional Probability | The probability of an event given that another event has occurred. |
Normalization | Scaling probabilities so they sum to 1. |
Product Rule | P(a, b) = P(a \| b) P(b). |
Chain Rule | A method for decomposing joint distributions into conditional probabilities. |
Probabilistic Inference | Computing desired probabilities from a probability model. |
Inference by Enumeration | A brute-force method that selects the joint-distribution entries consistent with the evidence, sums out the hidden variables, and normalizes the result. |
Evidence Variables | Observed variables used to condition a probability query. |
Query Variable | The variable whose probability distribution we want to compute. |
Hidden Variables | Variables not observed but included in the model. |
Bayes’ Rule | A formula for reversing conditional probabilities using P(a \| b) = P(b \| a) P(a) / P(b). |
Diagnostic Probability | Probability of a cause given an observed effect. |
Causal Probability | Probability of an effect given a cause. |
Independence | When P(x, y) = P(x) P(y) for all values x and y, meaning the variables do not influence each other. |
Conditional Independence | When two variables are independent given a third variable. |
Ghostbusters Domain | A grid-based example illustrating conditional independence and sensor models. |
Ghost Location Variable (G) | The random variable representing the ghost’s position. |
Sensor Variables (Cxy) | Variables representing the color readings at each grid square. |
Sensor Model | The probability distribution P(Cxy \| G) depending on distance to the ghost. |
Uniform Prior | A prior distribution where all values are equally likely. |
Naïve Bayes Model | A model where all evidence variables are conditionally independent given a single cause variable. |
Naïve Bayes Joint Formula | P(Cause, Effects) = P(Cause) Π P(Effect_i \| Cause). |
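
As a concrete illustration of Bayes' rule, normalization, and the diagnostic/causal distinction, here is a minimal sketch; the medical-test numbers are invented for illustration:

```python
# Bayes' rule with normalization: P(cause | effect) ∝ P(effect | cause) P(cause).
# All numbers below are invented for illustration.
p_disease = 0.01                 # prior P(disease)
p_pos_given_disease = 0.90       # causal probability P(positive | disease)
p_pos_given_healthy = 0.05       # false-positive rate P(positive | ¬disease)

# Unnormalized posterior over the two possible worlds.
joint_disease = p_pos_given_disease * p_disease        # P(positive, disease)
joint_healthy = p_pos_given_healthy * (1 - p_disease)  # P(positive, ¬disease)

# Normalization: scale so the posterior sums to 1.
p_disease_given_pos = joint_disease / (joint_disease + joint_healthy)
```

Even though the causal probability P(positive given disease) is 0.9, the diagnostic probability works out to only about 0.15, because the prior on disease is low.

| Term | Definition |
|---|---|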
Text Classification (Naïve Bayes) | Using word frequencies and conditional independence to classify documents. |
Category Prior | The probability of a class before observing evidence. |
Word Likelihood | The probability of a word occurring given a category. |
Posterior Probability | Updated probability of a category after observing evidence. |
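
The four entries above combine into a tiny naïve Bayes text classifier; the priors and word likelihoods below are invented toy numbers, not estimates from any real corpus:

```python
import math

# Toy naïve Bayes text classifier: posterior ∝ P(Class) · Π P(word_i | Class).
priors = {"sports": 0.5, "politics": 0.5}      # category priors P(Class)
likelihood = {                                  # word likelihoods P(word | Class)
    "sports":   {"ball": 0.10, "vote": 0.01, "win": 0.05},
    "politics": {"ball": 0.01, "vote": 0.10, "win": 0.03},
}

def classify(words):
    # Work in log space so products of many small likelihoods don't underflow.
    scores = {c: math.log(priors[c]) +
                 sum(math.log(likelihood[c][w]) for w in words)
              for c in priors}
    return max(scores, key=scores.get)   # class with the highest posterior
```

| Term | Definition |
|---|---|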
Bayes Net (Bayesian Network) | A graphical model representing joint distributions using nodes and directed edges. |
Graphical Model | A representation of variables and their dependencies using a graph. |
Node (in Bayes Net) | A variable in the model. |
Arc (in Bayes Net) | A directed edge indicating direct influence between variables. |
Directed Acyclic Graph (DAG) | The structure underlying a Bayesian network. |
Conditional Probability Table (CPT) | A table specifying probabilities of a variable given its parents. |
Local Conditional Distribution | The probability distribution for a node given its parents. |
Global Semantics (Bayes Net) | The joint distribution defined as the product of all local conditionals. |
Sparse Bayes Net | A Bayes net where each variable has few parents, reducing complexity. |
Markov Blanket | The set of a node’s parents, children, and children’s parents; shields the node from the rest of the network. |
Conditional Independence Semantics | Every variable is conditionally independent of its non-descendants given its parents. |
Topological Order | An ordering of nodes where parents appear before children. |
Alarm Network | A classic Bayes net example involving burglary, earthquake, alarm, and neighbor calls. |
Free Parameters (CPT) | The number of independent probabilities needed to specify a conditional probability table. |
Local Causality | The idea that variables interact only with a small number of others. |
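
The global-semantics entry can be checked directly on the alarm network: the joint probability of a complete assignment is the product of the local conditionals. The CPT values below are the commonly quoted textbook numbers:

```python
# Joint = product of local conditionals in the burglary/earthquake alarm network.
P_b = 0.001                       # P(Burglary)
P_e = 0.002                       # P(Earthquake)
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
P_j = {True: 0.90, False: 0.05}   # P(JohnCalls | Alarm)
P_m = {True: 0.70, False: 0.01}   # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    pb = P_b if b else 1 - P_b
    pe = P_e if e else 1 - P_e
    pa = P_a[(b, e)] if a else 1 - P_a[(b, e)]
    pj = P_j[a] if j else 1 - P_j[a]
    pm = P_m[a] if m else 1 - P_m[a]
    return pb * pe * pa * pj * pm

# P(john calls, mary calls, alarm, no burglary, no earthquake) ≈ 0.000628
p = joint(False, False, True, True, True)
```

| Term | Definition |
|---|---|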
Hidden Markov model (HMM) | A model with a Markov chain over hidden states X_t and observations E_t emitted from each state via P(E_t \| X_t). |
Umbrella world state variable | R_t, indicating whether it is raining on day t. |
Umbrella world evidence variable | U_t, indicating whether the director carries an umbrella on day t. |
Viterbi algorithm | Dynamic programming algorithm that computes, for each state and time, the maximum probability of any path ending there, to find the most likely state sequence. |
State trellis | A layered graph whose nodes are states at each time step and whose edges represent transitions with weights from transition and sensor probabilities. |
Robot localization problem | Using a transition model over robot positions and a sensor model over wall/no-wall readings to maintain a belief over the robot’s location. |
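
The Viterbi entry can be sketched on the umbrella world, using the standard textbook parameters (rain persists with probability 0.7; an umbrella appears with probability 0.9 given rain, 0.2 otherwise):

```python
# Viterbi: m[s] holds the maximum probability of any state sequence ending in
# state s, per the definition above. True = rain, False = no rain.
TRANS = {True: {True: 0.7, False: 0.3},   # P(R_t | R_{t-1} = true)
         False: {True: 0.3, False: 0.7}}  # P(R_t | R_{t-1} = false)
SENSOR = {True: 0.9, False: 0.2}          # P(U_t = true | R_t)

def viterbi(evidence, prior=0.5):
    m = {True: prior, False: 1 - prior}   # best-path probabilities at day 0
    back = []                             # backpointers, one dict per day
    for u in evidence:
        new_m, ptr = {}, {}
        for s in (True, False):
            obs = SENSOR[s] if u else 1 - SENSOR[s]
            prev = max((True, False), key=lambda r: m[r] * TRANS[r][s])
            new_m[s] = obs * m[prev] * TRANS[prev][s]
            ptr[s] = prev
        back.append(ptr)
        m = new_m
    # Trace backpointers from the best final state.
    state = max(m, key=m.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path[1:]                       # drop the day-0 prior state
```

| Term | Definition |
|---|---|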
Markov Process | A system where the next state depends only on the current state, not the full history. |
First-Order Markov Assumption | The assumption that each state depends only on the immediately previous state. |
Stationarity Assumption | The rule that transition probabilities stay the same at every time step. |
State Variable | A hidden variable representing the true condition of the system at a given time. |
Evidence Variable | An observable variable that provides information about the hidden state. |
Transition Model | The model describing how the hidden state evolves from one time step to the next. |
Sensor Model | The model describing how observations are generated from the current hidden state. |
Discrete Time Model | A model that represents the world as a sequence of time slices. |
Umbrella World | A simple example where the hidden state is whether it is raining and the evidence is whether the director carries an umbrella. |
Filtering | The task of computing the belief about the current state given all evidence so far. |
Prediction | Estimating future states based on the current belief without adding new evidence. |
Smoothing | Estimating past states using both earlier and later evidence. |
Most Likely Explanation | The single most probable sequence of states that explains the observed evidence. |
Forward Algorithm | A recursive method for updating beliefs over time using transition and sensor models. |
Viterbi Algorithm | A dynamic programming method for finding the most likely sequence of hidden states. |
State Trellis | A layered graph showing all possible states at each time step and transitions between them. |
Hidden Markov Model | A model with hidden states that evolve over time and produce observable evidence. |
Robot Localization | A problem where a robot uses noisy sensors and movement models to estimate its position. |
Belief State | The probability distribution over all possible states at a given time. |
Update Step | The filtering step where new evidence is incorporated into the belief state. |
Prediction Step | The filtering step where the belief is projected forward using the transition model. |
Stationary Distribution | The long-run distribution that a Markov chain converges to after many steps (when such a limit exists). |
Partial Observability | A situation where the agent cannot directly observe the true state of the world. |
Dynamic World | A world where the state changes over time while the agent is reasoning. |
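
Filtering in the umbrella world reduces to one prediction step plus one update step per time slice (same standard parameters: rain persists with probability 0.7, umbrella probability 0.9 given rain and 0.2 otherwise):

```python
# Forward-algorithm filtering: belief is P(rain today | all evidence so far).
P_RAIN0 = 0.5                         # prior P(R0)
P_PERSIST = {True: 0.7, False: 0.3}   # transition model P(R_t = true | R_{t-1})
P_UMBRELLA = {True: 0.9, False: 0.2}  # sensor model P(U_t = true | R_t)

def filter_step(belief, umbrella):
    # Prediction step: push the belief through the transition model.
    predicted = belief * P_PERSIST[True] + (1 - belief) * P_PERSIST[False]
    # Update step: weight by the sensor model, then normalize.
    w_rain = P_UMBRELLA[True] if umbrella else 1 - P_UMBRELLA[True]
    w_dry = P_UMBRELLA[False] if umbrella else 1 - P_UMBRELLA[False]
    p_rain = w_rain * predicted
    p_dry = w_dry * (1 - predicted)
    return p_rain / (p_rain + p_dry)

belief = P_RAIN0
for u in [True, True]:                # umbrella observed on days 1 and 2
    belief = filter_step(belief, u)   # ends near 0.883
```

| Term | Definition |
|---|---|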
Atomic Representation | A representation where states are indivisible wholes, used in search algorithms. |
Factored Representation | A representation where states are described by variables and their values. |
Structured Representation | A representation using objects and relations, as in first-order logic. |
Bayesian Network | A factored probabilistic model over a fixed, finite set of random variables. |
Probabilistic Programming Language (PPL) | A language that defines probability distributions using programs with random choices. |
Generative Program | A program where each random choice defines a random variable in a probability model. |
Execution Trace | A sequence of values for all random choices made during one run of a generative program. |
Decision-Theoretic Agent | An agent that makes rational decisions using beliefs (probabilities) and utilities. |
Utility Function | A function assigning a numerical desirability to each state. |
Expected Utility | The average utility of outcomes from a state, weighted by their probabilities. |
Maximum Expected Utility (MEU) | The principle that an agent should choose the action with the highest expected utility. |
Episodic Decision Problem | A one-shot decision problem where each decision is independent. |
Sequential Decision Problem | A decision problem requiring a sequence of actions over time. |
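
A minimal episodic MEU computation; the actions, probabilities, and utilities below are invented for illustration:

```python
# Each action maps to a list of (probability, utility) outcome pairs.
actions = {
    "take_umbrella":  [(0.4, 70), (0.6, 70)],    # dry either way, but encumbered
    "leave_umbrella": [(0.4, 0),  (0.6, 100)],   # rain with probability 0.4
}

def expected_utility(outcomes):
    # Sum of utilities weighted by their probabilities.
    return sum(p * u for p, u in outcomes)

# MEU principle: choose the action with the highest expected utility.
best = max(actions, key=lambda a: expected_utility(actions[a]))
```

| Term | Definition |
|---|---|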
Learning | The process of improving performance based on observations. |
Machine Learning | When a computer learns models from data to solve problems. |
Induction | Reasoning from specific observations to general rules. |
Deduction | Reasoning from general rules to specific conclusions. |
Classification | A learning task where the output is a discrete label. |
Regression | A learning task where the output is a numeric value. |
Supervised Learning | Learning from labeled input-output pairs. |
Unsupervised Learning | Learning patterns or structure without labeled outputs. |
Clustering | Grouping similar examples in unsupervised learning. |
Reinforcement Learning | Learning from rewards and punishments to determine good actions. |
Training Set | A set of labeled examples used to learn a hypothesis. |
Test Set | A set of unseen examples used to evaluate generalization. |
Ground Truth | The true output value the model is trying to predict. |
Hypothesis | A learned function that approximates the true function. |
Hypothesis Space | The set of all candidate hypotheses the learner can choose from. |
Consistent Hypothesis | A hypothesis that correctly predicts all training examples. |
Generalization | How well a learned hypothesis performs on unseen data. |
Bias | Systematic tendency of a hypothesis to deviate from the true function. |
Variance | How much a hypothesis changes when trained on different data samples. |
Underfitting | When a model is too simple to capture the underlying pattern. |
Overfitting | When a model fits the training data too closely and fails to generalize. |
Bias-Variance Tradeoff | The balance between simplicity (bias) and flexibility (variance). |
Rule of Simplicity | Choose models as simple as possible while still capturing important patterns. |
Linear Regression | A method for fitting a straight line or hyperplane to data for prediction. |
Prediction Function | The linear model output computed as w0 + w1x. |
Prediction Error | The difference between the true value y and the predicted value h(x). |
Residual | The error on a single instance, defined as y − h(x). |
Least Squares Loss | A loss function that sums the squared prediction errors over all examples. |
Gradient Descent | An iterative optimization method that updates weights to reduce loss. |
Learning Rate | A small constant controlling how large each gradient descent update is. |
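
A short sketch of least-squares linear regression trained by gradient descent; the data and learning rate are invented for illustration (the data follows y = 2x + 1 exactly, so the weights should approach w0 = 1, w1 = 2):

```python
# Gradient descent on squared-error loss for the prediction function
# h(x) = w0 + w1*x, updating after each example.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w0, w1 = 0.0, 0.0
alpha = 0.05                       # learning rate

for _ in range(2000):
    for x, y in data:
        err = y - (w0 + w1 * x)    # residual on this example
        w0 += alpha * err          # gradient step for the intercept
        w1 += alpha * err * x      # gradient step for the slope
```

| Term | Definition |
|---|---|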
Feature Vector | A vector of input features representing an example. |
Linear Classifier | A classifier that makes predictions using a weighted sum of features. |
Decision Boundary | A line, plane, or surface separating predicted classes. |
Threshold Classifier | A classifier that outputs 1 if w·x ≥ 0 and 0 otherwise. |
Activation | The weighted sum w·f(x) used to determine a classifier’s output. |
Perceptron | A linear classifier that outputs +1 or -1 based on the sign of w·x. |
Weight Vector | The set of learned weights applied to input features. |
False Negative | When the classifier predicts negative but the true label is positive. |
False Positive | When the classifier predicts positive but the true label is negative. |
Perceptron Learning Rule | The update rule w ← w + α(y − h(x))x. |
Learning Rate (Perceptron) | A constant α controlling how strongly weights are adjusted. |
Linearly Separable Data | Data that can be perfectly separated by a hyperplane. |
Perceptron Convergence Theorem | States that perceptron learning converges if data is linearly separable. |
Non-Separable Data | Data that cannot be perfectly separated by any linear boundary. |
Overtraining | When training accuracy rises but test accuracy eventually falls. |
Overfitting | When a model fits training data too closely and generalizes poorly. |
Averaged Perceptron | A perceptron variant that averages weight vectors to reduce noise. |
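
The perceptron learning rule w ← w + α(y − h(x))x can be run on a linearly separable toy problem, the AND function; this sketch uses the 0/1 threshold form of the classifier, with a constant bias feature:

```python
# Perceptron learning on AND. Feature vector is [1, x1, x2]; the leading 1
# carries the bias weight. Labels are 0/1.
examples = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
w = [0.0, 0.0, 0.0]
alpha = 0.1                        # learning rate

def h(w, x):
    # Threshold classifier: 1 if the activation w·x is nonnegative, else 0.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

for _ in range(100):               # converges because the data is separable
    for x, y in examples:
        err = y - h(w, x)          # 0 when the prediction is already correct
        w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
```

| Term | Definition |
|---|---|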
XOR Problem | A classic example of a function that cannot be learned by a linear classifier. |
Sigmoid Function | A smooth function mapping real values to probabilities between 0 and 1. |
Logistic Regression | A linear classifier that uses the sigmoid function to output probabilities. |
Support Vector Machine (SVM) | A classifier that maximizes the margin between classes. |
Ensemble Learning | Using multiple hypotheses together to improve performance. |
Bagging | Training multiple models on bootstrap samples and combining their predictions. |
Random Forest | An ensemble of decision trees built using random subsets of features. |
Boosting | An ensemble method that sequentially trains models to correct previous errors. |
Stacking | An ensemble method that combines predictions from multiple models using another learner. |
Deep Learning | A broad family of machine learning techniques where hypotheses are complex algebraic circuits with tunable weights. |
Neural Network | A model composed of interconnected units that compute weighted sums and apply nonlinear activation functions. |
Deep Network | A neural network with many layers, giving a long computation path from input to output. |
Advantage of Deep Networks | They allow interactions between variables through multiple layers, unlike linear models which combine inputs independently. |
Neuron (McCulloch-Pitts Model) | A simple computational unit that sums weighted inputs and applies a nonlinear activation function. |
Bias Weight | The weight on a fixed dummy input (always 1) that shifts the activation threshold of a neuron. |
Feedforward Network | A neural network where connections flow only forward, forming a directed acyclic graph. |
Recurrent Network | A neural network with feedback connections, giving it internal state or memory. |
Unit (Node) | A component that computes a weighted sum of inputs and applies a nonlinear activation function. |
Activation Function | A nonlinear function applied to a unit’s weighted input; enables networks to represent nonlinear functions. |
Universal Approximation Theorem | A two-layer neural network with enough hidden units can approximate any continuous function. |
Sigmoid Function | A smooth activation function mapping inputs to values between 0 and 1. |
ReLU | Activation function defined as max(0, x), commonly used in modern deep networks. |
Softplus | A smooth version of ReLU defined as log(1 + e^x). |
Tanh | Activation function mapping inputs to values between -1 and 1. |
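
The four activation functions above, written out directly:

```python
import math

def sigmoid(x):
    # Smooth squashing function: maps any real x into (0, 1).
    return 1 / (1 + math.exp(-x))

def relu(x):
    # Rectified linear unit: zero for negative inputs, identity otherwise.
    return max(0.0, x)

def softplus(x):
    # Smooth approximation of ReLU: log(1 + e^x).
    return math.log(1 + math.exp(x))

def tanh(x):
    # Maps any real x into (-1, 1).
    return math.tanh(x)
```

| Term | Definition |
|---|---|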
Building a Network | Combining units into layers so the overall function is a composition of many algebraic operations. |
Gradient Descent in Neural Nets | Adjusting weights by computing gradients of loss through all layers. |
Backpropagation | The process of passing output error backward through the network to compute gradients for each weight. |
Vanishing Gradient Problem | When gradients become extremely small in deep networks, preventing effective learning. |
Automatic Differentiation | Software technique (e.g., TensorFlow, PyTorch) that automatically computes derivatives for all weights. |
Computation Graph | A dataflow graph where each node represents an elementary computation in the network. |
Input Encoding | The process of converting raw data into numeric inputs for a neural network. |
One-Hot Encoding | Representing a categorical variable with a vector where exactly one element is 1 and the rest are 0. |
Output Encoding | Representing outputs (labels) in a form suitable for loss computation, such as one-hot vectors. |
Squared Error Loss | A loss function used for regression that measures squared differences between predicted and true values. |
Negative Log Likelihood | A loss function used for classification that encourages high probability for correct labels. |
Maximum Likelihood Learning | Choosing weights that maximize the probability of the observed data. |
Softmax | An activation function for multiclass classification that outputs a probability distribution over classes. |
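
A sketch of softmax together with the negative-log-likelihood loss it is usually paired with; the raw scores are invented for illustration:

```python
import math

def softmax(scores):
    # Exponentiate and normalize so the outputs form a probability distribution.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # three-class scores, invented
nll = -math.log(probs[0])          # loss if class 0 is the true label
```

The loss is small when the network gives the correct class high probability, and large otherwise, which is exactly what maximum likelihood learning asks for.

| Term | Definition |
|---|---|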
Hidden Layer | A layer between input and output that transforms representations and may learn meaningful features. |
Convolutional Neural Network (CNN) | A neural network with spatially local connections and shared weights (kernels). |
Kernel (Filter) | A small set of weights replicated across an image to detect specific features. |
Convolution | Operation applying a kernel across an input to produce feature maps. |
Stride | The step size of the convolution; larger strides reduce output size. |
Receptive Field | The region of input that influences a particular hidden unit. |
Pooling Layer | A layer that summarizes adjacent units using a fixed operation such as max or average. |
Average Pooling | Pooling method that computes the average of inputs in a region. |
Max Pooling | Pooling method that outputs the maximum value in a region. |
Downsampling | Reducing spatial resolution by pooling or strided convolution. |
Fully Connected Layer | A layer where each unit connects to all units in the previous layer, often used near the output. |
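
Convolution, stride, and max pooling in one dimension, as a minimal sketch; the kernel and signal are invented for illustration:

```python
def conv1d(signal, kernel, stride=1):
    # Slide the kernel across the signal, taking a weighted sum at each step.
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]

def max_pool(feature_map, size=2):
    # Downsample by keeping the maximum in each window.
    return [max(feature_map[i:i + size])
            for i in range(0, len(feature_map), size)]

# A difference kernel [-1, 1] responds to rises (+1) and falls (-1) in a step
# signal, acting as a simple edge detector.
fmap = conv1d([0, 0, 1, 1, 1, 0, 0, 0], [-1, 1])
pooled = max_pool(fmap, 2)
```

| Term | Definition |
|---|---|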
Residual Network (ResNet) | A deep architecture that adds small perturbations to previous layer outputs to avoid vanishing gradients. |
Stochastic Gradient Descent (SGD) | A training method where weights are updated using gradients computed from small minibatches of data. |
Minibatch | A small subset of training examples used to compute an approximate gradient during SGD. |
Learning Rate Decay | Gradually reducing the learning rate over time to stabilize training. |
Momentum | A technique that smooths SGD updates by incorporating past gradients to reduce variance. |
Batch Normalization | A method that normalizes activations within a minibatch to stabilize and speed up training. |
Generalization | The ability of a model to perform well on unseen data rather than just the training set. |
Adversarial Example | An input intentionally perturbed to cause a neural network to make an incorrect prediction. |
Weight Decay | A regularization technique that penalizes large weights to reduce overfitting. |
Dropout | A regularization method where random units are deactivated during training to prevent co-adaptation. |
Recurrent Neural Network (RNN) | A neural network with cycles that allow information to persist across time steps. |
Hidden State (RNN) | The internal memory of an RNN that carries information from previous time steps. |
Unrolling an RNN | Expanding an RNN across time steps to form a feedforward computation graph for training. |
Vanishing Gradient (RNN) | When gradients shrink during backpropagation through time, preventing learning of long-term dependencies. |
Exploding Gradient (RNN) | When gradients grow uncontrollably during training, destabilizing learning. |
Long Short-Term Memory (LSTM) | A type of RNN with memory cells and gates designed to avoid vanishing and exploding gradients. |
Memory Cell (LSTM) | A component that stores information across time steps with minimal modification. |
Gating Units (LSTM) | Mechanisms that control information flow into and out of the memory cell. |
Unsupervised Learning | Learning patterns from data without labeled outputs. |
Clustering | Grouping similar data points based on inherent structure in the data. |
K-Means Clustering | A partitional clustering algorithm that assigns points to the nearest centroid and iteratively updates centroids. |
Centroid | The center of a cluster, often the mean of its points. |
Cluster Shape Problem | The issue that some clustering methods assume round or blob-like clusters, which may not match real data. |
Gaussian Mixture Model | A probabilistic clustering model that represents data as a mixture of Gaussian components. |
Hierarchical Clustering | A clustering method that builds nested clusters organized as a tree. |
Partitional Clustering | A clustering method that divides data into non-overlapping clusters. |
Exclusive Clustering | A clustering method where each point belongs to exactly one cluster. |
Non-Exclusive Clustering | A clustering method where points may belong to multiple clusters. |
Fuzzy Clustering | A clustering method where each point has a degree of membership in each cluster. |
Well-Separated Cluster | A cluster where every point is closer to points in its own cluster than to points outside it. |
Prototype-Based Cluster | A cluster defined by similarity to a central prototype such as a centroid or medoid. |
Contiguity-Based Cluster | A cluster where each point is closer to at least one point in the same cluster than to any point in a different cluster. |
Density-Based Cluster | A cluster defined as a dense region separated by low-density areas. |
Objective Function (Clustering) | A function that measures clustering quality, such as sum of squared errors. |
Initial Centroid Selection | The process of choosing starting centroids for K-means, which strongly affects results. |
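
A minimal 1-D K-means sketch showing the assignment and update steps; the points and the fixed initial centroids are invented for illustration (in practice the initial centroids strongly affect the result, as noted above):

```python
def kmeans(points, centroids, iters=10):
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[i].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two well-separated groups of points near 1.0 and 9.0.
centroids, clusters = kmeans([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], [0.0, 10.0])
```

| Term | Definition |
|---|---|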
Autoencoder | A neural network with an encoder and decoder trained to reconstruct its input. |
Encoder | The part of an autoencoder that maps input x to a compressed representation z. |
Decoder | The part of an autoencoder that reconstructs x from the representation z. |
Generative Adversarial Network (GAN) | A system of two networks—a generator and discriminator—trained together to produce realistic data. |
Generator (GAN) | A network that produces synthetic data intended to fool the discriminator. |
Discriminator (GAN) | A classifier trained to distinguish real data from generated data. |
Transfer Learning | Using knowledge from one trained model to improve learning on a related task. |
Frozen Layers | Layers whose weights are kept fixed during transfer learning because they capture general features. |