Untitled Studyset

Created by Jack Lambert

Term
Definition

Frequency distribution
A table that groups observations into categories or intervals and records the number of observations in each group.
Relative frequency
The proportion of observations in each category, calculated as the category's frequency divided by the total number of observations.
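
As a quick illustration of the two cards above, the sketch below counts how often each category occurs and divides by the total. The function name `frequency_table` and the `grades` sample are hypothetical, made up for this example.

```python
from collections import Counter

def frequency_table(observations):
    """Return {category: (frequency, relative frequency)}."""
    counts = Counter(observations)
    total = len(observations)
    return {cat: (n, n / total) for cat, n in counts.items()}

# Hypothetical sample data (not from the study set)
grades = ["A", "B", "B", "C", "B", "A", "C", "B"]
table = frequency_table(grades)

for category, (freq, rel_freq) in sorted(table.items()):
    print(f"{category}: frequency={freq}, relative frequency={rel_freq:.3f}")
```

The relative frequencies always sum to 1, which is a handy check on a hand-built table.
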
Categorical variable
A variable consisting of labels or categories rather than numerical values.
Numerical variable
A variable representing measurable quantities or counts with meaningful numerical values.
Bar chart
A graph that uses rectangular bars to represent frequency or relative frequency of categorical data.
Histogram
A graph of adjacent rectangles showing frequency or relative frequency of numerical data intervals.
Interval
A range of values used to group numerical data in a frequency distribution.
Symmetric distribution
A distribution where the left and right sides are mirror images around the center.
Positively skewed distribution
A distribution with a longer tail extending to the right.
Negatively skewed distribution
A distribution with a longer tail extending to the left.
Contingency table
A table summarizing the relationship between two categorical variables using frequencies.
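
A contingency table can be built by counting each pair of categories. The sketch below is one minimal way to do it; the `plan` and `renewed` variables and their values are hypothetical sample data.

```python
from collections import Counter

# Hypothetical survey rows: (plan, renewed)
observations = [
    ("basic", "yes"), ("basic", "no"), ("premium", "yes"),
    ("premium", "yes"), ("basic", "no"), ("premium", "no"),
]

counts = Counter(observations)                    # frequency of each pair
rows = sorted({r for r, _ in observations})       # categories of variable 1
cols = sorted({c for _, c in observations})       # categories of variable 2
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Print the table with one row per plan and one column per renewal answer
print("plan".ljust(10) + "".join(c.rjust(6) for c in cols))
for r in rows:
    print(r.ljust(10) + "".join(str(table[r][c]).rjust(6) for c in cols))
```
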
Stacked column chart
A chart that displays two categorical variables at once: each column represents a category of one variable and is divided into stacked segments for the categories of the other.
Scatterplot
A graph showing the relationship between two numerical variables using plotted points.
Linear relationship
A relationship between variables that forms a straight-line pattern.
Nonlinear relationship
A relationship between variables that does not follow a straight line.
Line chart
A graph connecting data points with lines to show trends over time.
Data visualization
The graphical or tabular presentation of data to help understand patterns and relationships.
Regression analysis
A statistical method used to model the relationship between a response variable and predictor variables.
Response variable
The variable being predicted or explained in a regression model.
Predictor variable
A variable used to explain or predict the response variable.
Simple linear regression
A regression model with one predictor variable.
Multiple linear regression
A regression model with two or more predictor variables.
Regression model
A mathematical equation describing the relationship between response and predictor variables.
Intercept
The predicted value of the response variable when all predictors equal zero.
Slope coefficient
The change in the predicted response associated with a one-unit increase in a predictor variable, holding the other predictors constant.
Residual
The difference between the observed value and predicted value of the response variable.
Predicted value (y-hat)
The estimated value of the response variable from the regression equation.
Ordinary Least Squares (OLS)
A method that estimates regression coefficients by minimizing the sum of squared errors.
Sum of Squared Errors (SSE)
The sum of squared differences between observed and predicted values.
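
The cards on intercept, slope, predicted value, residual, OLS, and SSE fit together in one small calculation. The sketch below uses the standard closed-form OLS formulas for a single predictor; the function name `fit_simple_ols` and the `x`, `y` data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Return (intercept b0, slope b1) that minimize the sum of squared errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope coefficient
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

# Hypothetical sample data (not from the study set)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_simple_ols(x, y)
y_hat = [b0 + b1 * xi for xi in x]                  # predicted values
residuals = [yi - yh for yi, yh in zip(y, y_hat)]   # observed minus predicted
sse = sum(e ** 2 for e in residuals)                # sum of squared errors

print(f"y_hat = {b0:.2f} + {b1:.2f} x, SSE = {sse:.2f}")
```

When the model includes an intercept, the OLS residuals sum to zero (up to rounding), which makes a useful sanity check.
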
Standard error of the estimate
The standard deviation of the residuals, which measures the typical prediction error; computed as the square root of SSE divided by its degrees of freedom.
Coefficient of determination (R squared)
The proportion of variation in the response variable explained by the regression model.
Adjusted R squared
A version of R squared that adjusts for the number of predictors and penalizes unnecessary variables.
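
The three goodness-of-fit measures above can be computed from the observed and predicted values. This is a sketch under the usual textbook formulas, assuming a model with an intercept and `k` predictors; the function name `fit_measures` and the numbers are hypothetical.

```python
import math

def fit_measures(y, y_hat, k):
    """Return (standard error of the estimate, R squared, adjusted R squared)."""
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    df = n - k - 1                                         # degrees of freedom
    se = math.sqrt(sse / df)
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / df
    return se, r2, adj_r2

# Hypothetical observed values and predictions from a one-predictor model
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

se, r2, adj_r2 = fit_measures(y, y_hat, k=1)
print(f"se = {se:.3f}, R squared = {r2:.3f}, adjusted R squared = {adj_r2:.3f}")
```

Adjusted R squared is never larger than R squared, and the gap widens as predictors are added without reducing SSE much.
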
Dummy variable
A binary variable coded as 0 or 1 used to represent categorical data in regression.
Reference category
The omitted category used as the baseline for comparison in dummy variable regression.
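
To make the dummy variable and reference category cards concrete, the sketch below turns one categorical variable into 0/1 columns, leaving out the reference category. The function name `dummy_encode`, the `is_` column prefix, and the region data are hypothetical.

```python
def dummy_encode(values, reference):
    """Create one 0/1 column per category except the reference category."""
    categories = sorted(set(values) - {reference})
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

# Hypothetical sample data (not from the study set)
regions = ["north", "south", "west", "south"]
rows = dummy_encode(regions, reference="north")

for region, row in zip(regions, rows):
    print(region, row)
```

A categorical variable with three categories needs only two dummy variables. An observation in the reference category has 0 in every column, so each dummy coefficient is read as a difference from that baseline.
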
Multicollinearity
A condition where predictor variables are highly linearly related to one another, making individual coefficient estimates unreliable.
Test of joint significance (F test)
A test used to determine whether predictors jointly influence the response variable.
Test of individual significance (t test)
A test used to determine whether an individual predictor significantly affects the response variable.
P value
The probability of observing a sample result at least as extreme as the one obtained, assuming the null hypothesis is true.
Significance level
The threshold probability (alpha) used to decide whether to reject the null hypothesis; the null hypothesis is rejected when the p value falls below it.
Residual plot
A graph of the residuals against the predicted values or a predictor variable, used to examine regression assumptions and detect patterns or outliers.
Outlier
An observation that lies unusually far from the rest of the data.
Linearity assumption
The assumption that the relationship between predictors and response is linear in parameters.
Degrees of freedom
The number of observations minus the number of estimated parameters; for a regression with k predictors and an intercept, n - k - 1.
Sample regression equation
The equation using estimated coefficients to predict the response variable.
Goodness of fit
A measure of how well the regression model explains the observed data.
Model selection
The process of choosing the best regression model using measures like standard error and adjusted R squared.
Prediction error
The difference between actual and predicted values.