| Term | Definition |
|---|---|
Frequency distribution | A table that groups observations into categories or intervals and records the number of observations in each group. |
Relative frequency | The proportion of observations in each category, calculated as the category's frequency divided by the total number of observations. |
Categorical variable | A variable consisting of labels or categories rather than numerical values. |
Numerical variable | A variable representing measurable quantities or counts with meaningful numerical values. |
Bar chart | A graph that uses rectangular bars to represent frequency or relative frequency of categorical data. |
Histogram | A graph of adjacent rectangles showing frequency or relative frequency of numerical data intervals. |
Interval | A range of values used to group numerical data in a frequency distribution. |
Symmetric distribution | A distribution where the left and right sides are mirror images around the center. |
Positively skewed distribution | A distribution with a longer tail extending to the right. |
Negatively skewed distribution | A distribution with a longer tail extending to the left. |
Contingency table | A table summarizing the relationship between two categorical variables using frequencies. |
Stacked column chart | A chart that displays more than one categorical variable by stacking segments within each bar, so that each segment shows the frequency of one category combination. |
Scatterplot | A graph showing the relationship between two numerical variables using plotted points. |
Linear relationship | A relationship between variables that forms a straight-line pattern. |
Nonlinear relationship | A relationship between variables that does not follow a straight line. |
Line chart | A graph connecting data points with lines to show trends over time. |
Data visualization | The graphical or tabular presentation of data to help understand patterns and relationships. |
Regression analysis | A statistical method used to model the relationship between a response variable and predictor variables. |
Response variable | The variable being predicted or explained in a regression model. |
Predictor variable | A variable used to explain or predict the response variable. |
Simple linear regression | A regression model with one predictor variable. |
Multiple linear regression | A regression model with two or more predictor variables. |
Regression model | A mathematical equation describing the relationship between response and predictor variables. |
Intercept | The predicted value of the response variable when all predictors equal zero. |
Slope coefficient | The change in the predicted response associated with a one-unit increase in a predictor variable, holding the other predictors constant. |
Residual | The difference between the observed value and predicted value of the response variable. |
Predicted value (y-hat) | The estimated value of the response variable from the regression equation. |
Ordinary Least Squares (OLS) | A method that estimates regression coefficients by minimizing the sum of squared errors. |
Sum of Squared Errors (SSE) | The sum of squared differences between observed and predicted values. |
Standard error of the estimate | The standard deviation of the residuals, which measures the typical size of a prediction error. |
Coefficient of determination (R squared) | The proportion of variation in the response variable explained by the regression model. |
Adjusted R squared | A version of R squared that adjusts for the number of predictors and penalizes unnecessary variables. |
Dummy variable | A binary variable coded as 0 or 1 used to represent categorical data in regression. |
Reference category | The omitted category used as the baseline for comparison in dummy variable regression. |
Multicollinearity | A condition where predictor variables are highly linearly related to one another, which makes the coefficient estimates unreliable. |
Test of joint significance (F test) | A test used to determine whether the predictors, taken together, have a statistically significant influence on the response variable. |
Test of individual significance (t test) | A test used to determine whether an individual predictor significantly affects the response variable. |
P value | The probability of observing a sample result at least as extreme as the one obtained, assuming the null hypothesis is true. |
Significance level | The threshold probability used to decide whether to reject the null hypothesis. |
Residual plot | A graph of the residuals against the predicted values or a predictor variable, used to examine regression assumptions and detect patterns or outliers. |
Outlier | An observation significantly different from the rest of the data. |
Linearity assumption | The assumption that the relationship between predictors and response is linear in parameters. |
Degrees of freedom | In regression, the number of observations minus the number of estimated parameters (the intercept plus one coefficient per predictor). |
Sample regression equation | The equation using estimated coefficients to predict the response variable. |
Goodness of fit | A measure of how well the regression model explains the observed data. |
Model selection | The process of choosing the best regression model using measures like standard error and adjusted R squared. |
Prediction error | The difference between actual and predicted values. |
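
Several of the regression terms above (intercept, slope coefficient, predicted value, residual, SSE, standard error of the estimate, R squared, adjusted R squared, degrees of freedom) can be tied together in one small worked example. The sketch below fits a simple linear regression by OLS using only the Python standard library. The five data points and all variable names are made up for illustration and do not come from the glossary.

```python
# Simple linear regression by ordinary least squares (OLS).
# The data are hypothetical, chosen only to keep the arithmetic easy to follow.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # predictor variable
y = [2.0, 4.0, 5.0, 4.0, 5.0]   # response variable

n = len(x)   # number of observations
k = 1        # number of predictors (simple linear regression)

x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates for one predictor:
# slope = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Predicted values (y-hat) and residuals (observed minus predicted)
y_hat = [intercept + slope * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

# Sum of squared errors and total sum of squares
sse = sum(e ** 2 for e in residuals)
sst = sum((yi - y_bar) ** 2 for yi in y)

# Degrees of freedom: observations minus estimated parameters (intercept + k slopes)
df = n - k - 1

# Standard error of the estimate: typical size of a prediction error
std_error = (sse / df) ** 0.5

# Goodness-of-fit measures
r_squared = 1 - sse / sst
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df

print(f"sample regression equation: y-hat = {intercept:.3f} + {slope:.3f} x")
print(f"SSE = {sse:.3f}, standard error = {std_error:.3f}")
print(f"R squared = {r_squared:.3f}, adjusted R squared = {adj_r_squared:.3f}")
```

For this toy data set the slope is 0.6 and the intercept is about 2.2, and the model explains 60% of the variation in the response. Adjusted R squared is lower (about 0.467) because it penalizes the model for the degree of freedom used by the predictor.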