ForecasterAutoreg

class
skforecast.ForecasterAutoreg.ForecasterAutoreg.ForecasterAutoreg(regressor, lags)
Bases
skforecast.ForecasterBase.ForecasterBase.ForecasterBase

This class turns any regressor compatible with the scikit-learn API into a recursive autoregressive (multi-step) forecaster.

Parameters
  • regressor (regressor or pipeline compatible with the scikit-learn API) An instance of a regressor or pipeline compatible with the scikit-learn API.
  • lags (int, list, 1d numpy ndarray, range) Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1. int: include lags from 1 to lags (included). list, numpy ndarray or range: include only lags present in lags.
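The two ways of specifying lags can be illustrated with a short sketch. The helper name `normalize_lags` is hypothetical and is not part of the library; it only mirrors the rules stated above (an int expands to 1..lags, any other sequence is kept as given, and the largest lag determines max_lag).

```python
import numpy as np

def normalize_lags(lags):
    """Hypothetical helper: convert the `lags` argument into a 1D integer array."""
    if isinstance(lags, (int, np.integer)):
        if lags < 1:
            raise ValueError("lags must be >= 1")
        # int: include every lag from 1 to lags (included)
        return np.arange(1, lags + 1)
    # list, ndarray or range: keep only the lags that were listed
    lags = np.asarray(list(lags), dtype=int)
    if lags.ndim != 1 or lags.size == 0 or lags.min() < 1:
        raise ValueError("lags must be a non-empty 1D sequence of integers >= 1")
    return lags

lags_from_int = normalize_lags(3)        # lags 1, 2, 3
lags_from_list = normalize_lags([1, 7])  # only lags 1 and 7
max_lag = int(lags_from_list.max())      # largest lag, here 7
```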
Attributes
  • X_train_col_names (list) Names of columns of the matrix created internally for training.
  • exog_col_names (list) Names of columns of exog if exog used in training was a pandas DataFrame.
  • exog_type (type) Type of exogenous variable/s used in training.
  • fitted (bool) Tag to identify if the regressor has been fitted (trained).
  • in_sample_residuals (numpy ndarray) Residuals of the model when predicting training data. Only stored up to 1000 values.
  • included_exog (bool) If the forecaster has been trained using exogenous variable/s.
  • index_freq (str) Frequency of Index of the input used in training.
  • index_type (type) Type of index of the input used in training.
  • lags (numpy ndarray) Lags used as predictors.
  • last_window (pandas Series) Last window the forecaster has seen during training. It stores the values needed to predict the next step right after the training data.
  • max_lag (int) Maximum value of lag included in lags.
  • out_sample_residuals (numpy ndarray) Residuals of the model when predicting non training data. Only stored up to 1000 values.
  • regressor (regressor or pipeline compatible with the scikit-learn API) An instance of a regressor or pipeline compatible with the scikit-learn API.
  • training_range (pandas Index) First and last values of index of the data used during training.
  • window_size (int) Size of the window needed to create the predictors. It is equal to max_lag.
Methods
  • __repr__() (str) Information displayed when a ForecasterAutoreg object is printed.
  • create_train_X_y(y, exog) (X_train : pandas DataFrame, shape (len(y) - self.max_lag, len(self.lags))) Create training matrices from univariate time series and exogenous variables.
  • fit(y, exog) (None) Train the forecaster.
  • get_coef() (coef : pandas DataFrame) Return estimated coefficients for the linear regression model stored in the forecaster. Only valid when the forecaster has been trained using LinearRegression(), Lasso() or Ridge() as regressor.
  • get_feature_importance() (feature_importance : pandas DataFrame) Return impurity-based feature importance of the model stored in the forecaster. Only valid when the forecaster has been trained using GradientBoostingRegressor, RandomForestRegressor or HistGradientBoostingRegressor as regressor.
  • predict(steps, last_window, exog) (predictions : pandas Series) Predict n steps ahead. This is a recursive process in which each prediction is used as a predictor for the next step.
  • predict_interval(steps, last_window, exog, interval, n_boot, random_state, in_sample_residuals) (predictions : pandas DataFrame) Iterative process in which each prediction is used as a predictor for the next step, and bootstrapping is used to estimate prediction intervals. Both predictions and intervals are returned.
  • set_lags(lags) (self) Set new value to the attribute lags. Attributes max_lag and window_size are also updated.
  • set_out_sample_residuals(residuals, append) (self) Set new values to the attribute out_sample_residuals. Out of sample residuals are meant to be calculated using observations that did not participate in the training process.
  • set_params(**params) (self) Set new values to the parameters of the scikit-learn model stored in the ForecasterAutoreg.
method
__repr__() → str

Information displayed when a ForecasterAutoreg object is printed.

method
create_train_X_y(y, exog=None)

Create training matrices from univariate time series and exogenous variables.

Parameters
  • y (pandas Series) Training time series.
  • exog (pandas Series, pandas DataFrame, default `None`) Exogenous variable/s included as predictor/s. Must have the same number of observations as y and their indexes must be aligned.
Returns
  • X_train (pandas DataFrame, shape (len(y) - self.max_lag, len(self.lags))) Pandas DataFrame with the training values (predictors).
  • y_train (pandas Series, shape (len(y) - self.max_lag, )) Values (target) of the time series related to each row of X_train.
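The shapes of the returned predictor matrix and target can be reproduced with a minimal sketch. The function `make_lag_matrix` and the `lag_k` column names are assumptions made for illustration; this is not the library's internal code, only one way to build matrices with the documented shapes.

```python
import numpy as np
import pandas as pd

def make_lag_matrix(y, lags):
    """Hypothetical helper: build (X_train, y_train) from a series and a list of lags."""
    lags = np.asarray(lags)
    max_lag = int(lags.max())
    values = y.to_numpy()
    n_rows = len(values) - max_lag
    # The column for lag k holds y[t - k] for every target position t = max_lag, ..., len(y) - 1.
    columns = {
        f"lag_{int(k)}": values[max_lag - k : max_lag - k + n_rows] for k in lags
    }
    index = y.index[max_lag:]
    X_train = pd.DataFrame(columns, index=index)
    y_train = pd.Series(values[max_lag:], index=index, name="y")
    return X_train, y_train

y = pd.Series(np.arange(10, dtype=float))
X_train, y_train = make_lag_matrix(y, lags=[1, 3])
# X_train has len(y) - max_lag = 7 rows and len(lags) = 2 columns.
```

The first max_lag observations cannot be used as targets because their lags fall before the start of the series, which is why both outputs are shorter than y.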

method
fit(y, exog=None)

Train the forecaster.

Parameters
  • y (pandas Series) Training time series.
  • exog (pandas Series, pandas DataFrame, default `None`) Exogenous variable/s included as predictor/s. Must have the same number of observations as y and their indexes must be aligned so that y[i] is regressed on exog[i].
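The alignment requirement, y[i] regressed on exog[i], can be sketched with plain NumPy arrays. The way the columns are stacked here is an assumption for illustration and may differ from the layout the library builds internally.

```python
import numpy as np

y = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
exog = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])  # same length as y
lags = np.array([1, 2])
max_lag = int(lags.max())

# Lag columns: for each target y[t], the values y[t - 1] and y[t - 2].
X_lags = np.column_stack([y[max_lag - k : len(y) - k] for k in lags])

# Exogenous column: the value observed at the same position as the target.
X = np.column_stack([X_lags, exog[max_lag:]])
target = y[max_lag:]
```

Each row of X pairs the lags of a target with the exogenous value at that same position, so a misaligned exog would silently attach the wrong covariate to every observation.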
method
predict(steps, last_window=None, exog=None)

Predict n steps ahead. This is a recursive process in which each prediction is used as a predictor for the next step.

Parameters
  • steps (int) Number of future steps predicted.
  • last_window (pandas Series, default `None`) Values of the series used to create the predictors (lags) needed in the first iteration of prediction (t + 1).
    If last_window = None, the values stored in self.last_window are used to calculate the initial predictors, and the predictions start right after the training data.
  • exog (pandas Series, pandas DataFrame, default `None`) Exogenous variable/s included as predictor/s.
Returns (predictions : pandas Series)

Predicted values.
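The recursive strategy can be sketched as follows. The function `recursive_forecast` and the toy model are hypothetical stand-ins for a fitted regressor; the sketch only shows how each prediction is fed back as a lag for the next step.

```python
import numpy as np

def recursive_forecast(predict_one, last_window, lags, steps):
    """Hypothetical helper. `predict_one` maps a 1D array of lag values to one prediction."""
    lags = np.asarray(lags)
    window = list(np.asarray(last_window, dtype=float))
    predictions = []
    for _ in range(steps):
        # Lag k is the value k positions back from the end of the window.
        features = np.array([window[-k] for k in lags])
        y_hat = float(predict_one(features))
        predictions.append(y_hat)
        window.append(y_hat)  # the prediction becomes an input for the next step
    return np.array(predictions)

# Toy model: next value = lag 1 plus the last observed difference (lag 1 - lag 2).
def toy_model(x):
    return x[0] + (x[0] - x[1])

preds = recursive_forecast(toy_model, last_window=[1.0, 2.0, 3.0], lags=[1, 2], steps=3)
# preds → array([4., 5., 6.])
```

Because later steps depend on earlier predictions rather than on observed values, errors can accumulate as the horizon grows.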

method
predict_interval(steps, last_window=None, exog=None, interval=[5, 95], n_boot=500, random_state=123, in_sample_residuals=True)

Iterative process in which each prediction is used as a predictor for the next step, and bootstrapping is used to estimate prediction intervals. Both predictions and intervals are returned.

Parameters
  • steps (int) Number of future steps predicted.
  • last_window (pandas Series, default `None`) Values of the series used to create the predictors (lags) needed in the first iteration of prediction (t + 1).
    If last_window = None, the values stored in self.last_window are used to calculate the initial predictors, and the predictions start right after the training data.
  • exog (pandas Series, pandas DataFrame, default `None`) Exogenous variable/s included as predictor/s.
  • interval (list, default `[5, 95]`) Percentiles that bound the estimated prediction interval, given as [lower, upper]. Values must be between 0 and 100 inclusive. For example, [5, 95] gives a 90% interval.
  • n_boot (int, default `500`) Number of bootstrapping iterations used to estimate prediction intervals.
  • random_state (int, default `123`) Seed for the random generator, so that the bootstrapped intervals are reproducible.
  • in_sample_residuals (bool, default `True`) If True, residuals from the training data are used as proxy of prediction error to create prediction intervals. If False, out of sample residuals are used. In the latter case, the user should have calculated and stored the residuals within the forecaster (see set_out_sample_residuals()).
Returns (predictions : pandas DataFrame)

Values predicted by the forecaster and their estimated interval:
  • pred: predictions.
  • lower_bound: lower bound of the interval.
  • upper_bound: upper bound of the interval.

Notes

For more information about prediction intervals in forecasting, see Forecasting: Principles and Practice (2nd ed), Rob J Hyndman and George Athanasopoulos: https://otexts.com/fpp2/prediction-intervals.html
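A minimal sketch of bootstrapped intervals for a recursive forecast is given below. The function `bootstrap_interval`, the persistence model, and the residual values are all hypothetical; the sketch assumes that each simulated path perturbs every one-step prediction with a randomly drawn residual, which may differ in detail from the library's implementation.

```python
import numpy as np

def bootstrap_interval(predict_one, last_window, lags, steps, residuals,
                       interval=(5, 95), n_boot=500, random_state=123):
    """Hypothetical helper: percentile bounds from simulated forecast paths."""
    rng = np.random.default_rng(random_state)
    lags = np.asarray(lags)
    residuals = np.asarray(residuals, dtype=float)
    paths = np.empty((n_boot, steps))
    for b in range(n_boot):
        window = list(np.asarray(last_window, dtype=float))
        for s in range(steps):
            features = np.array([window[-k] for k in lags])
            # Perturb the point prediction with a residual drawn at random.
            simulated = float(predict_one(features)) + rng.choice(residuals)
            paths[b, s] = simulated
            window.append(simulated)  # the perturbed value feeds the next step
    # Percentiles are taken across simulated paths, separately for each step.
    lower, upper = np.percentile(paths, interval, axis=0)
    return lower, upper

def persistence(x):
    return x[0]  # toy model: the next value equals lag 1

lower, upper = bootstrap_interval(
    persistence, last_window=[10.0], lags=[1], steps=3, residuals=[-1.0, 0.0, 1.0]
)
```

Fixing random_state makes repeated calls return identical bounds, which is the role the random_state parameter plays above.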

method
set_params(**params)

Set new values to the parameters of the scikit-learn model stored in the ForecasterAutoreg.

method
set_lags(lags)

Set new value to the attribute lags. Attributes max_lag and window_size are also updated.

method
set_out_sample_residuals(residuals, append=True)

Set new values to the attribute out_sample_residuals. Out of sample residuals are meant to be calculated using observations that did not participate in the training process.

Parameters
  • residuals (1D np.ndarray) Values of residuals. If len(residuals) > 1000, only a random sample of 1000 values is stored.
  • append (bool, default `True`) If True, new residuals are added to the ones already stored in the attribute out_sample_residuals. Once the limit of 1000 values is reached, no more values are appended. If False, out_sample_residuals is overwritten with the new residuals.
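The storage rules described above can be sketched as a small function. The name `update_residuals`, its `stored` and `random_state` arguments, and the way remaining slots are filled are assumptions made for illustration; only the 1000-value limit and the append/overwrite behaviour come from the description.

```python
import numpy as np

MAX_STORED = 1000  # limit stated in the parameter description

def update_residuals(stored, residuals, append=True, random_state=123):
    """Hypothetical helper: return the new content of the residual store."""
    rng = np.random.default_rng(random_state)
    residuals = np.asarray(residuals, dtype=float)
    if len(residuals) > MAX_STORED:
        # Keep only a random sample when too many residuals are passed.
        residuals = rng.choice(residuals, size=MAX_STORED, replace=False)
    if not append or stored is None:
        return residuals  # overwrite
    # Assumption: fill the free slots, then stop appending once the limit is reached.
    free_slots = max(MAX_STORED - len(stored), 0)
    return np.concatenate([stored, residuals[:free_slots]])

sampled = update_residuals(None, np.arange(1500))
appended = update_residuals(np.zeros(990), np.ones(50), append=True)
overwritten = update_residuals(np.zeros(990), np.ones(50), append=False)
```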
method
get_coef()

Return estimated coefficients for the linear regression model stored in the forecaster. Only valid when the forecaster has been trained using LinearRegression(), Lasso() or Ridge() as regressor.

Returns (coef : pandas DataFrame)

Value of the coefficients associated with each predictor.

method
get_feature_importance()

Return impurity-based feature importance of the model stored in the forecaster. Only valid when the forecaster has been trained using GradientBoostingRegressor, RandomForestRegressor or HistGradientBoostingRegressor as regressor.

Returns (feature_importance : pandas DataFrame)

Impurity-based feature importance associated with each predictor.