`ForecasterAutoreg`

class

skforecast.ForecasterAutoreg.ForecasterAutoreg.ForecasterAutoreg(regressor, lags)

Bases

skforecast.ForecasterBase.ForecasterBase.ForecasterBase

This class turns any regressor compatible with the scikit-learn API into a recursive autoregressive (multi-step) forecaster.

Parameters

regressor (regressor or pipeline compatible with the scikit-learn API) — An instance of a regressor or pipeline compatible with the scikit-learn API.
lags (int, list, 1d numpy ndarray, range) — Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1. int: include lags from 1 to lags (included). list, numpy ndarray or range: include only lags present in lags.
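The two ways of specifying `lags` can be illustrated with a small sketch. The helper `normalize_lags` below is hypothetical and written only for illustration; it is not part of skforecast and may differ from what the library does internally.

```python
import numpy as np


def normalize_lags(lags):
    """Hypothetical helper: convert `lags` into a 1-D integer array."""
    if isinstance(lags, int):
        # An int means "every lag from 1 up to and including `lags`".
        return np.arange(1, lags + 1)
    # A list, range or 1-D array is taken as the exact set of lags.
    lags = np.asarray(lags, dtype=int)
    if lags.ndim != 1 or lags.size == 0 or lags.min() < 1:
        raise ValueError("lags must be a non-empty 1-D sequence of integers >= 1")
    return lags


print(normalize_lags(3))             # lags 1, 2 and 3
print(normalize_lags([1, 7, 14]))    # only lags 1, 7 and 14
print(normalize_lags(range(1, 4)))   # same lags as passing the int 3
```

With `lags=[1, 7, 14]`, the largest lag is 14, so 14 past observations are needed before the first predictor row can be built.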

Attributes

X_train_col_names (list) — Names of columns of the matrix created internally for training.
exog_col_names (list) — Names of columns of exog if exog used in training was a pandas DataFrame.
exog_type (type) — Type of exogenous variable/s used in training.
fitted (bool) — Tag to identify if the regressor has been fitted (trained).
in_sample_residuals (numpy ndarray) — Residuals of the model when predicting training data. Only stored up to 1000 values.
included_exog (bool) — If the forecaster has been trained using exogenous variable/s.
index_freq (str) — Frequency of Index of the input used in training.
index_type (type) — Type of index of the input used in training.
lags (numpy ndarray) — Lags used as predictors.
last_window (pandas Series) — Last window the forecaster has seen during training. It stores the values needed to predict the next step right after the training data.
max_lag (int) — Maximum value of lag included in lags.
out_sample_residuals (numpy ndarray) — Residuals of the model when predicting non training data. Only stored up to 1000 values.
regressor (regressor or pipeline compatible with the scikit-learn API) — An instance of a regressor or pipeline compatible with the scikit-learn API.
training_range (pandas Index) — First and last values of index of the data used during training.
window_size (int) — Size of the window needed to create the predictors. It is equal to max_lag.

Methods

__repr__() (str) — Information displayed when a ForecasterAutoreg object is printed.
create_train_X_y(y, exog) (X_train : pandas DataFrame, shape (len(y) - self.max_lag, len(self.lags))) — Create training matrices from univariate time series and exogenous variables.
fit(y, exog) (None) — Train the forecaster.
get_coef() (coef : pandas DataFrame) — Return estimated coefficients for the linear regression model stored in the forecaster. Only valid when the forecaster has been trained using LinearRegression(), Lasso() or Ridge() as regressor.
get_feature_importance() (feature_importance : pandas DataFrame) — Return impurity-based feature importance of the model stored in the forecaster. Only valid when the forecaster has been trained using GradientBoostingRegressor, RandomForestRegressor or HistGradientBoostingRegressor as regressor.
predict(steps, last_window, exog) (predictions : pandas Series) — Predict n steps ahead. This is a recursive process in which each prediction is used as a predictor for the next step.
predict_interval(steps, last_window, exog, interval, n_boot, random_state, in_sample_residuals) (predictions : pandas DataFrame) — Iterative process in which each prediction is used as a predictor for the next step, and bootstrapping is used to estimate prediction intervals. Both predictions and intervals are returned.
set_lags(lags) (self) — Set new value to the attribute lags. Attributes max_lag and window_size are also updated.
set_out_sample_residuals(residuals, append) (self) — Set new values to the attribute out_sample_residuals. Out of sample residuals are meant to be calculated using observations that did not participate in the training process.
set_params(**params) (self) — Set new values to the parameters of the scikit learn model stored in the ForecasterAutoreg.

method

__repr__() → str

Information displayed when a ForecasterAutoreg object is printed.

method

create_train_X_y(y, exog=None)

Create training matrices from univariate time series and exogenous variables.

Parameters

y (pandas Series) — Training time series.
exog (pandas Series, pandas DataFrame, default `None`) — Exogenous variable/s included as predictor/s. Must have the same number of observations as y and their indexes must be aligned.

Returns (X_train : pandas DataFrame, shape (len(y) - self.max_lag, len(self.lags)))

Pandas DataFrame with the training values (predictors).

y_train : pandas Series, shape (len(y) - self.max_lag, ). Values (target) of the time series related to each row of X_train.
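The shapes above follow from how lagged predictors are built. The following is a minimal NumPy sketch of that construction, assuming no exogenous variables; `make_lag_matrix` is a hypothetical helper, and the library itself returns pandas objects rather than arrays.

```python
import numpy as np


def make_lag_matrix(y, lags):
    """Hypothetical helper: build lagged predictors and targets from a series."""
    y = np.asarray(y, dtype=float)
    lags = np.asarray(lags, dtype=int)
    max_lag = int(lags.max())
    n_rows = len(y) - max_lag  # the first max_lag values have no complete history
    # Row i targets y[max_lag + i]; the column for lag k holds y[max_lag + i - k].
    X = np.column_stack(
        [y[max_lag - lag : max_lag - lag + n_rows] for lag in lags]
    )
    target = y[max_lag:]
    return X, target


y = np.arange(10, dtype=float)
X_train, y_train = make_lag_matrix(y, lags=[1, 2, 3])
print(X_train.shape, y_train.shape)  # (7, 3) (7,)
# The first row holds y[2], y[1], y[0] (lags 1, 2, 3) and its target is y[3].
```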

method

fit(y, exog=None)

Train the forecaster.

Parameters

y (pandas Series) — Training time series.
exog (pandas Series, pandas DataFrame, default `None`) — Exogenous variable/s included as predictor/s. Must have the same number of observations as y and their indexes must be aligned so that y[i] is regressed on exog[i].

method

predict(steps, last_window=None, exog=None)

Predict n steps ahead. This is a recursive process in which each prediction is used as a predictor for the next step.

Parameters

steps (int) — Number of future steps predicted.
last_window (pandas Series, default `None`) — Values of the series used to create the predictors (lags) needed in the first iteration of prediction (t + 1).
If last_window = None, the values stored in self.last_window are used to calculate the initial predictors, and the predictions start right after the training data.
exog (pandas Series, pandas DataFrame, default `None`) — Exogenous variable/s included as predictor/s.

Returns (predictions : pandas Series)

Predicted values.
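The recursive strategy described for this method can be sketched in a few lines of NumPy. This is an illustration only, not the library's implementation: `recursive_forecast` and `predict_one` are hypothetical names, and `predict_one` stands in for the fitted regressor's one-step prediction.

```python
import numpy as np


def recursive_forecast(predict_one, last_window, lags, steps):
    """Hypothetical helper: feed each prediction back in as a new lag."""
    lags = np.asarray(lags, dtype=int)
    # Keep only the most recent max_lag observations.
    window = np.asarray(last_window, dtype=float)[-int(lags.max()):].copy()
    predictions = np.empty(steps)
    for i in range(steps):
        # window[-1] is lag 1, window[-2] is lag 2, and so on.
        x = window[-lags]
        predictions[i] = predict_one(x)
        # Slide the window: drop the oldest value, append the new prediction.
        window = np.append(window[1:], predictions[i])
    return predictions


# Toy one-step model (an assumption for the example): linear extrapolation
# from the two most recent values, 2 * lag_1 - lag_2.
predictions = recursive_forecast(
    predict_one=lambda x: 2 * x[0] - x[1],
    last_window=[1.0, 2.0, 3.0],
    lags=[1, 2, 3],
    steps=3,
)
print(predictions)  # [4. 5. 6.]
```

Because later steps depend on earlier predictions rather than on observed values, errors can accumulate as the horizon grows.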

method

predict_interval(steps, last_window=None, exog=None, interval=[5, 95], n_boot=500, random_state=123, in_sample_residuals=True)

Iterative process in which each prediction is used as a predictor for the next step, and bootstrapping is used to estimate prediction intervals. Both predictions and intervals are returned.

Parameters

steps (int) — Number of future steps predicted.
last_window (pandas Series, default `None`) — Values of the series used to create the predictors (lags) needed in the first iteration of prediction (t + 1).
If last_window = None, the values stored in self.last_window are used to calculate the initial predictors, and the predictions start right after the training data.
exog (pandas Series, pandas DataFrame, default `None`) — Exogenous variable/s included as predictor/s.
interval (list, default `[5, 95]`) — Confidence of the prediction interval estimated. Sequence of percentiles to compute, which must be between 0 and 100 inclusive.
n_boot (int, default `500`) — Number of bootstrapping iterations used to estimate prediction intervals.
random_state (int, default `123`) — Seed for the random number generator, so that the bootstrapped intervals are reproducible.
in_sample_residuals (bool, default `True`) — If True, residuals from the training data are used as proxy of prediction error to create prediction intervals. If False, out of sample residuals are used. In the latter case, the user should have calculated and stored the residuals within the forecaster (see set_out_sample_residuals()).

Returns (predictions : pandas DataFrame)

Values predicted by the forecaster and their estimated intervals. Columns: pred (predictions), lower_bound (lower bound of the interval), upper_bound (upper bound of the interval).

Notes

More information about prediction intervals in forecasting: https://otexts.com/fpp2/prediction-intervals.html Forecasting: Principles and Practice (2nd ed) Rob J Hyndman and George Athanasopoulos.
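The general idea of bootstrapped prediction intervals from the reference above can be sketched as follows. This is a simplified illustration under stated assumptions, not skforecast's implementation: `bootstrap_interval` and `predict_one` are hypothetical names, `predict_one` stands in for the fitted regressor, and the residuals are made up for the example.

```python
import numpy as np


def bootstrap_interval(predict_one, last_window, lags, steps, residuals,
                       interval=(5, 95), n_boot=500, random_state=123):
    """Hypothetical helper: percentile intervals from simulated future paths."""
    rng = np.random.default_rng(random_state)
    lags = np.asarray(lags, dtype=int)
    residuals = np.asarray(residuals, dtype=float)
    start = np.asarray(last_window, dtype=float)[-int(lags.max()):]
    paths = np.empty((n_boot, steps))
    for b in range(n_boot):
        window = start.copy()
        for i in range(steps):
            # Add a randomly drawn residual to the one-step prediction and
            # feed the perturbed value back in as the next lag.
            simulated = predict_one(window[-lags]) + rng.choice(residuals)
            paths[b, i] = simulated
            window = np.append(window[1:], simulated)
    # Percentiles across the simulated paths, one pair of bounds per step.
    lower, upper = np.percentile(paths, interval, axis=0)
    return lower, upper


lower, upper = bootstrap_interval(
    predict_one=lambda x: 2 * x[0] - x[1],   # toy one-step model
    last_window=[1.0, 2.0, 3.0],
    lags=[1, 2],
    steps=4,
    residuals=[-0.5, 0.0, 0.5],              # made-up residuals
)
print(lower.shape, upper.shape)
```

Fixing `random_state` makes repeated calls return the same bounds. The quality of the intervals depends on how well the sampled residuals represent the true prediction error, which is the motivation for the `in_sample_residuals` option.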

method

set_params(**params)

Set new values to the parameters of the scikit learn model stored in the ForecasterAutoreg.

method

set_lags(lags)

Set new value to the attribute lags. Attributes max_lag and window_size are also updated.

method

set_out_sample_residuals(residuals, append=True)

Set new values to the attribute out_sample_residuals. Out of sample residuals are meant to be calculated using observations that did not participate in the training process.

Parameters

append (bool, default `True`) — If True, new residuals are added to the ones already stored in the attribute out_sample_residuals. Once the limit of 1000 values is reached, no more values are appended. If False, out_sample_residuals is overwritten with the new residuals.
residuals (1D np.ndarray) — Values of residuals. If len(residuals) > 1000, only a random sample of 1000 values is stored.
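The storage rules described by these two parameters can be sketched as a standalone function. `update_residuals`, `MAX_STORED` and the `random_state` argument are hypothetical names used only for illustration; the library's internal logic may differ in detail.

```python
import numpy as np

MAX_STORED = 1000  # storage limit stated in the parameter descriptions


def update_residuals(stored, residuals, append=True, random_state=123):
    """Hypothetical helper: return the new content of out_sample_residuals."""
    rng = np.random.default_rng(random_state)
    residuals = np.asarray(residuals, dtype=float)
    if len(residuals) > MAX_STORED:
        # Keep only a random sample of the new residuals.
        residuals = rng.choice(residuals, size=MAX_STORED, replace=False)
    if not append or stored is None:
        # Overwrite whatever was stored before.
        return residuals
    # Append only as many values as still fit under the limit.
    free_slots = max(0, MAX_STORED - len(stored))
    return np.concatenate([stored, residuals[:free_slots]])


stored = update_residuals(None, np.arange(1500))             # sampled down to 1000
stored = update_residuals(stored, np.ones(10))               # full, nothing appended
replaced = update_residuals(stored, np.ones(10), append=False)
print(len(stored), len(replaced))  # 1000 10
```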

method

get_coef()

Return estimated coefficients for the linear regression model stored in the forecaster. Only valid when the forecaster has been trained using LinearRegression(), Lasso() or Ridge() as regressor.

Returns (coef : pandas DataFrame)

Value of the coefficients associated with each predictor.

method

get_feature_importance()

Return impurity-based feature importance of the model stored in the forecaster. Only valid when the forecaster has been trained using GradientBoostingRegressor, RandomForestRegressor or HistGradientBoostingRegressor as regressor.

Returns (feature_importance : pandas DataFrame)

Impurity-based feature importance associated with each predictor.
