Make predictions in 5 minutes

Sklearn, Statsmodels, Tensorflow… Neural nets, regressions, time series predictions: all at once, and really simple.

There are many sophisticated algorithms, libraries and ideas for making predictions, but it takes time to try them all and compare them. That is why we have predictit.

Nowadays one can just find an appropriate method, import it and it's done. But in this age of abundance, even though we do not have to develop our own models, it takes a lot of time to compare all the possible solutions. We have many libraries with machine learning models, many libraries for data analysis and many for data preprocessing, but we have to join all the fragments ourselves. There is a library/framework for such tasks. It's called predictit. It's open source; the source code and documentation are available on GitHub.

Compare more than 20 models from Sklearn, Statsmodels, Tensorflow and more in just a couple of minutes, and find the optimal input parameters as well? Yes! How? After

pip install predictit

All you need is

import predictit
predictions = predictit.main.predict()

That's it. Everything works (on generated test data, of course). Just input the data that you want to predict and it's done. You can configure it in three ways: pass parameters to the predict function, use command line arguments, or edit the values in config.py. Data sources can be a CSV file, a dataframe, a numpy array or SQL. Possible outputs are predictions as a numpy array or an interactive plot. If you're not using Python, use command line arguments like below and run it in a terminal in the project folder. Use main.py --help for more info on the parameters.

python main.py --function predict --data_source 'csv' --csv_path 'test_data/daily-minimum-temperatures.csv' --predicted_column 1

Results can look like this.

On mouse-over, you can see the exact values, the model names and the ranking of models by error criterion. The framework can handle datetime indexes; data can be resampled and predicted at a given frequency.
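predictit does the resampling internally, but as a standalone illustration of the idea (plain pandas on made-up data, not the predictit API), resampling a datetime-indexed series to a coarser frequency looks like this:

```python
import numpy as np
import pandas as pd

# Hourly series over 4 days (hypothetical data, not the library's test set)
index = pd.date_range("2021-01-01", periods=96, freq="h")
series = pd.Series(np.sin(np.arange(96) / 12), index=index)

# Resample to daily frequency by averaging - predictions can then be
# made at the daily granularity instead of the hourly one
daily = series.resample("D").mean()
print(daily.shape)  # 4 daily values computed from 96 hourly ones
```

The same mechanism works the other way too: a series can be up- or down-sampled to whatever frequency the forecast should have.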
What models are used? For example…

  • AR (autoregressive model)
  • ARIMA
  • Autoregressive Linear neural unit
  • Conjugate gradient
  • Bayes Ridge Regression
  • Extreme learning machine

The software can be used as a Python library or as a standalone framework that you can edit any way you like. Check the official readme and the tests for some use cases. Read the whole config.py file to see everything you can do.
If you want to predict like a pro, you can start here…

import predictit

predictit.config.predicts = 12  # Create 12 predictions
predictit.config.data_source = 'csv'  # Load data from a CSV file
predictit.config.csv_adress = r'E:\VSCODE\Diplomka\test_data\daily-minimum-temperatures.csv'  # Path to the CSV file with data
predictit.config.save_plot_adress = r'C:\Users\TruTonton\Documents\GitHub'  # Where to save the HTML plot
predictit.config.datalength = 1000  # Consider only the last 1000 data points
predictit.config.predicted_columns_names = 'Temp'  # Name of the column we want to predict
predictit.config.optimizeit = 0  # Whether to search for the best model parameters
predictit.config.compareit = 6  # Visualize the 6 best models
predictit.config.repeatit = 4  # Repeat the calculation 4x on shifted data to reduce the chance of accidental success
predictit.config.other_columns = 0  # Whether to use the other columns or not

# Choose the models that will be computed
predictit.config.used_models = {
    "AR (Autoregression)": predictit.models.ar,
    "ARIMA (Autoregression integrated moving average)": predictit.models.arima,
    "Autoregressive Linear neural unit": predictit.models.autoreg_LNU,
    "Conjugate gradient": predictit.models.cg,
    "Extreme learning machine": predictit.models.regression,
    "Sklearn regression": predictit.models.regression,
}

# Define the parameters of the models
n_steps_in = 50  # How many lagged values go into the models
output_shape = 'batch'  # Whether to use batch or one-step models
predictit.config.models_parameters = {
    "AR (Autoregression)": {"plot": 0, 'method': 'cmle', 'ic': 'aic', 'trend': 'nc', 'solver': 'lbfgs'},
    "ARIMA (Autoregression integrated moving average)": {"p": 12, "d": 0, "q": 1, "plot": 0, 'method': 'css', 'ic': 'aic', 'trend': 'nc', 'solver': 'nm', 'forecast_type': 'out_of_sample'},
    "Autoregressive Linear neural unit": {"plot": 0, "lags": n_steps_in, "mi": 1, "minormit": 0, "tlumenimi": 1},
    "Conjugate gradient": {"n_steps_in": 30, "epochs": 5, "other_columns_lenght": None, "constant": None},
    "Extreme learning machine": {"n_steps_in": 20, "output_shape": 'one_step', "other_columns_lenght": None, "constant": None, "n_hidden": 20, "alpha": 0.3, "rbf_width": 0, "activation_func": 'selu'},
    "Sklearn regression": {"regressor": 'linear', "n_steps_in": n_steps_in, "output_shape": output_shape, "other_columns_lenght": None, "constant": None, "alpha": 0.0001, "n_iter": 100, "epsilon": 1.35, "alphas": [0.1, 0.5, 1], "gcv_mode": 'auto', "solver": 'auto'},
}

predictions = predictit.main.predict()

Besides the plot and the results, a table of model errors is also printed. It can look like this.

Table of results

How does the framework work? It's a kind of soft brute force. The result is a matrix of predictions that are evaluated with some error criterion. Models are evaluated on multiple data lengths, and the evaluation is repeated on shifted data to remove accidental successes. The final n-dimensional matrix is analyzed and the best models are selected, together with the appropriate data lengths and data preprocessing.
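That evaluation loop can be sketched without the library at all. The two toy models and the MAE criterion below are illustrative stand-ins, not predictit's internals:

```python
import numpy as np

def mean_model(history, n):
    # Predict the mean of the history, n times
    return np.full(n, history.mean())

def last_value_model(history, n):
    # Repeat the last observed value, n times
    return np.full(n, history[-1])

def evaluate(models, data, data_lengths, repeats, horizon=5):
    """Fill an errors matrix of models x lengths x repeats, then rank by mean error."""
    errors = np.empty((len(models), len(data_lengths), repeats))
    for m, model in enumerate(models.values()):
        for l, length in enumerate(data_lengths):
            for r in range(repeats):
                # Shift the evaluation window by r points so one lucky split can't win
                end = len(data) - horizon - r
                train, test = data[end - length:end], data[end:end + horizon]
                pred = model(train, horizon)
                errors[m, l, r] = np.mean(np.abs(pred - test))  # MAE criterion
    mean_errors = errors.mean(axis=(1, 2))
    return sorted(zip(models, mean_errors), key=lambda pair: pair[1])

rng = np.random.default_rng(0)
data = np.cumsum(rng.normal(size=300))  # a random walk as toy input
models = {"mean": mean_model, "last value": last_value_model}
ranking = evaluate(models, data, data_lengths=[50, 100, 200], repeats=4)
print(ranking[0][0])  # the model with the lowest average error across all cells
```

The real framework additionally varies the data preprocessing and remembers which data length worked best for each model, but the ranking principle is the same.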

You can choose how to standardize the data, which error criterion to use, and various initial model arguments. You can use config.optimize to find the best arguments for given models if you set up limits for them. It's based on dividing the range into intervals, finding the best interval and dividing it again. It can operate not only with integers and floats, but also with lists of strings: it will create all the combinations and find the best one.
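A minimal sketch of that interval-splitting search for a single numeric parameter (with a toy loss function standing in for a model's error criterion, not predictit's optimizer):

```python
def optimize_interval(loss, low, high, splits=5, rounds=6):
    """Repeatedly evaluate `splits` points in [low, high] and zoom into the best one."""
    for _ in range(rounds):
        step = (high - low) / (splits - 1)
        candidates = [low + i * step for i in range(splits)]
        best = min(candidates, key=loss)
        # Narrow the interval around the best candidate and repeat
        low, high = max(low, best - step), min(high, best + step)
    return best

# Toy loss with its minimum at alpha = 0.3 (stand-in for a model error)
loss = lambda alpha: (alpha - 0.3) ** 2
best_alpha = optimize_interval(loss, 0.0, 1.0)
print(round(best_alpha, 3))  # close to 0.3
```

For string-valued parameters there is nothing to subdivide, so those are simply tried as all combinations, as the text says.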

If you use, for example, the Sklearn regression model, there is a regressor parameter. There is also a function that collects all the regressors from Sklearn. If you then use optimization, it will find the regression best suited to your data. No need to learn about the various algorithms like lasso or passive aggressive regression; just use it.
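As a standalone sketch of the same idea, using scikit-learn directly rather than predictit's regressor parser, one could compare a few regressors on lagged features and keep the best:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge, Lasso, LinearRegression

def make_lagged(series, n_steps_in):
    # Turn a 1-D series into (lagged inputs, next value) pairs
    X = np.array([series[i:i + n_steps_in] for i in range(len(series) - n_steps_in)])
    y = series[n_steps_in:]
    return X, y

series = np.sin(np.arange(200) / 10)  # toy periodic series
X, y = make_lagged(series, n_steps_in=10)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

regressors = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.001),
    "bayes ridge": BayesianRidge(),
}
errors = {}
for name, reg in regressors.items():
    reg.fit(X_train, y_train)
    errors[name] = np.mean(np.abs(reg.predict(X_test) - y_test))

best = min(errors, key=errors.get)
print(best, errors[best])
```

The optimizer in the library does this selection for you over the full set of Sklearn regressors when the regressor parameter is included in the optimized arguments.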

If you want to predict more columns, use the predict_multiple function. If you need to care about performance, define your own test data, run the compare_models function, keep only the few best models, and set config.lengths = 0 and config.repeatit = 1. If you want to see all the results and all the errors, just set config.debug = 1.
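Put together as a config fragment (the attribute names are taken from the text above; treat the exact API as an assumption), the performance-oriented setup would look roughly like this:

```python
import predictit

# Sketch based on the options named in the text; exact names may differ
predictit.config.lengths = 0   # evaluate on a single data length only
predictit.config.repeatit = 1  # no repeated runs on shifted data
predictit.config.debug = 1     # print all models' results and errors
```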

If you like it and think it's useful, just fork it on GitHub. No donations allowed.