16 Self-study - Session 2
In this session, we will revisit Chapter 12, Chapter 13, and Chapter 14.
We have seen that the algorithms for all three tasks can, more or less, be grouped into the same categories:
- Based on distances or metrics: e.g. nearest neighbors, trees and forests
- Based on distributions or kernels: e.g. Gaussian Mixture Models, SVMs
- Based on subsequences or shapelets: e.g. bags of words, shapelet-based methods
- Based on deep learning: e.g. recurrent neural networks, or treating the time series as images and using CNNs
- Based on random convolutions: e.g. Random Convolutional Kernel Transform (ROCKET)
- Based on ensembles of the above
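To make the random-convolution category more concrete, here is a simplified, NumPy-only sketch of ROCKET-style features. It is an illustration under stated assumptions, not sktime's or aeon's implementation: it handles univariate series only, uses no padding, and draws kernel lengths from {7, 9, 11}, mean-centred normal weights, a uniform bias in [-1, 1] and an exponentially scaled dilation. Each kernel contributes two features, the proportion of positive values (PPV) and the maximum. The helper names `random_kernels`, `apply_kernel` and `rocket_transform` are hypothetical.

```python
import numpy as np


def random_kernels(n_kernels, series_length, rng):
    """Sample simplified ROCKET-style kernels (hypothetical helper).

    Assumes series_length > 11 so that every kernel fits into the series.
    """
    kernels = []
    for _ in range(n_kernels):
        length = int(rng.choice([7, 9, 11]))
        weights = rng.normal(0.0, 1.0, length)
        weights -= weights.mean()  # mean-centred weights
        bias = rng.uniform(-1.0, 1.0)
        # Dilation on an exponential scale, bounded so the dilated kernel fits
        max_exp = np.log2((series_length - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0, max_exp))
        kernels.append((weights, bias, dilation))
    return kernels


def apply_kernel(x, weights, bias, dilation):
    """Convolve one univariate series with a dilated kernel; return (PPV, max)."""
    span = (len(weights) - 1) * dilation
    n_out = len(x) - span  # no padding: only positions where the kernel fits
    out = np.empty(n_out)
    for i in range(n_out):
        # Every `dilation`-th value starting at i, exactly len(weights) values
        out[i] = np.dot(weights, x[i : i + span + 1 : dilation]) + bias
    ppv = np.mean(out > 0)  # proportion of positive values
    return ppv, out.max()


def rocket_transform(X, kernels):
    """X has shape (n_instances, n_timepoints); returns (n_instances, 2 * n_kernels)."""
    features = np.empty((X.shape[0], 2 * len(kernels)))
    for n, x in enumerate(X):
        for k, (w, b, d) in enumerate(kernels):
            features[n, 2 * k : 2 * k + 2] = apply_kernel(x, w, b, d)
    return features


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))  # 5 synthetic series of length 100
kernels = random_kernels(n_kernels=10, series_length=X.shape[1], rng=rng)
features = rocket_transform(X, kernels)
print(features.shape)  # (5, 20)
```

In ROCKET, a feature matrix like this is then passed to a linear model such as a ridge classifier. Because the kernels are random and never trained, only the linear model is fitted, which keeps training cheap.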
We have implemented nearest neighbors (distance-based) for clustering, time series forests (distance-based) for classification, and convolutional neural networks (deep learning-based) for regression.
Let us have a look at one additional algorithm for each task.
You can use sktime’s load_UCR_UEA_dataset function to load the data sets mentioned below, e.g.
```python
import pandas as pd
from sktime.datasets import load_UCR_UEA_dataset

ds_name = "UWaveGestureLibrary"
ret_type = "numpy3d"

X_train, y_train = load_UCR_UEA_dataset(
    name=ds_name,
    split="train",
    return_type=ret_type,
)
X_test, y_test = load_UCR_UEA_dataset(
    name=ds_name,
    split="test",
    return_type=ret_type,
)

# For classification, convert y to categorical
y_train = pd.Series(y_train).astype("category")
y_test = pd.Series(y_test).astype("category")
```

In Chapter 13 and Chapter 14, our input was univariate, meaning each sample consisted of exactly one time series. Now, we will work with multivariate time series, where each sample consists of several time series recorded in parallel. These are more common in real-world scenarios, since typically multiple sensors are available.
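With `return_type="numpy3d"`, the data comes back as a three-dimensional NumPy array. To our understanding of sktime's `numpy3D` convention (and aeon's default format for equal-length data), the axes are (instances, channels, time points), so univariate and multivariate data differ only in the size of the channel axis. The toy arrays below use arbitrary, hypothetical sizes and contain no real data.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
n_instances, n_timepoints = 4, 50

# Univariate: each sample is a single series, i.e. exactly one channel
X_uni = np.zeros((n_instances, 1, n_timepoints))

# Multivariate: e.g. three sensor axes recorded in parallel, i.e. three channels
X_multi = np.zeros((n_instances, 3, n_timepoints))

print(X_uni.shape)    # (4, 1, 50)
print(X_multi.shape)  # (4, 3, 50)

# One multivariate sample is a 2D array: channels x time points
sample = X_multi[0]
print(sample.shape)   # (3, 50)

# One channel across all samples is a 2D array: instances x time points
third_channel = X_multi[:, 2, :]
print(third_channel.shape)  # (4, 50)
```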
Note that not all algorithms have implementations for multivariate time series. For example, sktime’s TimeSeriesForestClassifier only works for univariate time series and raises an exception when used with multivariate data. Luckily, other implementations are available: aeon, for example, offers a TimeSeriesForestClassifier that is compatible with multivariate time series.
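A time series forest summarizes random intervals of a series with simple statistics, and this idea carries over to several channels if each interval is also assigned a channel. The NumPy sketch below illustrates that idea under simplifying assumptions; it is not aeon's implementation, which may differ in how channels and intervals are sampled, and the helpers `sample_intervals` and `interval_features` are hypothetical. For every random (channel, start, end) interval it computes the mean, standard deviation and least-squares slope, the three statistics used by the original time series forest.

```python
import numpy as np


def sample_intervals(n_channels, n_timepoints, n_intervals, rng, min_length=3):
    """Draw random (channel, start, end) triples (hypothetical helper)."""
    intervals = []
    for _ in range(n_intervals):
        channel = rng.integers(n_channels)
        length = rng.integers(min_length, n_timepoints + 1)
        start = rng.integers(0, n_timepoints - length + 1)
        intervals.append((channel, start, start + length))
    return intervals


def interval_features(X, intervals):
    """Mean, standard deviation and slope for every interval.

    X has shape (n_instances, n_channels, n_timepoints);
    the result has shape (n_instances, 3 * n_intervals).
    """
    feats = []
    for channel, start, end in intervals:
        segment = X[:, channel, start:end]  # (n_instances, interval length)
        t = np.arange(end - start)
        t_centred = t - t.mean()
        mean = segment.mean(axis=1)
        std = segment.std(axis=1)
        # Least-squares slope of a straight line fitted to each segment
        slope = (segment - mean[:, None]) @ t_centred / np.sum(t_centred**2)
        feats.extend([mean, std, slope])
    return np.column_stack(feats)


rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3, 40))  # 6 synthetic samples, 3 channels, 40 time points
intervals = sample_intervals(n_channels=3, n_timepoints=40, n_intervals=5, rng=rng)
F = interval_features(X, intervals)
print(F.shape)  # (6, 15)
```

In a full time series forest, each tree gets its own random intervals and is trained on the resulting feature matrix, and the forest combines the trees' predictions.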
aeon is quite strict about dependencies. In case you have problems installing it, consider initializing a new project and installing aeon first, before adding further packages.
You can use aeon’s load_regression function to load the data set mentioned below, e.g.
```python
from aeon.datasets import load_regression

ds_name = "AppliancesEnergy"
X_train, y_train = load_regression(ds_name, split="train")
X_test, y_test = load_regression(ds_name, split="test")
```