PLSSVM - Parallel Least Squares Support Vector Machine
2.0.0
A Least Squares Support Vector Machine implementation using different backends.
Encapsulate all necessary data that is needed for training or predicting using an SVM. More...
#include <data_set.hpp>
Classes | |
class | label_mapper |
Implements all necessary functionality to map arbitrary labels to labels usable by the C-SVMs. More... | |
class | scaling |
Implements all necessary data and functions needed for scaling a plssvm::data_set to a user-defined range. More... | |
Public Types | |
using | real_type = T |
The type of the data points: either float or double. | |
using | label_type = U |
The type of the labels: any arithmetic type or std::string. | |
using | size_type = std::size_t |
An unsigned integer type. | |
Public Member Functions | |
data_set (const std::string &filename) | |
Read the data points from the file filename. Automatically determines the plssvm::file_format_type based on the file extension. More... | |
data_set (const std::string &filename, file_format_type format) | |
Read the data points from the file filename assuming that the file is given in the plssvm::file_format_type format. More... | |
data_set (const std::string &filename, scaling scale_parameter) | |
Read the data points from the file filename and scale them using the provided scale_parameter. Automatically determines the plssvm::file_format_type based on the file extension. More... | |
data_set (const std::string &filename, file_format_type format, scaling scale_parameter) | |
Read the data points from the file filename assuming that the file is given in the plssvm::file_format_type format and scale them using the provided scale_parameter. More... | |
data_set (std::vector< std::vector< real_type >> data_points) | |
Create a new data set using the provided data_points. More... | |
data_set (std::vector< std::vector< real_type >> data_points, std::vector< label_type > labels) | |
Create a new data set using the provided data_points and labels. More... | |
data_set (std::vector< std::vector< real_type >> data_points, scaling scale_parameter) | |
Create a new data set using the provided data_points and scale them using the provided scale_parameter. More... | |
data_set (std::vector< std::vector< real_type >> data_points, std::vector< label_type > labels, scaling scale_parameter) | |
Create a new data set using the provided data_points and labels and scale the data_points using the provided scale_parameter. More... | |
void | save (const std::string &filename, file_format_type format) const |
Save the data points and potential labels of this data set to the file filename using the given file format. More... | |
void | save (const std::string &filename) const |
Save the data points and potential labels of this data set to the file filename. Automatically determines the plssvm::file_format_type based on the file extension. More... | |
const std::vector< std::vector< real_type > > & | data () const noexcept |
Return the data points in this data set. More... | |
bool | has_labels () const noexcept |
Returns whether this data set contains labels or not. More... | |
optional_ref< const std::vector< label_type > > | labels () const noexcept |
Returns an optional reference to the labels in this data set. More... | |
std::optional< std::vector< label_type > > | different_labels () const |
Returns an optional containing the different labels in this data set. More... | |
size_type | num_data_points () const noexcept |
Returns the number of data points in this data set. More... | |
size_type | num_features () const noexcept |
Returns the number of features in this data set. More... | |
size_type | num_different_labels () const noexcept |
Returns the number of different labels in this data set. More... | |
bool | is_scaled () const noexcept |
Returns whether this data set has been scaled or not. More... | |
optional_ref< const scaling > | scaling_factors () const noexcept |
Returns the scaling factors as an optional reference used to scale the data points in this data set. More... | |
Private Member Functions | |
data_set () | |
Default construct an empty data set. | |
void | create_mapping () |
Create the mapping between the provided labels and the internally used mapped values, i.e., { -1, 1 }. More... | |
void | scale () |
Scale the feature values of the data set to the provided range. More... | |
void | read_file (const std::string &filename, file_format_type format) |
Read the data points and potential labels from the file filename assuming the plssvm::file_format_type format . More... | |
Private Attributes | |
std::shared_ptr< std::vector< std::vector< real_type > > > | X_ptr_ { nullptr } |
A pointer to the two-dimensional data points. | |
std::shared_ptr< std::vector< label_type > > | labels_ptr_ { nullptr } |
A pointer to the original labels of this data set; may be nullptr if no labels have been provided. | |
std::shared_ptr< std::vector< real_type > > | y_ptr_ { nullptr } |
A pointer to the mapped values of the labels of this data set; may be nullptr if no labels have been provided. | |
size_type | num_data_points_ { 0 } |
The number of data points in this data set. | |
size_type | num_features_ { 0 } |
The number of features in this data set. | |
std::shared_ptr< const label_mapper > | mapping_ { nullptr } |
The mapping used to convert the original label to its mapped value and vice versa; may be nullptr if no labels have been provided. | |
std::shared_ptr< scaling > | scale_parameters_ { nullptr } |
The scaling parameters used to scale the data points in this data set; may be nullptr if no data point scaling was requested. | |
Friends | |
template<typename , typename > | |
class | model |
class | csvm |
Encapsulate all necessary data that is needed for training or predicting using an SVM.
May or may not contain labels! Internally, saves all data using std::shared_ptr to make a plssvm::data_set relatively cheap to copy!
T | the floating point type of the data (must be either float or double) |
U | the label type of the data (must be an arithmetic type or std::string; default: int) |
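The cheap-copy behavior that follows from storing data behind std::shared_ptr can be illustrated with a minimal sketch. The type `shared_points` and its member `owners` are hypothetical stand-ins and not part of PLSSVM; the sketch only assumes that a copy shares the underlying storage instead of duplicating it.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data set that keeps its points behind a std::shared_ptr.
struct shared_points {
    std::shared_ptr<std::vector<std::vector<double>>> points;

    explicit shared_points(std::vector<std::vector<double>> data) :
        points{ std::make_shared<std::vector<std::vector<double>>>(std::move(data)) } {}

    // Number of objects that currently share the underlying storage.
    long owners() const noexcept { return points.use_count(); }
};
```

Copying a `shared_points` object copies only the pointer and increments the reference count, so both copies refer to the same data points.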
explicit plssvm::data_set< T, U >::data_set (const std::string &filename)
Read the data points from the file filename. Automatically determines the plssvm::file_format_type based on the file extension.
If filename ends with .arff, the ARFF parser is used; otherwise the LIBSVM parser is used.
[in] | filename | the file to read the data points from |
plssvm::invalid_file_format_exception | all exceptions thrown by plssvm::data_set::read_file |
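The extension rule described above can be sketched as follows. The enum `file_format` and the function `detect_format` are hypothetical names used for illustration only, and the case-sensitive comparison is an assumption, not confirmed behavior of the library.

```cpp
#include <string>

// Hypothetical stand-in for plssvm::file_format_type.
enum class file_format { libsvm, arff };

// Select the parser from the file extension: ".arff" selects ARFF,
// every other file name falls back to LIBSVM.
// Assumption: the comparison is case-sensitive.
inline file_format detect_format(const std::string &filename) {
    const std::string ext{ ".arff" };
    const bool is_arff = filename.size() >= ext.size()
                         && filename.compare(filename.size() - ext.size(), ext.size(), ext) == 0;
    return is_arff ? file_format::arff : file_format::libsvm;
}
```

Under this rule a file without any extension is also treated as a LIBSVM file.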
plssvm::data_set< T, U >::data_set (const std::string &filename, file_format_type format)
Read the data points from the file filename assuming that the file is given in the plssvm::file_format_type format.
[in] | filename | the file to read the data points from |
[in] | format | the assumed file format used to parse the data points |
plssvm::invalid_file_format_exception | all exceptions thrown by plssvm::data_set::read_file |
plssvm::data_set< T, U >::data_set (const std::string &filename, scaling scale_parameter)
Read the data points from the file filename and scale them using the provided scale_parameter. Automatically determines the plssvm::file_format_type based on the file extension.
If filename ends with .arff, the ARFF parser is used; otherwise the LIBSVM parser is used.
[in] | filename | the file to read the data points from |
[in] | scale_parameter | the parameters used to scale the data set feature values to a given range |
plssvm::invalid_file_format_exception | all exceptions thrown by plssvm::data_set::read_file |
plssvm::data_set_exception | all exceptions thrown by plssvm::data_set::scale |
plssvm::data_set< T, U >::data_set (const std::string &filename, file_format_type format, scaling scale_parameter)
Read the data points from the file filename assuming that the file is given in the plssvm::file_format_type format and scale them using the provided scale_parameter.
[in] | filename | the file to read the data points from |
[in] | format | the assumed file format used to parse the data points |
[in] | scale_parameter | the parameters used to scale the data set feature values to a given range |
plssvm::invalid_file_format_exception | all exceptions thrown by plssvm::data_set::read_file |
plssvm::data_set_exception | all exceptions thrown by plssvm::data_set::scale |
explicit plssvm::data_set< T, U >::data_set (std::vector< std::vector< real_type >> data_points)
Create a new data set using the provided data_points.
Since no labels are provided, this data set may not be used in a call to plssvm::csvm::fit!
[in] | data_points | the data points used in this data set |
plssvm::data_set_exception | if the data_points vector is empty |
plssvm::data_set_exception | if the data points in data_points have mismatching number of features |
plssvm::data_set_exception | if any data_point has no features |
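The three documented checks can be sketched as a standalone helper. The functions `validate_data_points` and `is_valid` are hypothetical, and std::invalid_argument stands in for plssvm::data_set_exception; the order of the checks and the messages are assumptions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper mirroring the documented checks.
// Throws std::invalid_argument as a stand-in for plssvm::data_set_exception.
inline void validate_data_points(const std::vector<std::vector<double>> &data_points) {
    if (data_points.empty()) {
        throw std::invalid_argument{ "the data points vector is empty" };
    }
    const std::size_t num_features = data_points.front().size();
    if (num_features == 0) {
        throw std::invalid_argument{ "a data point has no features" };
    }
    // A later point without features differs from the first one and is caught here.
    for (const std::vector<double> &point : data_points) {
        if (point.size() != num_features) {
            throw std::invalid_argument{ "data points have mismatching numbers of features" };
        }
    }
}

// Convenience wrapper that reports whether all checks pass.
inline bool is_valid(const std::vector<std::vector<double>> &data_points) {
    try {
        validate_data_points(data_points);
        return true;
    } catch (const std::invalid_argument &) {
        return false;
    }
}
```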
plssvm::data_set< T, U >::data_set (std::vector< std::vector< real_type >> data_points, std::vector< label_type > labels)
Create a new data set using the provided data_points and labels.
[in] | data_points | the data points used in this data set |
[in] | labels | the labels used in this data set |
plssvm::data_set_exception | if the data_points vector is empty |
plssvm::data_set_exception | if the data points in data_points have mismatching number of features |
plssvm::data_set_exception | if any data_point has no features |
plssvm::data_set_exception | if the number of data points in data_points and number of labels mismatch |
plssvm::data_set< T, U >::data_set (std::vector< std::vector< real_type >> data_points, scaling scale_parameter)
Create a new data set using the provided data_points and scale them using the provided scale_parameter.
[in] | data_points | the data points used in this data set |
[in] | scale_parameter | the parameters used to scale the data set feature values to a given range |
plssvm::data_set_exception | if the data_points vector is empty |
plssvm::data_set_exception | if the data points in data_points have mismatching number of features |
plssvm::data_set_exception | if any data_point has no features |
plssvm::data_set_exception | all exceptions thrown by plssvm::data_set::scale |
plssvm::data_set< T, U >::data_set (std::vector< std::vector< real_type >> data_points, std::vector< label_type > labels, scaling scale_parameter)
Create a new data set using the provided data_points and labels and scale the data_points using the provided scale_parameter.
[in] | data_points | the data points used in this data set |
[in] | labels | the labels used in this data set |
[in] | scale_parameter | the parameters used to scale the data set feature values to a given range |
plssvm::data_set_exception | if the data_points vector is empty |
plssvm::data_set_exception | if the data points in data_points have mismatching number of features |
plssvm::data_set_exception | if any data_point has no features |
plssvm::data_set_exception | if the number of data points in data_points and number of labels mismatch |
plssvm::data_set_exception | all exceptions thrown by plssvm::data_set::scale |
void plssvm::data_set< T, U >::save (const std::string &filename, file_format_type format) const
Save the data points and potential labels of this data set to the file filename using the given file format.
[in] | filename | the file to save the data points and labels to |
[in] | format | the file format |
void plssvm::data_set< T, U >::save (const std::string &filename) const
Save the data points and potential labels of this data set to the file filename. Automatically determines the plssvm::file_format_type based on the file extension.
[in] | filename | the file to save the data points and labels to |
plssvm::data_set_exception | if the file extension isn't one of libsvm or arff |
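inline const std::vector< std::vector< real_type > > & plssvm::data_set< T, U >::data () const noexcept
Return the data points in this data set.
Returns: the data points ([[nodiscard]])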
inline bool plssvm::data_set< T, U >::has_labels () const noexcept
Returns whether this data set contains labels or not.
Returns: true if this data set contains labels, false otherwise ([[nodiscard]])
optional_ref< const std::vector< label_type > > plssvm::data_set< T, U >::labels () const noexcept
Returns an optional reference to the labels in this data set.
If the labels are present, they can be retrieved as std::vector using: dataset.labels()->get().
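The access pattern `labels()->get()` can be illustrated with a small sketch. It assumes that optional_ref behaves like std::optional wrapping a std::reference_wrapper; the alias below and the type `maybe_labeled` are written for this example and are not taken from the library.

```cpp
#include <functional>
#include <optional>
#include <vector>

// Assumption: an optional reference modeled as an optional std::reference_wrapper.
template <typename T>
using optional_ref = std::optional<std::reference_wrapper<T>>;

// Hypothetical holder that may or may not carry labels.
struct maybe_labeled {
    std::vector<int> stored;
    bool has = false;

    // Returns a reference to the stored labels, or std::nullopt if there are none.
    optional_ref<const std::vector<int>> labels() const noexcept {
        if (!has) {
            return std::nullopt;
        }
        return std::cref(stored);
    }
};
```

Callers are expected to check `has_value()` (or a companion query such as has_labels) before dereferencing, since dereferencing an empty optional is undefined behavior.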
Returns: a reference to the labels if this data set contains labels, otherwise std::nullopt ([[nodiscard]])
auto plssvm::data_set< T, U >::different_labels
Returns an optional containing the different labels in this data set.
If the data set contains the labels std::vector<int>{ -1, 1, 1, -1, -1, 1 }, this function returns the labels { -1, 1 }.
Returns: the different labels if this data set contains labels, otherwise std::nullopt ([[nodiscard]])
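Collecting the different labels can be sketched with the standard sort and unique idiom. The free function `different_labels` below is a hypothetical illustration for int labels; returning the labels in ascending order is an assumption that happens to match the documented example.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Hypothetical free function: returns the distinct labels in ascending order,
// or std::nullopt if no labels are available.
inline std::optional<std::vector<int>> different_labels(const std::optional<std::vector<int>> &labels) {
    if (!labels.has_value()) {
        return std::nullopt;
    }
    std::vector<int> unique_labels = *labels;
    std::sort(unique_labels.begin(), unique_labels.end());
    unique_labels.erase(std::unique(unique_labels.begin(), unique_labels.end()), unique_labels.end());
    return unique_labels;
}
```

The size of the returned vector corresponds to the number of different labels.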
inline size_type plssvm::data_set< T, U >::num_data_points () const noexcept
Returns the number of data points in this data set.
Returns: the number of data points ([[nodiscard]])
inline size_type plssvm::data_set< T, U >::num_features () const noexcept
Returns the number of features in this data set.
Returns: the number of features ([[nodiscard]])
inline size_type plssvm::data_set< T, U >::num_different_labels () const noexcept
Returns the number of different labels in this data set.
If the data set contains the labels std::vector<int>{ -1, 1, 1, -1, -1, 1 }, this function returns 2. It is the same as: dataset.different_labels()->size()
Returns: the number of different labels ([[nodiscard]])
inline bool plssvm::data_set< T, U >::is_scaled () const noexcept
Returns whether this data set has been scaled or not.
The used scaling factors can be retrieved using plssvm::data_set::scaling_factors().
Returns: true if this data set has been scaled, false otherwise ([[nodiscard]])
optional_ref< const scaling > plssvm::data_set< T, U >::scaling_factors () const noexcept
Returns the scaling factors as an optional reference used to scale the data points in this data set.
Can be used to scale another data set in the same way (e.g., a test data set). If the data set has been scaled, the scaling factors can be retrieved using: dataset.scaling_factors()->get().
Returns: the scaling factors ([[nodiscard]])
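Reusing scaling factors for a second data set can be sketched as a two-step process: learn the per-feature minimum and maximum on the training data, then apply them unchanged to the test data. The names `feature_range`, `learn_ranges`, and `apply_ranges` are hypothetical, and mapping a constant feature to the lower bound is an assumption made to avoid a division by zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-feature scaling factor: the observed minimum and maximum.
struct feature_range {
    double min;
    double max;
};

// Learn one range per feature from non-empty, rectangular training data.
inline std::vector<feature_range> learn_ranges(const std::vector<std::vector<double>> &train) {
    assert(!train.empty());
    std::vector<feature_range> ranges;
    for (const double value : train.front()) {
        ranges.push_back(feature_range{ value, value });
    }
    for (const std::vector<double> &point : train) {
        for (std::size_t f = 0; f < ranges.size(); ++f) {
            ranges[f].min = std::min(ranges[f].min, point[f]);
            ranges[f].max = std::max(ranges[f].max, point[f]);
        }
    }
    return ranges;
}

// Map every feature to [lower, upper] using previously learned ranges.
inline void apply_ranges(std::vector<std::vector<double>> &data, const std::vector<feature_range> &ranges,
                         const double lower, const double upper) {
    for (std::vector<double> &point : data) {
        for (std::size_t f = 0; f < ranges.size(); ++f) {
            const double width = ranges[f].max - ranges[f].min;
            // Assumption: a constant feature is mapped to the lower bound.
            point[f] = width == 0.0 ? lower : lower + (upper - lower) * (point[f] - ranges[f].min) / width;
        }
    }
}
```

Because the ranges come from the training data, test values outside the training minimum and maximum end up outside the target interval.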
private void plssvm::data_set< T, U >::create_mapping ()
Create the mapping between the provided labels and the internally used mapped values, i.e., { -1, 1 }.
plssvm::data_set_exception | any exception of the plssvm::data_set::label_mapper class |
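A mapping from arbitrary labels to the values { -1, 1 } can be sketched as follows. The function `map_labels` is hypothetical; assigning -1 to the smaller label in sorted order and rejecting inputs without exactly two distinct labels are both assumptions, and std::invalid_argument stands in for plssvm::data_set_exception.

```cpp
#include <iterator>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical mapper: the smaller label (in sorted order) becomes -1, the larger one 1.
// Assumption: exactly two distinct labels are required.
inline std::vector<double> map_labels(const std::vector<std::string> &labels) {
    std::map<std::string, double> mapping;
    for (const std::string &label : labels) {
        mapping.emplace(label, 0.0);
    }
    if (mapping.size() != 2) {
        throw std::invalid_argument{ "expected exactly two distinct labels" };
    }
    mapping.begin()->second = -1.0;
    std::next(mapping.begin())->second = 1.0;

    // Translate every original label to its mapped value.
    std::vector<double> mapped;
    mapped.reserve(labels.size());
    for (const std::string &label : labels) {
        mapped.push_back(mapping.at(label));
    }
    return mapped;
}
```

Keeping the map around would also allow translating predicted values back to the original labels.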
private void plssvm::data_set< T, U >::scale ()
Scale the feature values of the data set to the provided range.
Scales all data points feature-wise, i.e., each feature has its own scaling factor that is applied to that feature in all data points.
Scaling a data value \(x\) to the range \([a, b]\) is done with the formula: \(x_{scaled} = a + (b - a) \cdot \frac{x - \min(x)}{\max(x) - \min(x)}\)
plssvm::data_set_exception | if more scaling factors than features are present |
plssvm::data_set_exception | if the largest scaling factor index is larger than the number of features |
plssvm::data_set_exception | if for any feature more than one scaling factor is present |
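The scaling formula can be checked on a single feature column with a minimal sketch. The function `scale_feature` is hypothetical, and leaving a constant column at the lower bound is an assumption, since the formula is undefined when the maximum equals the minimum.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Scale one feature column in place to [a, b] using
// x_scaled = a + (b - a) * (x - min) / (max - min).
inline void scale_feature(std::vector<double> &column, const double a, const double b) {
    assert(a < b);
    if (column.empty()) {
        return;
    }
    const auto [min_it, max_it] = std::minmax_element(column.begin(), column.end());
    const double min_value = *min_it;
    const double max_value = *max_it;
    for (double &x : column) {
        // Assumption: a constant column is mapped to the lower bound a.
        x = max_value == min_value ? a : a + (b - a) * (x - min_value) / (max_value - min_value);
    }
}
```

For example, scaling the column { 2, 4, 6 } to the range [-1, 1] yields { -1, 0, 1 }.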
private void plssvm::data_set< T, U >::read_file (const std::string &filename, file_format_type format)
Read the data points and potential labels from the file filename assuming the plssvm::file_format_type format.
[in] | filename | the filename to read the data from |
[in] | format | the assumed file format type |
plssvm::invalid_file_format_exception | all exceptions thrown by the respective functions in the plssvm::detail::io namespace |
plssvm::data_set_exception | if labels are present in filename, all exceptions thrown by plssvm::data_set::create_mapping |