PLSSVM - Parallel Least Squares Support Vector Machine  2.0.0
A Least Squares Support Vector Machine implementation using different backends.
plssvm::data_set< T, U > Class Template Reference

Encapsulate all necessary data that is needed for training or predicting using an SVM. More...

#include <data_set.hpp>

Classes

class  label_mapper
 Implements all necessary functionality to map arbitrary labels to labels usable by the C-SVMs. More...
 
class  scaling
 Implements all necessary data and functions needed for scaling a plssvm::data_set to a user-defined range. More...
 

Public Types

using real_type = T
 The type of the data points: either float or double.
 
using label_type = U
 The type of the labels: any arithmetic type or std::string.
 
using size_type = std::size_t
 An unsigned integer type.
 

Public Member Functions

 data_set (const std::string &filename)
 Read the data points from the file filename. Automatically determines the plssvm::file_format_type based on the file extension. More...
 
 data_set (const std::string &filename, file_format_type format)
 Read the data points from the file filename assuming that the file is given in the plssvm::file_format_type format. More...
 
 data_set (const std::string &filename, scaling scale_parameter)
 Read the data points from the file filename and scale it using the provided scale_parameter. Automatically determines the plssvm::file_format_type based on the file extension. More...
 
 data_set (const std::string &filename, file_format_type format, scaling scale_parameter)
 Read the data points from the file filename assuming that the file is given in the plssvm::file_format_type format and scale it using the provided scale_parameter. More...
 
 data_set (std::vector< std::vector< real_type >> data_points)
 Create a new data set using the provided data_points. More...
 
 data_set (std::vector< std::vector< real_type >> data_points, std::vector< label_type > labels)
 Create a new data set using the provided data_points and labels. More...
 
 data_set (std::vector< std::vector< real_type >> data_points, scaling scale_parameter)
 Create a new data set using the provided data_points and scale them using the provided scale_parameter. More...
 
 data_set (std::vector< std::vector< real_type >> data_points, std::vector< label_type > labels, scaling scale_parameter)
 Create a new data set using the provided data_points and labels and scale the data_points using the provided scale_parameter. More...
 
void save (const std::string &filename, file_format_type format) const
 Save the data points and potential labels of this data set to the file filename using the file format type format. More...
 
void save (const std::string &filename) const
 Save the data points and potential labels of this data set to the file filename. Automatically determines the plssvm::file_format_type based on the file extension. More...
 
const std::vector< std::vector< real_type > > & data () const noexcept
 Return the data points in this data set. More...
 
bool has_labels () const noexcept
 Returns whether this data set contains labels or not. More...
 
optional_ref< const std::vector< label_type > > labels () const noexcept
 Returns an optional reference to the labels in this data set. More...
 
std::optional< std::vector< label_type > > different_labels () const
 Returns an optional containing the different labels in this data set. More...
 
size_type num_data_points () const noexcept
 Returns the number of data points in this data set. More...
 
size_type num_features () const noexcept
 Returns the number of features in this data set. More...
 
size_type num_different_labels () const noexcept
 Returns the number of different labels in this data set. More...
 
bool is_scaled () const noexcept
 Returns whether this data set has been scaled or not. More...
 
optional_ref< const scaling > scaling_factors () const noexcept
 Returns the scaling factors as an optional reference used to scale the data points in this data set. More...
 

Private Member Functions

 data_set ()
 Default construct an empty data set.
 
void create_mapping ()
 Create the mapping between the provided labels and the internally used mapped values, i.e., { -1, 1 }. More...
 
void scale ()
 Scale the feature values of the data set to the provided range. More...
 
void read_file (const std::string &filename, file_format_type format)
 Read the data points and potential labels from the file filename assuming the plssvm::file_format_type format. More...
 

Private Attributes

std::shared_ptr< std::vector< std::vector< real_type > > > X_ptr_ { nullptr }
 A pointer to the two-dimensional data points.
 
std::shared_ptr< std::vector< label_type > > labels_ptr_ { nullptr }
 A pointer to the original labels of this data set; may be nullptr if no labels have been provided.
 
std::shared_ptr< std::vector< real_type > > y_ptr_ { nullptr }
 A pointer to the mapped values of the labels of this data set; may be nullptr if no labels have been provided.
 
size_type num_data_points_ { 0 }
 The number of data points in this data set.
 
size_type num_features_ { 0 }
 The number of features in this data set.
 
std::shared_ptr< const label_mapper > mapping_ { nullptr }
 The mapping used to convert the original label to its mapped value and vice versa; may be nullptr if no labels have been provided.
 
std::shared_ptr< scaling > scale_parameters_ { nullptr }
 The scaling parameters used to scale the data points in this data set; may be nullptr if no data point scaling was requested.
 

Friends

template<typename , typename >
class model
 
class csvm
 

Detailed Description

template<typename T, typename U = int>
class plssvm::data_set< T, U >

Encapsulate all necessary data that is needed for training or predicting using an SVM.

May or may not contain labels! Internally, saves all data using std::shared_ptr to make a plssvm::data_set relatively cheap to copy!
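
The effect of storing the data behind a std::shared_ptr can be illustrated with a minimal sketch. The class shared_points and its member owners() are hypothetical stand-ins, not part of PLSSVM; the sketch only assumes that a copy shares the underlying storage instead of duplicating it.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data set that keeps its points behind a std::shared_ptr.
class shared_points {
  public:
    explicit shared_points(std::vector<std::vector<double>> points) :
        points_ptr_{ std::make_shared<std::vector<std::vector<double>>>(std::move(points)) } {}

    // Access the (possibly shared) data points.
    const std::vector<std::vector<double>> &data() const noexcept { return *points_ptr_; }

    // Number of objects that currently share the underlying storage.
    long owners() const noexcept { return points_ptr_.use_count(); }

  private:
    std::shared_ptr<std::vector<std::vector<double>>> points_ptr_;
};
```

Copying a shared_points object copies only the pointer and increments the reference count, so the cost of a copy does not depend on the number of data points.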

Template Parameters
T  the floating point type of the data (must either be float or double)
U  the label type of the data (must be an arithmetic type or std::string; default: int)
Examples
csvm_examples.cpp, data_set_examples.cpp, and model_examples.cpp.

Constructor & Destructor Documentation

◆ data_set() [1/8]

template<typename T , typename U >
plssvm::data_set< T, U >::data_set ( const std::string &  filename)
explicit

Read the data points from the file filename. Automatically determines the plssvm::file_format_type based on the file extension.

If filename ends with .arff it uses the ARFF parser, otherwise the LIBSVM parser is used.
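
The extension rule above can be sketched as follows. The enum file_format and the functions ends_with and detect_format are hypothetical names used for illustration only, and the case-sensitive comparison is an assumption, since the text does not say how differently cased extensions are treated.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for plssvm::file_format_type.
enum class file_format { libsvm, arff };

// Returns true if text ends with suffix (written by hand so that C++17 suffices).
inline bool ends_with(const std::string_view text, const std::string_view suffix) {
    return text.size() >= suffix.size()
           && text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// ".arff" selects the ARFF parser; every other file name falls back to LIBSVM.
// Assumption: the comparison is case-sensitive.
inline file_format detect_format(const std::string &filename) {
    return ends_with(filename, ".arff") ? file_format::arff : file_format::libsvm;
}
```

Under these assumptions, detect_format("train.arff") selects the ARFF parser, while "train.libsvm" and "train.txt" both select the LIBSVM parser.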

Parameters
[in] filename  the file to read the data points from
Exceptions
plssvm::invalid_file_format_exception  all exceptions thrown by plssvm::data_set::read_file

◆ data_set() [2/8]

template<typename T , typename U >
plssvm::data_set< T, U >::data_set ( const std::string &  filename,
file_format_type  format 
)

Read the data points from the file filename assuming that the file is given in the plssvm::file_format_type format.

Parameters
[in] filename  the file to read the data points from
[in] format  the assumed file format used to parse the data points
Exceptions
plssvm::invalid_file_format_exception  all exceptions thrown by plssvm::data_set::read_file

◆ data_set() [3/8]

template<typename T , typename U >
plssvm::data_set< T, U >::data_set ( const std::string &  filename,
scaling  scale_parameter 
)

Read the data points from the file filename and scale it using the provided scale_parameter. Automatically determines the plssvm::file_format_type based on the file extension.

If filename ends with .arff it uses the ARFF parser, otherwise the LIBSVM parser is used.

Parameters
[in] filename  the file to read the data points from
[in] scale_parameter  the parameters used to scale the data set feature values to a given range
Exceptions
plssvm::invalid_file_format_exception  all exceptions thrown by plssvm::data_set::read_file
plssvm::data_set_exception  all exceptions thrown by plssvm::data_set::scale

◆ data_set() [4/8]

template<typename T , typename U >
plssvm::data_set< T, U >::data_set ( const std::string &  filename,
file_format_type  format,
scaling  scale_parameter 
)

Read the data points from the file filename assuming that the file is given in the plssvm::file_format_type format and scale it using the provided scale_parameter.

Parameters
[in] filename  the file to read the data points from
[in] format  the assumed file format used to parse the data points
[in] scale_parameter  the parameters used to scale the data set feature values to a given range
Exceptions
plssvm::invalid_file_format_exception  all exceptions thrown by plssvm::data_set::read_file
plssvm::data_set_exception  all exceptions thrown by plssvm::data_set::scale

◆ data_set() [5/8]

template<typename T , typename U >
plssvm::data_set< T, U >::data_set ( std::vector< std::vector< real_type >>  data_points)
explicit

Create a new data set using the provided data_points.

Since no labels are provided, this data set may not be used in a call to plssvm::csvm::fit!

Parameters
[in] data_points  the data points used in this data set
Exceptions
plssvm::data_set_exception  if the data_points vector is empty
plssvm::data_set_exception  if the data points in data_points have mismatching number of features
plssvm::data_set_exception  if any data_point has no features
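
The three checks listed above could look roughly like the following sketch. The function validate_data_points is a hypothetical name, and std::invalid_argument stands in for plssvm::data_set_exception so that the example does not depend on the library; the order of the checks is also an assumption.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in checks; the documented constructors throw plssvm::data_set_exception instead.
inline void validate_data_points(const std::vector<std::vector<double>> &data_points) {
    if (data_points.empty()) {
        throw std::invalid_argument{ "the data points vector is empty" };
    }
    const std::size_t num_features = data_points.front().size();
    if (num_features == 0) {
        throw std::invalid_argument{ "a data point has no features" };
    }
    for (const std::vector<double> &point : data_points) {
        // An empty later point is also reported here, since its size differs from num_features.
        if (point.size() != num_features) {
            throw std::invalid_argument{ "mismatching number of features" };
        }
    }
}
```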

◆ data_set() [6/8]

template<typename T , typename U >
plssvm::data_set< T, U >::data_set ( std::vector< std::vector< real_type >>  data_points,
std::vector< label_type >  labels 
)

Create a new data set using the provided data_points and labels.

Parameters
[in] data_points  the data points used in this data set
[in] labels  the labels used in this data set
Exceptions
plssvm::data_set_exception  if the data_points vector is empty
plssvm::data_set_exception  if the data points in data_points have mismatching number of features
plssvm::data_set_exception  if any data_point has no features
plssvm::data_set_exception  if the number of data points in data_points and number of labels mismatch

◆ data_set() [7/8]

template<typename T , typename U >
plssvm::data_set< T, U >::data_set ( std::vector< std::vector< real_type >>  data_points,
scaling  scale_parameter 
)

Create a new data set using the provided data_points and scale them using the provided scale_parameter.

Parameters
[in] data_points  the data points used in this data set
[in] scale_parameter  the parameters used to scale the data set feature values to a given range
Exceptions
plssvm::data_set_exception  if the data_points vector is empty
plssvm::data_set_exception  if the data points in data_points have mismatching number of features
plssvm::data_set_exception  if any data_point has no features
plssvm::data_set_exception  all exceptions thrown by plssvm::data_set::scale

◆ data_set() [8/8]

template<typename T , typename U >
plssvm::data_set< T, U >::data_set ( std::vector< std::vector< real_type >>  data_points,
std::vector< label_type >  labels,
scaling  scale_parameter 
)

Create a new data set using the provided data_points and labels and scale the data_points using the provided scale_parameter.

Parameters
[in] data_points  the data points used in this data set
[in] labels  the labels used in this data set
[in] scale_parameter  the parameters used to scale the data set feature values to a given range
Exceptions
plssvm::data_set_exception  if the data_points vector is empty
plssvm::data_set_exception  if the data points in data_points have mismatching number of features
plssvm::data_set_exception  if any data_point has no features
plssvm::data_set_exception  if the number of data points in data_points and number of labels mismatch
plssvm::data_set_exception  all exceptions thrown by plssvm::data_set::scale

Member Function Documentation

◆ save() [1/2]

template<typename T , typename U >
void plssvm::data_set< T, U >::save ( const std::string &  filename,
file_format_type  format 
) const

Save the data points and potential labels of this data set to the file filename using the file format type format.

Parameters
[in] filename  the file to save the data points and labels to
[in] format  the file format

◆ save() [2/2]

template<typename T , typename U >
void plssvm::data_set< T, U >::save ( const std::string &  filename) const

Save the data points and potential labels of this data set to the file filename. Automatically determines the plssvm::file_format_type based on the file extension.

Parameters
[in] filename  the file to save the data points and labels to
Exceptions
plssvm::data_set_exception  if the file extension isn't one of libsvm or arff

◆ data()

template<typename T , typename U = int>
const std::vector<std::vector<real_type> >& plssvm::data_set< T, U >::data ( ) const
inline noexcept

Return the data points in this data set.

Returns
the data points ([[nodiscard]])

◆ has_labels()

template<typename T , typename U = int>
bool plssvm::data_set< T, U >::has_labels ( ) const
inline noexcept

Returns whether this data set contains labels or not.

Returns
true if this data set contains labels, false otherwise ([[nodiscard]])
Examples
data_set_examples.cpp.

◆ labels()

template<typename T , typename U >
auto plssvm::data_set< T, U >::labels
noexcept

Returns an optional reference to the labels in this data set.

If the labels are present, they can be retrieved as std::vector using: dataset.labels()->get().

Returns
if this data set contains labels, returns a reference to them, otherwise returns a std::nullopt ([[nodiscard]])
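
The access pattern dataset.labels()->get() suggests that the optional reference wraps a std::reference_wrapper in a std::optional. The following sketch assumes exactly that definition of optional_ref; the struct maybe_labeled and its member stored_labels are hypothetical and only serve to make the example self-contained.

```cpp
#include <functional>
#include <optional>
#include <vector>

// Assumed definition, chosen to be consistent with the `->get()` access in the text.
template <typename T>
using optional_ref = std::optional<std::reference_wrapper<T>>;

// Hypothetical holder whose labels may be absent.
struct maybe_labeled {
    std::optional<std::vector<int>> stored_labels;

    // Returns a reference to the labels if present, std::nullopt otherwise.
    optional_ref<const std::vector<int>> labels() const noexcept {
        if (stored_labels.has_value()) {
            return std::cref(*stored_labels);
        }
        return std::nullopt;
    }
};
```

Callers should check has_value() (or has_labels() on the real class) before calling ->get(), because dereferencing an empty optional is undefined behavior.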

◆ different_labels()

template<typename T , typename U >
auto plssvm::data_set< T, U >::different_labels

Returns an optional containing the different labels in this data set.

If the data set contains the labels std::vector<int>{ -1, 1, 1, -1, -1, 1 }, this function returns the labels { -1, 1 }.

Note
Must not return an optional reference, since it would bind to a temporary!
Returns
if this data set contains labels, returns all different labels by value, otherwise returns a std::nullopt ([[nodiscard]])
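
A minimal sketch of how the different labels could be computed is shown below. The function distinct_labels is a hypothetical helper, and returning the labels in sorted order is an assumption of this sketch; the text only states which labels are returned, not their order.

```cpp
#include <set>
#include <vector>

// Collect every label exactly once. The result is a newly built vector,
// which is why it has to be returned by value rather than by reference.
template <typename label_type>
std::vector<label_type> distinct_labels(const std::vector<label_type> &labels) {
    const std::set<label_type> unique_labels(labels.begin(), labels.end());
    return std::vector<label_type>(unique_labels.begin(), unique_labels.end());
}
```

For the labels { -1, 1, 1, -1, -1, 1 } from the example above, this helper yields { -1, 1 }.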

◆ num_data_points()

template<typename T , typename U = int>
size_type plssvm::data_set< T, U >::num_data_points ( ) const
inline noexcept

Returns the number of data points in this data set.

Returns
the number of data points ([[nodiscard]])

◆ num_features()

template<typename T , typename U = int>
size_type plssvm::data_set< T, U >::num_features ( ) const
inline noexcept

Returns the number of features in this data set.

Returns
the number of features ([[nodiscard]])

◆ num_different_labels()

template<typename T , typename U = int>
size_type plssvm::data_set< T, U >::num_different_labels ( ) const
inline noexcept

Returns the number of different labels in this data set.

If the data set contains the labels std::vector<int>{ -1, 1, 1, -1, -1, 1 }, this function returns 2. It is the same as: dataset.different_labels()->size()

Returns
the number of different labels ([[nodiscard]])

◆ is_scaled()

template<typename T , typename U = int>
bool plssvm::data_set< T, U >::is_scaled ( ) const
inline noexcept

Returns whether this data set has been scaled or not.

The used scaling factors can be retrieved using plssvm::data_set::scaling_factors().

Returns
true if this data set has been scaled, false otherwise ([[nodiscard]])

◆ scaling_factors()

template<typename T , typename U >
auto plssvm::data_set< T, U >::scaling_factors
noexcept

Returns the scaling factors as an optional reference used to scale the data points in this data set.

Can be used to scale another data set in the same way (e.g., a test data set). If the data set has been scaled, the scaling factors can be retrieved using: dataset.scaling_factors()->get().

Returns
the scaling factors ([[nodiscard]])
Examples
data_set_examples.cpp.

◆ create_mapping()

template<typename T , typename U >
void plssvm::data_set< T, U >::create_mapping
private

Create the mapping between the provided labels and the internally used mapped values, i.e., { -1, 1 }.

Exceptions
plssvm::data_set_exception  any exception of the plssvm::data_set::label_mapper class
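
The idea of mapping arbitrary labels to { -1, 1 } can be sketched as follows. The function map_labels is hypothetical and is not the label_mapper class itself. Two assumptions are made here that the text does not confirm: exactly two different labels are required, and the lexicographically smaller label is mapped to -1. std::invalid_argument stands in for plssvm::data_set_exception.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Map each original label to -1 or 1 (assumption: smaller label -> -1, larger label -> 1).
inline std::vector<double> map_labels(const std::vector<std::string> &labels) {
    std::map<std::string, double> mapping;
    for (const std::string &label : labels) {
        mapping.emplace(label, 0.0);  // duplicates are ignored by emplace
    }
    if (mapping.size() != 2) {
        // Assumption of this sketch: only binary label sets are accepted.
        throw std::invalid_argument{ "expected exactly two different labels" };
    }
    mapping.begin()->second = -1.0;
    mapping.rbegin()->second = 1.0;

    std::vector<double> mapped;
    mapped.reserve(labels.size());
    for (const std::string &label : labels) {
        mapped.push_back(mapping.at(label));
    }
    return mapped;
}
```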

◆ scale()

template<typename T , typename U >
void plssvm::data_set< T, U >::scale
private

Scale the feature values of the data set to the provided range.

Scales all data points feature-wise, i.e., one scaling factor is responsible for one feature across all data points (e.g., for the first feature of every data point).
Scaling a data value \(x\) to the range \([a, b]\) is done with the formula: \(x_{scaled} = a + (b - a) \cdot \frac{x - \min(x)}{\max(x) - \min(x)}\)

Exceptions
plssvm::data_set_exception  if more scaling factors than features are present
plssvm::data_set_exception  if the largest scaling factor index is larger than the number of features
plssvm::data_set_exception  if for any feature more than one scaling factor is present
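
The scaling formula can be illustrated with a short, self-contained sketch. The function scale_features is a hypothetical helper that computes the minimum and maximum of each feature directly from the data; it does not model the stored scaling factors or the exceptions listed above. Mapping a constant feature (where the maximum equals the minimum) to a is an assumption made here to avoid a division by zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Scale every feature (column) of data to the range [a, b] in place.
// Assumes a non-empty data set in which all points have the same number of features.
inline void scale_features(std::vector<std::vector<double>> &data, const double a, const double b) {
    const std::size_t num_features = data.front().size();
    for (std::size_t f = 0; f < num_features; ++f) {
        // Determine min(x) and max(x) of feature f over all data points.
        double min_value = data.front()[f];
        double max_value = data.front()[f];
        for (const std::vector<double> &point : data) {
            min_value = std::min(min_value, point[f]);
            max_value = std::max(max_value, point[f]);
        }
        const double range = max_value - min_value;
        for (std::vector<double> &point : data) {
            // x_scaled = a + (b - a) * (x - min(x)) / (max(x) - min(x));
            // constant features are mapped to a (assumption of this sketch).
            point[f] = range == 0.0 ? a : a + (b - a) * (point[f] - min_value) / range;
        }
    }
}
```

For example, scaling the feature values 0, 5, and 10 to the range [-1, 1] yields -1, 0, and 1.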

◆ read_file()

template<typename T , typename U >
void plssvm::data_set< T, U >::read_file ( const std::string &  filename,
file_format_type  format 
)
private

Read the data points and potential labels from the file filename assuming the plssvm::file_format_type format.

Parameters
[in] filename  the filename to read the data from
[in] format  the assumed file format type
Exceptions
plssvm::invalid_file_format_exception  all exceptions thrown by the respective functions in the plssvm::detail::io namespace
plssvm::data_set_exception  if labels are present in filename, all exceptions thrown by plssvm::data_set::create_mapping

The documentation for this class was generated from the following file: