PLSSVM - Parallel Least Squares Support Vector Machine
2.0.0
A Least Squares Support Vector Machine implementation using different backends.
Namespace containing implementation details for the IO related functions. Should not directly be used by users.
Classes
class | file_reader
The plssvm::detail::io::file_reader class is responsible for reading a file and splitting it into its lines.
Functions
template<typename label_type >
std::tuple< std::size_t, std::size_t, std::set< label_type >, std::size_t > | parse_arff_header (const std::vector< std::string_view > &lines)
Parse the ARFF file header, i.e., determine the number of features, the length of the ARFF header, whether the data set is annotated with labels, and at which position the label is written in the data set.
template<typename real_type , typename label_type >
std::tuple< std::size_t, std::size_t, std::vector< std::vector< real_type > >, std::vector< label_type > > | parse_arff_data (const file_reader &reader)
Parse all data points and potential labels using the file reader, ignoring all empty lines and lines starting with a %. If no labels are found, returns an empty vector.
template<typename real_type , typename label_type , bool has_label>
void | write_arff_data_impl (const std::string &filename, const std::vector< std::vector< real_type >> &data, const std::vector< label_type > &label)
Write the provided data and labels to the ARFF file filename.
template<typename real_type , typename label_type >
void | write_arff_data (const std::string &filename, const std::vector< std::vector< real_type >> &data, const std::vector< label_type > &label)
Write the provided data and labels to the ARFF file filename.
template<typename real_type >
void | write_arff_data (const std::string &filename, const std::vector< std::vector< real_type >> &data)
Write the provided data to the ARFF file filename.
void | swap (file_reader &lhs, file_reader &rhs)
Elementwise swap the contents of lhs and rhs.
template<typename real_type , typename label_type , typename size_type >
std::tuple< plssvm::parameter, real_type, std::vector< label_type >, std::size_t > | parse_libsvm_model_header (const std::vector< std::string_view > &lines)
Parse the LIBSVM model file header.
template<typename real_type , typename label_type >
std::vector< label_type > | write_libsvm_model_header (fmt::ostream &out, const plssvm::parameter &params, const real_type rho, const data_set< real_type, label_type > &data)
Write the LIBSVM model file header to out.
template<typename real_type , typename label_type >
void | write_libsvm_model_data (const std::string &filename, const plssvm::parameter &params, const real_type rho, const std::vector< real_type > &alpha, const data_set< real_type, label_type > &data)
Write the LIBSVM model to the file filename.
std::size_t | parse_libsvm_num_features (const std::vector< std::string_view > &lines, const std::size_t skipped_lines=0)
Parse the maximum number of features per data point given in lines, where the first skipped_lines are skipped.
template<typename real_type , typename label_type >
std::tuple< std::size_t, std::size_t, std::vector< std::vector< real_type > >, std::vector< label_type > > | parse_libsvm_data (const file_reader &reader, const std::size_t skipped_lines=0)
Parse all data points and potential labels using the file reader, ignoring all empty lines and lines starting with a #. If no labels are found, returns an empty vector.
template<typename real_type , typename label_type , bool has_label>
void | write_libsvm_data_impl (const std::string &filename, const std::vector< std::vector< real_type >> &data, const std::vector< label_type > &label)
Write the provided data and labels to the LIBSVM file filename.
template<typename real_type , typename label_type >
void | write_libsvm_data (const std::string &filename, const std::vector< std::vector< real_type >> &data, const std::vector< label_type > &label)
Write the provided data and labels to the LIBSVM file filename.
template<typename real_type >
void | write_libsvm_data (const std::string &filename, const std::vector< std::vector< real_type >> &data)
Write the provided data to the LIBSVM file filename.
template<typename real_type , typename factors_type >
std::tuple< std::pair< real_type, real_type >, std::vector< factors_type > > | parse_scaling_factors (const file_reader &reader)
Read the scaling interval and factors stored using LIBSVM's file format from the file provided via reader.
template<typename real_type , typename factors_type >
void | write_scaling_factors (const std::string &filename, const std::pair< real_type, real_type > &scaling_interval, const std::vector< factors_type > &scaling_factors)
Write the scaling_interval and scaling_factors to a file for later usage in scaling another data set using LIBSVM's file format.
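template<typename label_type >
std::tuple< std::size_t, std::size_t, std::set< label_type >, std::size_t > plssvm::detail::io::parse_arff_header (const std::vector< std::string_view > &lines) | inline
Parse the ARFF file header, i.e., determine the number of features, the length of the ARFF header, whether the data set is annotated with labels, and at which position the label is written in the data set.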
label_type | the type of the labels (any arithmetic type or std::string) |
[in] | lines | the ARFF header to parse |
plssvm::invalid_file_format_exception | if the @RELATION field does not come before any other @ATTRIBUTE |
plssvm::invalid_file_format_exception | if the @RELATION field does not have a name |
plssvm::invalid_file_format_exception | if the name of the @RELATION field contains whitespace but is not quoted |
plssvm::invalid_file_format_exception | if an @ATTRIBUTE field has the type NUMERIC and the name CLASS |
plssvm::invalid_file_format_exception | if an @ATTRIBUTE field does not have a name |
plssvm::invalid_file_format_exception | if the name of an @ATTRIBUTE field contains whitespace but is not quoted |
plssvm::invalid_file_format_exception | if multiple @ATTRIBUTES with the name CLASS are provided |
plssvm::invalid_file_format_exception | if the class field does not provide any labels |
plssvm::invalid_file_format_exception | if the class field provides labels that are not enclosed in {} (ARFF nominal attributes) |
plssvm::invalid_file_format_exception | if only a single label has been provided |
plssvm::invalid_file_format_exception | if a label has been provided multiple times |
plssvm::invalid_file_format_exception | if a string label contains a whitespace |
plssvm::invalid_file_format_exception | if a header entry starts with an @ but is none of @RELATION, @ATTRIBUTE, or @DATA |
plssvm::invalid_file_format_exception | if no feature attributes are provided |
plssvm::invalid_file_format_exception | if the @DATA attribute is missing |
The return value is marked [[nodiscard]].
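template<typename real_type , typename label_type >
std::tuple< std::size_t, std::size_t, std::vector< std::vector< real_type > >, std::vector< label_type > > plssvm::detail::io::parse_arff_data (const file_reader &reader) | inline
Parse all data points and potential labels using the file reader, ignoring all empty lines and lines starting with a %. If no labels are found, returns an empty vector.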
An example file can look like
real_type | the floating point type |
label_type | the type of the labels (any arithmetic type or std::string) |
[in] | reader | the file_reader used to read the ARFF data |
plssvm::invalid_file_format_exception | if no features could be found (may indicate an empty file) |
plssvm::invalid_file_format_exception | if a label couldn't be converted to the provided label_type |
plssvm::invalid_file_format_exception | if a feature index couldn't be converted to unsigned long |
plssvm::invalid_file_format_exception | if a feature value couldn't be converted to the provided real_type |
plssvm::invalid_file_format_exception | if an '@' is read inside the @DATA section |
plssvm::invalid_file_format_exception | if a closing curly brace '}' is missing in the sparse data point description |
plssvm::invalid_file_format_exception | if an opening curly brace '{' is missing in the sparse data point description |
plssvm::invalid_file_format_exception | if an index is out of bounds with respect to the provided ARFF header information |
plssvm::invalid_file_format_exception | if the ARFF header specifies labels but any data point misses a label |
plssvm::invalid_file_format_exception | if the number of found features and labels mismatches the numbers provided in the ARFF header |
plssvm::invalid_file_format_exception | if a label in the data section has been found, that did not appear in the header |
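The sparse data point handling described by the exceptions above can be illustrated with a minimal, self-contained sketch. It is not the PLSSVM implementation: `parse_sparse_arff_line` is a hypothetical helper, labels are left out, and `std::runtime_error` stands in for plssvm::invalid_file_format_exception. It assumes the usual sparse ARFF notation, where a data point is written as `{index value, index value, ...}` with zero-based indices and unlisted features are zero.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of PLSSVM): expand one sparse ARFF data line,
// e.g. "{0 1.5, 2 -3}", into a dense vector with num_features entries.
std::vector<double> parse_sparse_arff_line(const std::string &line, const std::size_t num_features) {
    const std::size_t open = line.find('{');
    const std::size_t close = line.rfind('}');
    if (open == std::string::npos) {
        throw std::runtime_error{ "missing opening curly brace '{'" };
    }
    if (close == std::string::npos || close < open) {
        throw std::runtime_error{ "missing closing curly brace '}'" };
    }

    std::vector<double> dense(num_features, 0.0);  // features that are not listed stay zero
    std::istringstream entries{ line.substr(open + 1, close - open - 1) };
    std::string entry;
    while (std::getline(entries, entry, ',')) {
        std::istringstream fields{ entry };
        std::size_t index{};
        double value{};
        if (!(fields >> index >> value)) {
            throw std::runtime_error{ "can't parse the sparse entry '" + entry + "'" };
        }
        if (index >= num_features) {
            throw std::runtime_error{ "feature index out of bounds" };
        }
        dense[index] = value;  // sparse ARFF indices are zero-based
    }
    assert(dense.size() == num_features);
    return dense;
}
```

Calling `parse_sparse_arff_line("{0 1.5, 2 -3}", 4)` yields the dense point `1.5, 0, -3, 0`, while an index of 4 or larger is rejected.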
The return value is marked [[nodiscard]].
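template<typename real_type , typename label_type , bool has_label>
void plssvm::detail::io::write_arff_data_impl (const std::string &filename, const std::vector< std::vector< real_type >> &data, const std::vector< label_type > &label) | inline
Write the provided data and labels to the ARFF file filename.
An example file can look like
Note that the output will always be dense, i.e., all features with a value of 0.0 are explicitly written in the resulting file.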
real_type | the floating point type |
label_type | the type of the labels (any arithmetic type or std::string) |
has_label | if true, the provided labels are also written to the file; if false, no labels are written |
[in] | filename | the filename to write the data to |
[in] | data | the data points to write to the file |
[in] | label | the labels to write to the file |
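What "dense" output means can be shown with a short sketch. This is an illustration under assumptions, not the PLSSVM writer: `to_dense_arff` is a hypothetical helper, the relation name and the `feature_i` attribute names are placeholders, labels are omitted, and the result is returned as a string instead of being written to filename.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of PLSSVM): render data points as a dense ARFF document.
// The relation and attribute names are placeholders chosen for this sketch.
std::string to_dense_arff(const std::vector<std::vector<double>> &data) {
    std::ostringstream out;
    out << "@RELATION sketch\n";
    const std::size_t num_features = data.empty() ? 0 : data.front().size();
    for (std::size_t i = 0; i < num_features; ++i) {
        out << "@ATTRIBUTE feature_" << i << " NUMERIC\n";
    }
    out << "@DATA\n";
    for (const std::vector<double> &point : data) {
        assert(point.size() == num_features);  // all data points must have the same number of features
        for (std::size_t i = 0; i < point.size(); ++i) {
            // dense output: zero values are written like any other value
            out << (i == 0 ? "" : ",") << point[i];
        }
        out << '\n';
    }
    return out.str();
}
```

For the two points `(1.5, 0)` and `(0, -2)`, the data section contains the rows `1.5,0` and `0,-2`; no zero is dropped.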
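template<typename real_type , typename label_type >
void plssvm::detail::io::write_arff_data (const std::string &filename, const std::vector< std::vector< real_type >> &data, const std::vector< label_type > &label) | inline
Write the provided data and labels to the ARFF file filename.
An example file can look like
Note that the output will always be dense, i.e., all features with a value of 0.0 are explicitly written in the resulting file.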
real_type | the floating point type |
label_type | the type of the labels (any arithmetic type or std::string) |
[in] | filename | the filename to write the data to |
[in] | data | the data points to write to the file |
[in] | label | the labels to write to the file |
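template<typename real_type >
void plssvm::detail::io::write_arff_data (const std::string &filename, const std::vector< std::vector< real_type >> &data) | inline
Write the provided data to the ARFF file filename.
An example file can look like
Note that the output will always be dense, i.e., all features with a value of 0.0 are explicitly written in the resulting file.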
real_type | the floating point type |
[in] | filename | the filename to write the data to |
[in] | data | the data points to write to the file |
void plssvm::detail::io::swap (file_reader &lhs, file_reader &rhs)
Elementwise swap the contents of lhs and rhs.
[in,out] | lhs | the first file_reader |
[in,out] | rhs | the second file_reader |
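template<typename real_type , typename label_type , typename size_type >
std::tuple< plssvm::parameter, real_type, std::vector< label_type >, std::size_t > plssvm::detail::io::parse_libsvm_model_header (const std::vector< std::string_view > &lines) | inline
Parse the LIBSVM model file header.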
An example LIBSVM model file header for the linear kernel and two labels could look like
real_type | the floating point type |
label_type | the type of the labels (any arithmetic type, except bool, or std::string) |
size_type | the size type |
[in] | lines | the LIBSVM model file header to parse |
plssvm::invalid_file_format_exception | if an invalid 'svm_type' has been provided, i.e., 'svm_type' is not 'c_svc' |
plssvm::invalid_file_format_exception | if an invalid 'kernel_type' has been provided |
plssvm::invalid_file_format_exception | if the number of support vectors ('total_sv') is zero |
plssvm::invalid_file_format_exception | if fewer than two labels have been provided |
plssvm::invalid_file_format_exception | if fewer than two support vector counts (one per label) have been provided |
plssvm::invalid_file_format_exception | if an invalid header entry has been read |
plssvm::invalid_file_format_exception | if the 'svm_type' is missing |
plssvm::invalid_file_format_exception | if the 'kernel_type' is missing |
plssvm::invalid_file_format_exception | if SVM parameters are explicitly provided that are not used in the given kernel (e.g., 'gamma' is provided for the 'linear' kernel) |
plssvm::invalid_file_format_exception | if the number of classes ('nr_class') is missing |
plssvm::invalid_file_format_exception | if the total number of support vectors ('total_sv') is missing |
plssvm::invalid_file_format_exception | if the value for rho is missing |
plssvm::invalid_file_format_exception | if the labels are missing |
plssvm::invalid_file_format_exception | if the number of provided labels is not the same as the value of 'nr_class' |
plssvm::invalid_file_format_exception | if the number of support vectors per class ('nr_sv') is missing |
plssvm::invalid_file_format_exception | if the number of provided support vector counts per class is not the same as the value of 'nr_class' |
plssvm::invalid_file_format_exception | if the sum of all support vector counts per class is not the same as the value of 'total_sv' |
plssvm::invalid_file_format_exception | if no support vectors have been provided in the data section |
plssvm::invalid_file_format_exception | if the number of labels is not two |
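A few of the checks listed above can be sketched on a plain key/value view of the header. The sketch assumes the common LIBSVM model layout (one `key value...` entry per line, such as `svm_type c_svc`, `nr_class 2`, `label 1 -1`, terminated by a line containing `SV`). `read_model_header` is a hypothetical helper, not the PLSSVM function, and `std::runtime_error` stands in for plssvm::invalid_file_format_exception.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of PLSSVM): collect "key value..." header lines up to the "SV" marker
// and apply a small subset of the consistency checks.
std::map<std::string, std::vector<std::string>> read_model_header(const std::vector<std::string> &lines) {
    std::map<std::string, std::vector<std::string>> header;
    for (const std::string &line : lines) {
        std::istringstream tokens{ line };
        std::string key;
        tokens >> key;
        if (key.empty()) {
            continue;  // skip empty lines
        }
        if (key == "SV") {
            break;  // the support vectors follow; the header ends here
        }
        std::vector<std::string> values;
        for (std::string value; tokens >> value;) {
            values.push_back(value);
        }
        header[key] = values;
    }

    if (header.count("svm_type") == 0 || header["svm_type"] != std::vector<std::string>{ "c_svc" }) {
        throw std::runtime_error{ "missing or invalid 'svm_type'" };
    }
    if (header.count("nr_class") == 0 || header["nr_class"].size() != 1) {
        throw std::runtime_error{ "missing 'nr_class'" };
    }
    if (header["label"].size() != std::stoul(header["nr_class"].front())) {
        throw std::runtime_error{ "the number of labels doesn't match 'nr_class'" };
    }
    assert(header.count("SV") == 0);  // the marker itself is never stored
    return header;
}
```

The remaining checks (kernel parameters, 'total_sv', 'nr_sv', 'rho') would follow the same pattern on the collected entries.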
The return value is marked [[nodiscard]].
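template<typename real_type , typename label_type >
std::vector< label_type > plssvm::detail::io::write_libsvm_model_header (fmt::ostream &out, const plssvm::parameter &params, const real_type rho, const data_set< real_type, label_type > &data) | inline
Write the LIBSVM model file header to out.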
An example LIBSVM model file header for the linear kernel and two labels could look like
real_type | the floating point type |
label_type | the type of the labels (any arithmetic type, except bool, or std::string) |
[in,out] | out | the output-stream to write the header information to |
[in] | params | the SVM parameters |
[in] | rho | the rho value resulting from the hyperplane learning |
[in] | data | the data used to create the model |
The return value is marked [[nodiscard]].
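template<typename real_type , typename label_type >
void plssvm::detail::io::write_libsvm_model_data (const std::string &filename, const plssvm::parameter &params, const real_type rho, const std::vector< real_type > &alpha, const data_set< real_type, label_type > &data) | inline
Write the LIBSVM model to the file filename.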
An example LIBSVM model file for the linear kernel and two labels could look like
real_type | the floating point type |
label_type | the type of the labels (any arithmetic type, except bool, or std::string) |
[in] | filename | the file to write the LIBSVM model to |
[in] | params | the SVM parameters |
[in] | rho | the rho value resulting from the hyperplane learning |
[in] | alpha | the weights learned by the SVM |
[in] | data | the data used to create the model |
std::size_t plssvm::detail::io::parse_libsvm_num_features (const std::vector< std::string_view > &lines, const std::size_t skipped_lines=0) | inline
Parse the maximum number of features per data point given in lines, where the first skipped_lines are skipped.
The maximum number of features equals the largest feature index found. Since LIBSVM mandates that the feature indices are strictly increasing, it is sufficient to look only at the last feature index of each data point.
[in] | lines | the LIBSVM data to parse for the number of features |
[in] | skipped_lines | the number of lines that should be skipped at the beginning |
plssvm::invalid_file_format_exception | if a feature index couldn't be converted to unsigned long |
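The "last index per line" idea can be sketched as follows. `max_feature_index` is a hypothetical stand-in that works on `std::string` lines rather than the `std::string_view` lines of the real function, and it relies on `std::stoul` throwing for an index that can't be converted.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of PLSSVM): largest feature index found in LIBSVM-style lines.
std::size_t max_feature_index(const std::vector<std::string> &lines, const std::size_t skipped_lines = 0) {
    std::size_t num_features = 0;
    for (std::size_t i = skipped_lines; i < lines.size(); ++i) {
        const std::string &line = lines[i];
        if (line.empty() || line.front() == '#') {
            continue;  // ignore empty lines and comments
        }
        const std::size_t colon = line.rfind(':');  // separator of the last "index:value" pair
        if (colon == std::string::npos) {
            continue;  // no features on this line
        }
        // the index starts right after the whitespace preceding it (or at the line start)
        const std::size_t space = line.find_last_of(" \t", colon);
        const std::size_t begin = (space == std::string::npos) ? 0 : space + 1;
        const std::size_t index = std::stoul(line.substr(begin, colon - begin));
        // indices are strictly increasing within a line, so the last one is the line's maximum
        num_features = std::max(num_features, index);
    }
    return num_features;
}
```

Only one index per line is converted, so the cost per data point does not depend on how many features it lists.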
The return value is marked [[nodiscard]].
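template<typename real_type , typename label_type >
std::tuple< std::size_t, std::size_t, std::vector< std::vector< real_type > >, std::vector< label_type > > plssvm::detail::io::parse_libsvm_data (const file_reader &reader, const std::size_t skipped_lines=0) | inline
Parse all data points and potential labels using the file reader, ignoring all empty lines and lines starting with a #. If no labels are found, returns an empty vector.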
An example file can look like
real_type | the floating point type |
label_type | the type of the labels (any arithmetic type or std::string) |
[in] | reader | the file_reader used to read the LIBSVM data |
[in] | skipped_lines | the number of lines that should be skipped at the beginning |
plssvm::invalid_file_format_exception | if no features could be found (may indicate an empty file) |
plssvm::invalid_file_format_exception | if a label couldn't be converted to the provided label_type |
plssvm::invalid_file_format_exception | if a feature index couldn't be converted to unsigned long |
plssvm::invalid_file_format_exception | if a feature value couldn't be converted to the provided real_type |
plssvm::invalid_file_format_exception | if the provided LIBSVM file uses zero-based indexing (LIBSVM mandates one-based indices) |
plssvm::invalid_file_format_exception | if the feature indices are not given in strictly increasing order |
plssvm::invalid_file_format_exception | if only some data points are annotated with labels |
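The index rules behind these exceptions (one-based, strictly increasing) can be illustrated on a single labeled line of the form `label index:value index:value ...`. The sketch below is not the PLSSVM parser: `parse_libsvm_line` is a hypothetical helper, the label is assumed to be numeric, and `std::runtime_error` stands in for plssvm::invalid_file_format_exception.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of PLSSVM): parse "label idx:val idx:val ..." into a label and a dense point.
std::pair<double, std::vector<double>> parse_libsvm_line(const std::string &line, const std::size_t num_features) {
    std::istringstream tokens{ line };
    double label{};
    if (!(tokens >> label)) {
        throw std::runtime_error{ "can't parse the label" };
    }

    std::vector<double> dense(num_features, 0.0);  // omitted features are zero
    std::size_t last_index = 0;
    for (std::string token; tokens >> token;) {
        const std::size_t colon = token.find(':');
        if (colon == std::string::npos) {
            throw std::runtime_error{ "expected 'index:value' but got '" + token + "'" };
        }
        const std::size_t index = std::stoul(token.substr(0, colon));
        if (index == 0) {
            throw std::runtime_error{ "LIBSVM feature indices are one-based" };
        }
        if (index <= last_index) {
            throw std::runtime_error{ "feature indices must be strictly increasing" };
        }
        if (index > num_features) {
            throw std::runtime_error{ "feature index out of bounds" };
        }
        dense[index - 1] = std::stod(token.substr(colon + 1));  // store zero-based
        last_index = index;
    }
    assert(dense.size() == num_features);
    return { label, dense };
}
```

For example, `parse_libsvm_line("-1 1:0.5 3:2", 3)` gives the label `-1` and the point `0.5, 0, 2`, whereas `"1 0:0.5"` and `"1 2:0.5 1:1"` are rejected.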
The return value is marked [[nodiscard]].
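template<typename real_type , typename label_type , bool has_label>
void plssvm::detail::io::write_libsvm_data_impl (const std::string &filename, const std::vector< std::vector< real_type >> &data, const std::vector< label_type > &label) | inline
Write the provided data and labels to the LIBSVM file filename.
An example file can look like
Note that the output may be sparse, i.e., all features with a value of 0.0 are omitted in the resulting file.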
real_type | the floating point type |
label_type | the type of the labels (any arithmetic type or std::string) |
has_label | if true, the provided labels are also written to the file; if false, no labels are written |
[in] | filename | the filename to write the data to |
[in] | data | the data points to write to the file |
[in] | label | the labels to write to the file |
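The sparse output convention can be sketched in a few lines. `to_sparse_libsvm` is a hypothetical helper, not the PLSSVM writer: it uses integer labels, returns a string instead of writing to filename, and formats values with the default stream precision.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of PLSSVM): render labeled data points in sparse LIBSVM notation.
std::string to_sparse_libsvm(const std::vector<std::vector<double>> &data, const std::vector<int> &labels) {
    assert(data.size() == labels.size());  // exactly one label per data point
    std::ostringstream out;
    for (std::size_t p = 0; p < data.size(); ++p) {
        out << labels[p];
        for (std::size_t i = 0; i < data[p].size(); ++i) {
            if (data[p][i] != 0.0) {
                // zero values are omitted; indices are written one-based
                out << ' ' << (i + 1) << ':' << data[p][i];
            }
        }
        out << '\n';
    }
    return out.str();
}
```

The point `(0, 1.5, 0)` with label `1` becomes the line `1 2:1.5`, in contrast to the dense ARFF output, where the zeros would be written.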
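template<typename real_type , typename label_type >
void plssvm::detail::io::write_libsvm_data (const std::string &filename, const std::vector< std::vector< real_type >> &data, const std::vector< label_type > &label) | inline
Write the provided data and labels to the LIBSVM file filename.
An example file can look like
Note that the output may be sparse, i.e., all features with a value of 0.0 are omitted in the resulting file.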
real_type | the floating point type |
label_type | the type of the labels (any arithmetic type or std::string) |
[in] | filename | the filename to write the data to |
[in] | data | the data points to write to the file |
[in] | label | the labels to write to the file |
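template<typename real_type >
void plssvm::detail::io::write_libsvm_data (const std::string &filename, const std::vector< std::vector< real_type >> &data) | inline
Write the provided data to the LIBSVM file filename.
An example file can look like
Note that the output may be sparse, i.e., all features with a value of 0.0 are omitted in the resulting file.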
real_type | the floating point type |
[in] | filename | the filename to write the data to |
[in] | data | the data points to write to the file |
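template<typename real_type , typename factors_type >
std::tuple< std::pair< real_type, real_type >, std::vector< factors_type > > plssvm::detail::io::parse_scaling_factors (const file_reader &reader) | inline
Read the scaling interval and factors stored using LIBSVM's file format from the file provided via reader.
An example file can look like
Note that the scaling factors are given using a one-based indexing scheme, but are internally stored using zero-based indexing.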
real_type | the used floating point type |
factors_type | plssvm::data_set<real_type>::scaling::factors (cannot be forward declared or included) |
[in] | reader | the file_reader used to read the scaling factors |
plssvm::invalid_file_format_exception | if the header is omitted ('x' and the scaling interval) |
plssvm::invalid_file_format_exception | if the first line contains anything other than x |
plssvm::invalid_file_format_exception | if the scaling interval is not provided with exactly two values |
plssvm::invalid_file_format_exception | if a scaling factor is not provided with exactly three values |
plssvm::invalid_file_format_exception | if the scaling factors feature index is zero-based instead of one-based |
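The one-based to zero-based conversion can be sketched as below. The layout is assumed to be the one LIBSVM's scaling tool writes: a line containing `x`, a line with the two interval bounds, then one `index lower upper` line per feature. `scaling_factor`, `scaling_info`, and `parse_scaling_file` are hypothetical names for this illustration, and `std::runtime_error` stands in for plssvm::invalid_file_format_exception.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for this sketch (not the PLSSVM factors type).
struct scaling_factor {
    std::size_t feature;  // zero-based feature index
    double lower;
    double upper;
};
struct scaling_info {
    std::pair<double, double> interval;
    std::vector<scaling_factor> factors;
};

scaling_info parse_scaling_file(const std::vector<std::string> &lines) {
    if (lines.size() < 2 || lines[0] != "x") {
        throw std::runtime_error{ "the header ('x' and the scaling interval) is missing" };
    }
    scaling_info result{};
    std::istringstream interval{ lines[1] };
    if (!(interval >> result.interval.first >> result.interval.second)) {
        throw std::runtime_error{ "can't read the two bounds of the scaling interval" };
    }
    for (std::size_t i = 2; i < lines.size(); ++i) {
        std::istringstream entry{ lines[i] };
        std::size_t index{};
        double lower{};
        double upper{};
        if (!(entry >> index >> lower >> upper)) {
            throw std::runtime_error{ "a scaling factor needs an index and two values" };
        }
        if (index == 0) {
            throw std::runtime_error{ "scaling factor indices must be one-based" };
        }
        result.factors.push_back(scaling_factor{ index - 1, lower, upper });  // store zero-based
    }
    assert(result.factors.size() == lines.size() - 2);
    return result;
}
```

A file line `3 -4 4` therefore ends up as the factor for feature index 2 in memory.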
The return value is marked [[nodiscard]].
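template<typename real_type , typename factors_type >
void plssvm::detail::io::write_scaling_factors (const std::string &filename, const std::pair< real_type, real_type > &scaling_interval, const std::vector< factors_type > &scaling_factors) | inline
Write the scaling_interval and scaling_factors to a file for later usage in scaling another data set using LIBSVM's file format.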
An example file can look like
real_type | the used floating point type |
factors_type | plssvm::data_set<real_type>::scaling::factors (cannot be forward declared or included) |
[in] | filename | the filename to write the data to |
[in] | scaling_interval | the valid scaling interval, i.e., [first, second] |
[in] | scaling_factors | the scaling factor for each feature; stored zero-based, but written to the file one-based |
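The index shift on output can be sketched as follows, assuming the layout LIBSVM's scaling tool uses: `x`, then the interval bounds, then one `index lower upper` line per feature. `scaling_factor` and `to_scaling_file` are hypothetical names for this illustration; the sketch returns a string instead of writing to filename.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type for this sketch (not the PLSSVM factors type).
struct scaling_factor {
    std::size_t feature;  // zero-based feature index
    double lower;
    double upper;
};

// Hypothetical helper (not part of PLSSVM): render the scaling interval and factors as file content.
std::string to_scaling_file(const std::pair<double, double> &interval, const std::vector<scaling_factor> &factors) {
    assert(interval.first < interval.second);  // assumption of this sketch: a non-empty interval
    std::ostringstream out;
    out << "x\n"
        << interval.first << ' ' << interval.second << '\n';
    for (const scaling_factor &factor : factors) {
        // zero-based in memory, one-based in the file
        out << (factor.feature + 1) << ' ' << factor.lower << ' ' << factor.upper << '\n';
    }
    return out.str();
}
```

A factor stored for feature index 0 is written with index 1, so reading the file back with a one-based parser restores the original indices.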