2026a

# cvpartition

Partition data for cross-validation

Library: TyMachineLearning

# Syntax

kf = cvpartition(n_obsvts::Int, method::String; k::Int=10, p::Float64=0.1)
train_set, test_set = cvpartition(n_obsvts::Int, method::String; k::Int=10, p::Float64=0.1)

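# Description

kf = cvpartition(n_obsvts, method; k, p) partitions the dataset into k folds when method is "kFold" and returns the folds for cross-validation.

train_set, test_set = cvpartition(n_obsvts, method; k, p) splits the dataset into a training set and a test set when method is "Holdout", with the proportion p assigned to the test set.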
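
To make the k-fold idea concrete, here is a minimal sketch in plain Julia. It is not the TyMachineLearning implementation: the helper name `sketch_kfold`, its `rng` keyword, and the way indices are dealt into folds are all assumptions made for illustration.

```julia
using Random

# Hypothetical helper (not part of TyMachineLearning): shuffle the indices 1:n
# and deal them into k folds; each fold is returned as (train_ids, test_ids).
function sketch_kfold(n::Int, k::Int; rng::AbstractRNG=Random.default_rng())
    2 <= k <= n || throw(ArgumentError("k must be between 2 and n"))
    shuffled = randperm(rng, n)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one.
    test_folds = [shuffled[i:k:n] for i in 1:k]
    return [(sort(setdiff(shuffled, test)), sort(test)) for test in test_folds]
end

folds = sketch_kfold(150, 10; rng=MersenneTwister(42))
println(length(folds))            # 10 folds
println(length.(last.(folds)))    # each test set holds 15 indices
```

Every index appears in exactly one test set, so each observation is used for testing once and for training in the other k - 1 folds.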

# Examples

Partition a dataset and train a decision tree model

Load the iris dataset.

using TyMachineLearning
using CSV
using DataFrames
iris_X, iris_y = get_irsdata()

Create the partition object. With the default k = 10, the dataset is split into 10 folds.

n_observations = size(iris_y)[1]
kf = cvpartition(n_observations, "kFold")

Train a decision tree classifier on each fold and compute the accuracy by comparing the predicted labels with the true labels.

for (train_ids, test_ids) in kf
    train_X = iris_X[train_ids, :]
    train_y = iris_y[train_ids]
    test_X = iris_X[test_ids, :]
    test_y = iris_y[test_ids]
    # Create and train the model
    tree = fitctree(train_X, train_y)

    # Make predictions on the test set
    predictions = TyMachineLearning.predict(tree, test_X)

    # Evaluate the model, e.g., calculate accuracy
    accuracy = sum(predictions .== test_y) / length(test_y)

    # Print the accuracy or perform other operations
    println("Accuracy: $accuracy")
end

Accuracy: 1.0
Accuracy: 0.9333333333333333
Accuracy: 1.0
Accuracy: 1.0
Accuracy: 0.8
Accuracy: 0.8666666666666667
Accuracy: 0.9333333333333333
Accuracy: 0.8666666666666667
Accuracy: 0.9333333333333333
Accuracy: 1.0
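
The per-fold accuracies printed above are usually summarized as a single cross-validation estimate. The following sketch uses only Julia's Statistics standard library and the values copied from this output; the variable names are illustrative.

```julia
using Statistics

# Per-fold accuracies copied from the example output above.
accuracies = [1.0, 0.9333333333333333, 1.0, 1.0, 0.8,
              0.8666666666666667, 0.9333333333333333,
              0.8666666666666667, 0.9333333333333333, 1.0]

mean_acc = mean(accuracies)   # average accuracy across the 10 folds
std_acc = std(accuracies)     # sample standard deviation across the folds

println("Mean accuracy: ", round(mean_acc, digits=4))   # Mean accuracy: 0.9333
println("Std. deviation: ", round(std_acc, digits=4))
```

Reporting the spread alongside the mean shows how much the estimate depends on which observations land in each fold.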

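The function partitions the dataset randomly, so it is normal for the accuracies to vary between runs.

Alternatively, split the dataset once into a training set and a test set with the holdout method. With the default p = 0.1, 10% of the observations are assigned to the test set.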

split = cvpartition(n_observations, "Holdout")
train_ids, test_ids = split
train_X, train_y = iris_X[train_ids, :], iris_y[train_ids]
test_X, test_y = iris_X[test_ids, :], iris_y[test_ids]

([7.7 3.0 6.1 2.3; 4.8 3.0 1.4 0.1; … ; 7.2 3.6 6.1 2.5; 6.2 2.9 4.3 1.3], Int32[2, 0, 1, 2, 2, 1, 1, 0, 2, 0, 0, 1, 1, 2, 1])

As before, the split is random, so the rows selected for the test set differ between runs.

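# Input Arguments

n_obsvts - Number of observations in the dataset

integer

Number of observations (rows) in the dataset, for example size(iris_y)[1] in the example above.

Data type: Int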

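method - Partition method

"kFold" | "Holdout"

Partition method, specified as "kFold" for k-fold cross-validation or "Holdout" for a single training/test split.

Data type: String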

k - Number of folds

Number of folds to create when method is "kFold". The default is 10.

Data type: Int

p - Test set proportion

Proportion of the observations assigned to the test set when method is "Holdout". The default is 0.1.

Data type: Float64
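
As a rough illustration of how p determines the split sizes, the sketch below holds out a fraction of shuffled indices in plain Julia. The helper `sketch_holdout` and its `rng` keyword are hypothetical, and rounding the test size to the nearest integer is an assumption; the documentation does not state how cvpartition rounds.

```julia
using Random

# Hypothetical helper (not part of TyMachineLearning): hold out a fraction p
# of the indices 1:n as the test set and keep the rest for training.
function sketch_holdout(n::Int, p::Float64; rng::AbstractRNG=Random.default_rng())
    0 < p < 1 || throw(ArgumentError("p must be strictly between 0 and 1"))
    n_test = round(Int, p * n)      # assumption: test size rounded to nearest integer
    shuffled = randperm(rng, n)
    test_ids = sort(shuffled[1:n_test])
    train_ids = sort(shuffled[n_test+1:end])
    return train_ids, test_ids
end

train_ids, test_ids = sketch_holdout(150, 0.1; rng=MersenneTwister(1))
println(length(train_ids), " training / ", length(test_ids), " test")   # 135 training / 15 test
```

For 150 observations and p = 0.1 this gives 15 test indices, which agrees with the 15 test labels shown in the holdout example.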

# Output Arguments

kf - k-fold partition

A list of k folds, where each fold is a tuple containing the training set indices and the test set indices.

train_set - Training set indices

Indices of the observations assigned to the training set.

test_set - Test set indices

Indices of the observations assigned to the test set.

# See Also

confusionmat