# Model evaluation
Metrics for model evaluation are defined in the `src/evaluation/` folder.
## Metrics
The file `metrics.jl` defines metrics that compare the quality of the vector of recommended items against the ground-truth vector (see the README):
### NeuralCollaborativeFiltering.accuracy — Function

    accuracy(y_vec::Vector{T}, ŷ_vec::Vector{T}) -> Float64 where T <: Real

Calculate the accuracy between two vectors. This function computes the proportion of elements that match between the ground truth vector `y_vec` and the predicted vector `ŷ_vec`.

**Arguments**

- `y_vec::Vector{T}`: The ground truth vector (e.g., correct movie IDs).
- `ŷ_vec::Vector{T}`: The predicted vector (e.g., ranked movie IDs).

**Returns**

- `Float64`: The accuracy, calculated as the number of matching elements in `y_vec` and `ŷ_vec` divided by the total number of elements.

**Examples**

    julia> accuracy([1, 2, 3], [3, 2, 4])
    0.3333333333333333
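For intuition, here is a minimal sketch of an elementwise accuracy that reproduces the value above; `accuracy_sketch` is a hypothetical name and not the package's implementation:

```julia
# Hypothetical sketch (not the package's implementation): count the positions where
# the two vectors hold the same item and divide by the vector length.
accuracy_sketch(y_vec, ŷ_vec) = sum(y_vec .== ŷ_vec) / length(y_vec)

accuracy_sketch([1, 2, 3], [3, 2, 4])  # 0.3333333333333333 (only position 2 matches)
```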
### NeuralCollaborativeFiltering.average_precision — Function

    average_precision(y_vec::Vector{T}, ŷ_vec::Vector{T}) -> Float64 where T <: Real

Calculate the average precision of a ranked list of items. This function compares the ranked list `ŷ_vec` against the ground truth `y_vec` to compute the average precision.

**Arguments**

- `y_vec::Vector{T}`: A vector of ground truth items (e.g., correct movie IDs).
- `ŷ_vec::Vector{T}`: A vector of predicted items (e.g., ranked movie IDs).

**Returns**

- `Float64`: The average precision calculated over the ranked list.

**Examples**

    julia> average_precision([1, 3, 4], [4, 2, 3])
    0.5555555555555555
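As a point of reference, the following hypothetical sketch reproduces the value in the example above by accumulating precision@k at every hit position and normalizing by the length of `y_vec`; the package's actual implementation may differ:

```julia
# Hypothetical sketch (not the package's implementation): precision@k is added at
# every position k where the predicted item appears in the ground truth, then the
# sum is divided by the number of ground-truth items.
function average_precision_sketch(y_vec, ŷ_vec)
    hits, total = 0, 0.0
    for (k, item) in enumerate(ŷ_vec)
        if item in y_vec
            hits += 1
            total += hits / k   # precision@k at this hit
        end
    end
    return total / length(y_vec)
end

average_precision_sketch([1, 3, 4], [4, 2, 3])  # 0.5555555555555555
```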
### NeuralCollaborativeFiltering.reciprocal_rank — Function

    reciprocal_rank(y_vec::Vector{T}, ŷ_vec::Vector{T}) where T <: Real

Calculate the Reciprocal Rank (RR) for a given pair of vectors, `y_vec` and `ŷ_vec`. Used in the MRR calculation in `evaluate_model.jl`.

**Arguments**

- `y_vec::Vector{T}`: The ground truth vector (e.g., correct movie IDs).
- `ŷ_vec::Vector{T}`: The predicted vector (e.g., ranked movie IDs).

**Returns**

- `Float64`: The calculated RR, which is the reciprocal of the rank at which the first relevant item (the first item of `y_vec`) appears in `ŷ_vec`.

**Example**

    julia> reciprocal_rank([3, 1, 4, 2], [1, 3, 2, 4])
    0.5
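The description above translates directly into a short sketch (hypothetical, not the package's code):

```julia
# Hypothetical sketch (not the package's implementation): find the rank of y_vec's
# first item inside ŷ_vec and return its reciprocal (0.0 if it never appears).
function reciprocal_rank_sketch(y_vec, ŷ_vec)
    rank = findfirst(==(y_vec[1]), ŷ_vec)
    return rank === nothing ? 0.0 : 1 / rank
end

reciprocal_rank_sketch([3, 1, 4, 2], [1, 3, 2, 4])  # 0.5 (item 3 is ranked 2nd)
```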
### NeuralCollaborativeFiltering.extended_reciprocal_rank — Function

    extended_reciprocal_rank(y_vec::Vector{T}, ŷ_vec::Vector{T}) where T <: Real

Calculate the Extended Reciprocal Rank (ExtRR) between two vectors, `y_vec` and `ŷ_vec`.

**Arguments**

- `y_vec::Vector{T}`: The ground truth vector (e.g., correct movie IDs).
- `ŷ_vec::Vector{T}`: The predicted vector (e.g., ranked movie IDs).

**Returns**

- `Float64`: The calculated ExtRR.

**Example**

    julia> extended_reciprocal_rank([3, 1, 4, 2], [1, 3, 2, 4])
    0.75
**Note**

The function is kept in an easy-to-read format because that makes it easier to understand.

**References**

- Source: Towards Data Science
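One possible formulation that reproduces the example above is sketched below; this is a hypothetical reconstruction, not necessarily identical to the package's code:

```julia
# Hypothetical sketch (not necessarily the package's implementation), assuming both
# vectors contain the same set of items. Each ground-truth item contributes 1.0 when
# its predicted rank is at or above its ideal rank, and 1/(predicted - ideal + 1)
# otherwise; the contributions are averaged over all items.
function extended_reciprocal_rank_sketch(y_vec, ŷ_vec)
    total = 0.0
    for (ideal_rank, item) in enumerate(y_vec)
        predicted_rank = findfirst(==(item), ŷ_vec)
        total += predicted_rank <= ideal_rank ? 1.0 : 1 / (predicted_rank - ideal_rank + 1)
    end
    return total / length(y_vec)
end

extended_reciprocal_rank_sketch([3, 1, 4, 2], [1, 3, 2, 4])  # 0.75
```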
## User-wise evaluation
The file `evaluate_model.jl` then uses the metrics above to evaluate the model either on a specific user or on all users available in the provided dataset.
### NeuralCollaborativeFiltering.evaluate_model_on_1_user — Function

    evaluate_model_on_1_user(m::T, user_id::Int, df_test::DataFrame; top_n_mrr=nothing) where T <: NCFModel

Predicts ranks for the movies present in the test set of the user with id `user_id` and calculates 4 different metrics.

**Arguments**

- `m<:NCFModel`: The learned model.
- `user_id::Int`: The id of the user to evaluate.
- `df_test::DataFrame`: The whole test set.
- `top_n_mrr`: `Int` or `nothing`. Number of top predictions to be considered.

**Returns**

- `NamedTuple` with fields `{ExtRR, RR, AP, ACC}`, representing 4 different metrics.

**Example**

    julia> evaluate_model_on_1_user(model, 1, df_test, top_n_mrr=5);
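A possible way to read the returned values (an illustrative sketch; `model` and `df_test` are assumed to exist as in the example above):

```julia
# Illustrative usage: the returned NamedTuple exposes the four per-user metrics
# as fields, so they can be read directly.
metrics = evaluate_model_on_1_user(model, 1, df_test, top_n_mrr=5)
println("ExtRR = $(metrics.ExtRR), RR = $(metrics.RR), AP = $(metrics.AP), ACC = $(metrics.ACC)")
```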
### NeuralCollaborativeFiltering.evaluate_model — Function

    evaluate_model(test_df, m::T; minimal_y_length=10, top_n_map=5) where T <: NCFModel

In contrast to `evaluate_model_on_1_user(...)`, calculates the metrics on every valid user and averages them over the total number of valid users.

**Arguments**

- `m<:NCFModel`: The learned model.
- `test_df`: The whole test set in a `DataFrame`.
- `minimal_y_length`: Minimum number of test instances required for a user to be counted. E.g., if set to 10, all users with fewer than 10 ranked movies will be skipped.
- `top_n_mrr`: `Int` or `nothing`. Number of top predictions to be considered.

**Returns**

- `NamedTuple` with fields `{MeanExtRR, MRR, MAP, MeanACC}`, representing 4 different metrics averaged over the total number of valid users.

**Example**

    julia> evaluate_model(df_test, model);
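Similarly, the averaged metrics can be read from the returned `NamedTuple` (an illustrative sketch, reusing `model` and `df_test` from the examples above):

```julia
# Illustrative usage: average the metrics over all valid users and inspect them.
results = evaluate_model(df_test, model)
println("MeanExtRR = $(results.MeanExtRR), MRR = $(results.MRR)")
println("MAP = $(results.MAP), MeanACC = $(results.MeanACC)")
```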