Model evaluation
Metrics for model evaluation are defined in the src/evaluation/ folder.
Metrics
The file metrics.jl defines metrics that compare the quality of a vector of recommended items against the ground truth vector (see README); an illustrative sketch of all four metrics is given after the Note below:
NeuralCollaborativeFiltering.accuracy — Function
accuracy(y_vec::Vector{T}, ŷ_vec::Vector{T}) -> Float64 where T <: Real
Calculate the accuracy between two vectors. This function computes the proportion of elements that match between the ground truth vector y_vec and the predicted vector ŷ_vec.
Arguments
y_vec::Vector{T}: The ground truth vector (e.g., correct movie IDs).
ŷ_vec::Vector{T}: The predicted vector (e.g., ranked movie IDs).
Returns
Float64: The accuracy, calculated as the number of matching elements in y_vec and ŷ_vec, divided by the total number of elements.
Examples
julia> accuracy([1, 2, 3], [3, 2, 4])
0.3333333333333333
NeuralCollaborativeFiltering.average_precision — Function
average_precision(y_vec::Vector{T}, ŷ_vec::Vector{T}) -> Float64 where T <: Real
Calculate the average precision of a ranked list of items. This function compares the ranked list ŷ_vec against the ground truth y_vec to compute the average precision.
Arguments
y_vec::Vector{T}: A vector of ground truth items (e.g., correct movie IDs).
ŷ_vec::Vector{T}: A vector of predicted items (e.g., ranked movie IDs).
Returns
Float64: The average precision calculated over the ranked list.
Examples
julia> average_precision([1, 3, 4], [4, 2, 3])
0.5555555555555555
NeuralCollaborativeFiltering.reciprocal_rank — Function
reciprocal_rank(y_vec::Vector{T}, ŷ_vec::Vector{T}) where T <: Real
Calculate the Reciprocal Rank (RR) for a given pair of vectors, y_vec and ŷ_vec. Used in the MRR calculation in evaluate_model.jl.
Arguments
y_vec::Vector{T}: The ground truth vector (e.g., correct movie IDs).
ŷ_vec::Vector{T}: The predicted vector (e.g., ranked movie IDs).
Returns
Float64: The calculated RR, which is the reciprocal of the rank at which the first relevant item (the first item of y_vec) appears in ŷ_vec.
Example
julia> reciprocal_rank([3, 1, 4, 2], [1, 3, 2, 4])
0.5
NeuralCollaborativeFiltering.extended_reciprocal_rank — Function
extended_reciprocal_rank(y_vec::Vector{T}, ŷ_vec::Vector{T}) where T <: Real
Calculate the Extended Reciprocal Rank (ExtRR) between two vectors, y_vec and ŷ_vec.
Arguments
y_vec::Vector{T}: The ground truth vector (e.g., correct movie IDs).
ŷ_vec::Vector{T}: The predicted vector (e.g., ranked movie IDs).
Returns
Float64: The calculated ExtRR.
Example
julia> extended_reciprocal_rank([3, 1, 4, 2], [1, 3, 2, 4])
0.75
Note
The function is left in an easy-to-read form because this makes the computation easier to follow.
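To make the definitions above concrete, here is a minimal, self-contained sketch of how such metrics can be computed on plain vectors. These naive reference versions are written for illustration only and are not the implementations in metrics.jl, but they reproduce the example outputs shown above.

    # Naive reference implementations of the four metrics, for illustration only.

    # Proportion of positions where both vectors hold the same item.
    naive_accuracy(y, ŷ) = count(y .== ŷ) / length(y)

    # Average precision: precision at every rank where ŷ hits a relevant item,
    # summed and divided by the number of ground-truth items.
    function naive_average_precision(y, ŷ)
        hits, total = 0, 0.0
        for (k, item) in enumerate(ŷ)
            if item in y
                hits += 1
                total += hits / k
            end
        end
        return total / length(y)
    end

    # Reciprocal of the rank at which the first ground-truth item appears in ŷ.
    naive_reciprocal_rank(y, ŷ) = 1 / findfirst(==(first(y)), ŷ)

    # Extended RR (as inferred from the example above): a ground-truth item with
    # ideal rank i contributes 1 if it appears at position r ≤ i in ŷ, and
    # 1 / (r - i + 1) otherwise; the scores are averaged.
    function naive_extended_reciprocal_rank(y, ŷ)
        total = 0.0
        for (i, item) in enumerate(y)
            r = findfirst(==(item), ŷ)
            total += r <= i ? 1.0 : 1 / (r - i + 1)
        end
        return total / length(y)
    end

    naive_accuracy([1, 2, 3], [3, 2, 4])                        # 0.3333333333333333
    naive_average_precision([1, 3, 4], [4, 2, 3])               # 0.5555555555555555
    naive_reciprocal_rank([3, 1, 4, 2], [1, 3, 2, 4])           # 0.5
    naive_extended_reciprocal_rank([3, 1, 4, 2], [1, 3, 2, 4])  # 0.75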
References
- Source: Towards Data Science
User-wise evaluation
The file evaluate_model.jl then uses the above metrics to evaluate the model either on a specific user or on all users available in the provided dataset.
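For orientation, a typical call pattern looks roughly like the following (a sketch assuming a trained model and a test DataFrame already exist under the hypothetical names model and df_test); the fields of the returned NamedTuples are those listed under Returns below:

    using NeuralCollaborativeFiltering

    # Metrics for a single user (user_id = 1).
    per_user = evaluate_model_on_1_user(model, 1, df_test, top_n_mrr=5)
    per_user.ExtRR, per_user.RR, per_user.AP, per_user.ACC

    # Metrics averaged over all valid users.
    overall = evaluate_model(df_test, model, minimal_y_length=10)
    overall.MeanExtRR, overall.MRR, overall.MAP, overall.MeanACC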
NeuralCollaborativeFiltering.evaluate_model_on_1_user — Function
evaluate_model_on_1_user(m::T, user_id::Int, df_test::DataFrame; top_n_mrr=nothing) where T <: NCFModel
Predicts ranks for the movies present in the test set of the user with the given user_id and calculates 4 different metrics.
Arguments
m<:NCFModel: The learned model.
user_id::Int: Our user's id.
df_test::DataFrame: The whole test set.
top_n_mrr: Int or nothing. Number of top predictions to be considered.
Returns
NamedTuple with fields: {ExtRR, RR, AP, ACC}, representing 4 different metrics.
Example
julia> evaluate_model_on_1_user(model, 1, df_test, top_n_mrr=5);
NeuralCollaborativeFiltering.evaluate_model — Function
evaluate_model(test_df, m::T; minimal_y_length=10, top_n_map=5) where T <: NCFModel
In contrast to evaluate_model_on_1_user(...), this function calculates the metrics for every valid user and averages them over the total number of valid users.
Arguments
m<:NCFModel: The learned model.
test_df: The whole test set in a DataFrame.
minimal_y_length: Minimum number of test instances for a user to be counted. E.g., if 10, then all users with fewer than 10 ranked movies will be skipped.
top_n_mrr: Int or nothing. Number of top predictions to be considered.
Returns
NamedTuple with fields: {MeanExtRR, MRR, MAP, MeanACC}, representing 4 different metrics averaged over the total number of valid users.
Example
julia> evaluate_model(df_test, model);
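Conceptually, the user-wise averaging can be pictured as the loop below. This is an illustrative sketch only, not the code in evaluate_model.jl; the user-id column name (:user_id) and the way valid users are selected are assumptions.

    using DataFrames, Statistics, NeuralCollaborativeFiltering

    # Illustrative sketch of user-wise averaging (not the actual evaluate_model.jl code).
    # Assumes df_test has a user-id column named :user_id.
    function evaluate_model_sketch(df_test, m; minimal_y_length=10, top_n_mrr=nothing)
        # Keep only users with enough test instances to be evaluated.
        valid_users = [u for u in unique(df_test.user_id)
                       if count(==(u), df_test.user_id) >= minimal_y_length]
        per_user = [evaluate_model_on_1_user(m, u, df_test, top_n_mrr=top_n_mrr)
                    for u in valid_users]
        # Average each per-user metric over the number of valid users.
        return (MeanExtRR = mean(r.ExtRR for r in per_user),
                MRR       = mean(r.RR    for r in per_user),
                MAP       = mean(r.AP    for r in per_user),
                MeanACC   = mean(r.ACC   for r in per_user))
    end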