Model evaluation

Metrics for model evaluation are defined in the src/evaluation/ folder.

Metrics

The file metrics.jl defines metrics that compare the vector of recommended items against the ground truth vector (see README):

NeuralCollaborativeFiltering.accuracyFunction
accuracy(y_vec::Vector{T}, ŷ_vec::Vector{T}) -> Float64 where T <: Real

Calculate the accuracy between two vectors. This function computes the proportion of positions at which the ground truth vector y_vec and the predicted vector ŷ_vec agree.

Arguments

  • y_vec::Vector{T}: The ground truth vector (e.g., correct movie IDs).
  • ŷ_vec::Vector{T}: The predicted vector (e.g., ranked movie IDs).

Returns

  • Float64: The accuracy, calculated as the number of positions at which y_vec and ŷ_vec agree, divided by the total number of elements.

Examples

julia> accuracy([1, 2, 3], [3, 2, 4])
0.3333333333333333
source
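
For orientation, the calculation can be sketched in a few lines of Julia. The sketch below is an illustration consistent with the example above (position-wise matching), not necessarily the package's exact implementation:

# Illustrative position-wise accuracy (not the package's actual code).
function accuracy_sketch(y_vec::Vector{T}, ŷ_vec::Vector{T}) where T <: Real
    # Count the positions where the prediction equals the ground truth.
    matches = count(i -> y_vec[i] == ŷ_vec[i], eachindex(y_vec))
    return matches / length(y_vec)
end

julia> accuracy_sketch([1, 2, 3], [3, 2, 4])
0.3333333333333333
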
NeuralCollaborativeFiltering.average_precisionFunction
average_precision(y_vec::Vector{T}, ŷ_vec::Vector{T}) -> Float64 where T <: Real

Calculate the average precision of a ranked list of items. This function compares the ranked list ŷ_vec against the ground truth y_vec to compute the average precision.

Arguments

  • y_vec::Vector{T}: A vector of ground truth items (e.g., correct movie IDs).
  • ŷ_vec::Vector{T}: A vector of predicted items (e.g., ranked movie IDs).

Returns

  • Float64: The average precision calculated over the ranked list.

Examples

julia> average_precision([1, 3, 4], [4, 2, 3])
0.5555555555555555
source
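
Likewise, the standard average-precision formulation (precision at every hit, averaged over the number of ground-truth items) reproduces the example above. The sketch below is an illustration, not a copy of the package's code:

# Illustrative average precision: precision@k at every hit, averaged over length(y_vec).
function average_precision_sketch(y_vec::Vector{T}, ŷ_vec::Vector{T}) where T <: Real
    hits = 0
    precision_sum = 0.0
    for (k, item) in enumerate(ŷ_vec)
        if item in y_vec
            hits += 1
            precision_sum += hits / k   # precision at rank k
        end
    end
    return precision_sum / length(y_vec)
end

julia> average_precision_sketch([1, 3, 4], [4, 2, 3])
0.5555555555555555
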
NeuralCollaborativeFiltering.reciprocal_rankFunction
reciprocal_rank(y_vec::Vector{T}, ŷ_vec::Vector{T}) where T <: Real

Calculate the Reciprocal Rank (RR) for a given pair of vectors, y_vec and ŷ_vec. Used in MRR calculation in evaluate_model.jl.

Arguments

  • y_vec::Vector{T}: The ground truth vector (e.g., correct movie IDs).
  • ŷ_vec::Vector{T}: The predicted vector (e.g., ranked movie IDs).

Returns

  • Float64: The calculated RR, which is the reciprocal of the rank at which the first relevant item (the first item of y_vec) appears in ŷ_vec.

Example

julia> reciprocal_rank([3, 1, 4, 2], [1, 3, 2, 4])
0.5
source
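
As an illustration only, here is a minimal sketch consistent with the description above (the reciprocal of the position at which y_vec's first item appears in ŷ_vec); how a missing item is scored is an assumption of the sketch:

# Illustrative reciprocal rank: 1 / (position of y_vec's first item in ŷ_vec).
function reciprocal_rank_sketch(y_vec::Vector{T}, ŷ_vec::Vector{T}) where T <: Real
    rank = findfirst(==(y_vec[1]), ŷ_vec)
    # Returning 0.0 when the item is absent is an assumption of this sketch.
    return rank === nothing ? 0.0 : 1.0 / rank
end

julia> reciprocal_rank_sketch([3, 1, 4, 2], [1, 3, 2, 4])
0.5
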
NeuralCollaborativeFiltering.extended_reciprocal_rankFunction
extended_reciprocal_rank(y_vec::Vector{T}, ŷ_vec::Vector{T}) where T <: Real

Calculate the Extended Reciprocal Rank (ExtRR) between two vectors, y_vec and ŷ_vec.

Arguments

  • y_vec::Vector{T}: The ground truth vector (e.g., correct movie IDs).
  • ŷ_vec::Vector{T}: The predicted vector (e.g., ranked movie IDs).

Returns

  • Float64: The calculated ExtRR.

Example

julia> extended_reciprocal_rank([3, 1, 4, 2], [1, 3, 2, 4])
0.75

Note

The implementation is deliberately kept in an easy-to-read form for clarity.

source
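
The formula is easiest to see in code. The sketch below uses one common ExtRR formulation that is consistent with the example above: an item ranked no later than its ideal position scores 1, otherwise 1/(r - i + 1), and the scores are averaged over all items. Treat it as an illustration, not necessarily the package's exact implementation:

# Illustrative Extended Reciprocal Rank (one common formulation, consistent with the example).
function extended_reciprocal_rank_sketch(y_vec::Vector{T}, ŷ_vec::Vector{T}) where T <: Real
    total = 0.0
    for (i, item) in enumerate(y_vec)        # i is the item's ideal rank
        r = findfirst(==(item), ŷ_vec)       # r is the item's predicted rank
        r === nothing && continue            # assumption: missing items contribute 0
        total += r <= i ? 1.0 : 1.0 / (r - i + 1)
    end
    return total / length(y_vec)
end

julia> extended_reciprocal_rank_sketch([3, 1, 4, 2], [1, 3, 2, 4])
0.75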

User-wise evaluation

The file evaluate_model.jl then uses the above metrics to evaluate the model on a specific user or on all users available in the provided dataset.

NeuralCollaborativeFiltering.evaluate_model_on_1_userFunction
evaluate_model_on_1_user(m::T, user_id::Int, df_test::DataFrame; top_n_mrr=nothing) where T <: NCFModel

Predicts ranks for the movies present in the test set of the user with the given user_id and calculates 4 different metrics.

Arguments

  • m<:NCFModel: The learned model.
  • user_id::Int: The id of the user to evaluate.
  • df_test::DataFrame: The whole test set.
  • top_n_mrr: Int or nothing. Number of top predictions to be considered.

Returns

  • NamedTuple with fields: {ExtRR, RR, AP, ACC}, representing 4 different metrics.

Example

julia> evaluate_model_on_1_user(model, 1, df_test, top_n_mrr=5);
source
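
To show how the four metrics could be wired together for one user, here is a rough sketch. The column names (:user_id, :movie_id, :rating), the hypothetical score_movies helper, and the way top_n_mrr truncates the prediction list are all assumptions made for illustration, not the package's actual code:

using DataFrames

function evaluate_1_user_sketch(m, user_id::Int, df_test::DataFrame; top_n_mrr=nothing)
    user_rows = df_test[df_test.user_id .== user_id, :]
    # Ground truth: the user's test movies ordered by their true rating (best first).
    y_vec = user_rows.movie_id[sortperm(user_rows.rating, rev=true)]
    # Prediction: the same movies reordered by the model's scores (best first).
    scores = score_movies(m, user_id, user_rows.movie_id)        # hypothetical helper
    ŷ_vec = user_rows.movie_id[sortperm(scores, rev=true)]
    ŷ_top = top_n_mrr === nothing ? ŷ_vec : ŷ_vec[1:min(top_n_mrr, end)]
    return (ExtRR = extended_reciprocal_rank(y_vec, ŷ_vec),
            RR    = reciprocal_rank(y_vec, ŷ_top),
            AP    = average_precision(y_vec, ŷ_vec),
            ACC   = accuracy(y_vec, ŷ_vec))
end
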
NeuralCollaborativeFiltering.evaluate_modelFunction
evaluate_model(test_df, m::T; minimal_y_length=10, top_n_mrr=5) where T <: NCFModel

In contrast to evaluate_model_on_1_user(...), this function calculates the metrics for every valid user and averages them over the total number of valid users.

Arguments

  • m<:NCFModel: The learned model.
  • test_df: The whole test set in a DataFrame.
  • minimal_y_length: Minimum number of test instances required for a user to be counted; e.g., if set to 10, all users with fewer than 10 ranked movies are skipped.
  • top_n_mrr: Int or nothing. Number of top predictions to be considered.

Returns

  • NamedTuple with fields: {MeanExtRR, MRR, MAP, MeanACC}, representing 4 different metrics averaged over all valid users.

Example

julia> evaluate_model(df_test, model);
source
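
The user-wise averaging itself can be sketched as follows; the :user_id column name and the exact filtering logic are assumptions, and the real implementation may differ:

using DataFrames, Statistics

# Sketch of the averaging over valid users (illustrative only).
function evaluate_model_sketch(df_test::DataFrame, m; minimal_y_length=10, top_n_mrr=5)
    results = NamedTuple[]
    for sub in groupby(df_test, :user_id)
        nrow(sub) < minimal_y_length && continue      # skip users with too few test items
        uid = first(sub.user_id)
        push!(results, evaluate_model_on_1_user(m, uid, df_test; top_n_mrr=top_n_mrr))
    end
    # Average each per-user metric over all valid users.
    return (MeanExtRR = mean(r.ExtRR for r in results),
            MRR       = mean(r.RR  for r in results),
            MAP       = mean(r.AP  for r in results),
            MeanACC   = mean(r.ACC for r in results))
end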