Model evaluation
Metrics for model evaluation are defined in the src/evaluation/ folder.
Metrics
The file metrics.jl defines metrics that compare the quality of a vector of recommended items against the ground truth vector (see README); an illustrative sketch of all four metrics is given after the Note below:
NeuralCollaborativeFiltering.accuracy — Function
accuracy(y_vec::Vector{T}, ŷ_vec::Vector{T}) -> Float64 where T <: Real
Calculate the accuracy between two vectors. This function computes the proportion of elements that match between the ground truth vector y_vec and the predicted vector ŷ_vec.
Arguments
y_vec::Vector{T}: The ground truth vector (e.g., correct movie IDs).
ŷ_vec::Vector{T}: The predicted vector (e.g., ranked movie IDs).
Returns
Float64: The accuracy, calculated as the number of matching elements in y_vec and ŷ_vec, divided by the total number of elements.
Examples
julia> accuracy([1, 2, 3], [3, 2, 4])
0.3333333333333333
NeuralCollaborativeFiltering.average_precision — Function
average_precision(y_vec::Vector{T}, ŷ_vec::Vector{T}) -> Float64 where T <: Real
Calculate the average precision of a ranked list of items. This function compares the ranked list ŷ_vec against the ground truth y_vec to compute the average precision.
Arguments
y_vec::Vector{T}: A vector of ground truth items (e.g., correct movie IDs).
ŷ_vec::Vector{T}: A vector of predicted items (e.g., ranked movie IDs).
Returns
Float64: The average precision calculated over the ranked list.
Examples
julia> average_precision([1, 3, 4], [4, 2, 3])
0.5555555555555555
NeuralCollaborativeFiltering.reciprocal_rank — Function
reciprocal_rank(y_vec::Vector{T}, ŷ_vec::Vector{T}) where T <: Real
Calculate the Reciprocal Rank (RR) for a given pair of vectors, y_vec and ŷ_vec. Used in the MRR calculation in evaluate_model.jl.
Arguments
y_vec::Vector{T}: The ground truth vector (e.g., correct movie IDs).
ŷ_vec::Vector{T}: The predicted vector (e.g., ranked movie IDs).
Returns
Float64: The calculated RR, which is the reciprocal of the rank at which the first relevant item (the first item of y_vec) appears in ŷ_vec.
Example
julia> reciprocal_rank([3, 1, 4, 2], [1, 3, 2, 4])
0.5
NeuralCollaborativeFiltering.extended_reciprocal_rank — Function
extended_reciprocal_rank(y_vec::Vector{T}, ŷ_vec::Vector{T}) where T <: Real
Calculate the Extended Reciprocal Rank (ExtRR) between two vectors, y_vec and ŷ_vec.
Arguments
y_vec::Vector{T}: The ground truth vector (e.g., correct movie IDs).
ŷ_vec::Vector{T}: The predicted vector (e.g., ranked movie IDs).
Returns
Float64: The calculated ExtRR.
Example
julia> extended_reciprocal_rank([3, 1, 4, 2], [1, 3, 2, 4])
0.75
Note
The function is left in an easy-to-read form because this makes the computation easier to follow.
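To make the definitions above concrete, here is a minimal, self-contained sketch of how such metrics can be computed on plain vectors. These naive reference versions are written for illustration only and are not the implementations in metrics.jl, but they reproduce the example outputs shown above.

    # Naive reference implementations of the four metrics, for illustration only.

    # Proportion of positions where both vectors hold the same item.
    naive_accuracy(y, ŷ) = count(y .== ŷ) / length(y)

    # Average precision: precision at every rank where ŷ hits a relevant item,
    # summed and divided by the number of ground-truth items.
    function naive_average_precision(y, ŷ)
        hits, total = 0, 0.0
        for (k, item) in enumerate(ŷ)
            if item in y
                hits += 1
                total += hits / k
            end
        end
        return total / length(y)
    end

    # Reciprocal of the rank at which the first ground-truth item appears in ŷ.
    naive_reciprocal_rank(y, ŷ) = 1 / findfirst(==(first(y)), ŷ)

    # Extended RR (as inferred from the example above): a ground-truth item with
    # ideal rank i contributes 1 if it appears at position r ≤ i in ŷ, and
    # 1 / (r - i + 1) otherwise; the scores are averaged.
    function naive_extended_reciprocal_rank(y, ŷ)
        total = 0.0
        for (i, item) in enumerate(y)
            r = findfirst(==(item), ŷ)
            total += r <= i ? 1.0 : 1 / (r - i + 1)
        end
        return total / length(y)
    end

    naive_accuracy([1, 2, 3], [3, 2, 4])                        # 0.3333333333333333
    naive_average_precision([1, 3, 4], [4, 2, 3])               # 0.5555555555555555
    naive_reciprocal_rank([3, 1, 4, 2], [1, 3, 2, 4])           # 0.5
    naive_extended_reciprocal_rank([3, 1, 4, 2], [1, 3, 2, 4])  # 0.75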
References
- Source: Towards Data Science
User-wise evaluation
The file evaluate_model.jl then uses the above metrics to evaluate the model either on a specific user or on all users available in the provided dataset.
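For orientation, a typical call pattern looks roughly like the following (a sketch assuming a trained model and a test DataFrame already exist under the hypothetical names model and df_test); the fields of the returned NamedTuples are those listed under Returns below:

    using NeuralCollaborativeFiltering

    # Metrics for a single user (user_id = 1).
    per_user = evaluate_model_on_1_user(model, 1, df_test, top_n_mrr=5)
    per_user.ExtRR, per_user.RR, per_user.AP, per_user.ACC

    # Metrics averaged over all valid users.
    overall = evaluate_model(df_test, model, minimal_y_length=10)
    overall.MeanExtRR, overall.MRR, overall.MAP, overall.MeanACC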
NeuralCollaborativeFiltering.evaluate_model_on_1_user — Function
evaluate_model_on_1_user(m::T, user_id::Int, df_test::DataFrame; top_n_mrr=nothing) where T <: NCFModel
Predicts ranks for the movies present in the test set of the user with the given user_id and calculates 4 different metrics.
Arguments
m<:NCFModel: The learned model.
user_id::Int: Our user's id.
df_test::DataFrame: The whole test set.
top_n_mrr: Int or nothing. Number of top predictions to be considered.
Returns
NamedTuple with fields: {ExtRR, RR, AP, ACC}, representing 4 different metrics.
Example
julia> evaluate_model_on_1_user(model, 1, df_test, top_n_mrr=5);
NeuralCollaborativeFiltering.evaluate_model — Function
evaluate_model(test_df, m::T; minimal_y_length=10, top_n_map=5) where T <: NCFModel
In contrast to evaluate_model_on_1_user(...), this function calculates the metrics for every valid user and averages them over the total number of valid users.
Arguments
m<:NCFModel: The learned model.
test_df: The whole test set in a DataFrame.
minimal_y_length: Minimum number of test instances for a user to be counted. E.g., if 10, then all users with fewer than 10 ranked movies will be skipped.
top_n_mrr: Int or nothing. Number of top predictions to be considered.
Returns
NamedTuple with fields: {MeanExtRR, MRR, MAP, MeanACC}, representing 4 different metrics averaged over the total number of valid users.
Example
julia> evaluate_model(df_test, model);
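Conceptually, the user-wise averaging can be pictured as the loop below. This is an illustrative sketch only, not the code in evaluate_model.jl; the user-id column name (:user_id) and the way valid users are selected are assumptions.

    using DataFrames, Statistics, NeuralCollaborativeFiltering

    # Illustrative sketch of user-wise averaging (not the actual evaluate_model.jl code).
    # Assumes df_test has a user-id column named :user_id.
    function evaluate_model_sketch(df_test, m; minimal_y_length=10, top_n_mrr=nothing)
        # Keep only users with enough test instances to be evaluated.
        valid_users = [u for u in unique(df_test.user_id)
                       if count(==(u), df_test.user_id) >= minimal_y_length]
        per_user = [evaluate_model_on_1_user(m, u, df_test, top_n_mrr=top_n_mrr)
                    for u in valid_users]
        # Average each per-user metric over the number of valid users.
        return (MeanExtRR = mean(r.ExtRR for r in per_user),
                MRR       = mean(r.RR    for r in per_user),
                MAP       = mean(r.AP    for r in per_user),
                MeanACC   = mean(r.ACC   for r in per_user))
    end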