Search within the Bibles or marginalia:
These searches may take a while to complete, especially for higher values of k (the number of verses retrieved).
The search engine for the corpus of Bibles follows the same logic as my text reuse detection procedure.
My fine-tuned bi-encoder is EEPS_ALL_MacBERTh_Epoch1 and my cross encoder is EEPS_cross-encoder_emanjavacas_MacBERTh/checkpoint-500.
For each of the top k full Bible verses retrieved, I compute the cosine similarity score, cross encoder score,
and BM25 score between the query passage and both the full verse and its partial verses.
Both the cosine similarity and cross encoder scores fall between 0 and 1.
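
As a rough illustration of two of the three scores, here is a minimal sketch in plain Python. It is not the code behind this search engine: the embedding vectors are assumed to come from the bi-encoder, the whitespace tokenization is a simplification, and the BM25 parameters `k1=1.5` and `b=0.75` are common defaults chosen for the example, not values taken from the procedure described here.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for the example: zero vectors match nothing
    return dot / (norm_a * norm_b)


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document against the query.

    Assumptions for illustration: lowercased whitespace tokens, each
    distinct query term counted once, k1 and b set to common defaults.
    """
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = set(query.lower().split())
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores
```

Raw BM25 scores computed this way are not confined to the 0 to 1 range, so they are best compared with each other within a single search, not against the other two scores.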
Retrieving documents using a bi-encoder takes much less time than reranking and scoring them with the cross encoder and BM25.
As such, I do not recommend setting k above 25 when searching the Bibles.
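
The cost trade-off behind this recommendation can be sketched as a two-stage search: a cheap similarity is computed against every verse, and the expensive scorer runs only on the k survivors. The sketch below uses toy vectors, and `rerank_fn` is a hypothetical stand-in for the cross encoder; none of these names come from the actual implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(query_vec, corpus_vecs, rerank_fn, k=25):
    """Return (index, retrieval_score, rerank_score) for the top-k items.

    rerank_fn is a hypothetical callable taking a corpus index and
    returning a score; it stands in for the expensive scoring stage.
    """
    # Stage 1: cheap score against every item in the corpus.
    sims = [(i, cosine(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    top_k = sorted(sims, key=lambda pair: pair[1], reverse=True)[:k]

    # Stage 2: expensive score on the k retrieved candidates only,
    # so this stage costs k calls regardless of corpus size.
    reranked = [(i, sim, rerank_fn(i)) for i, sim in top_k]
    return sorted(reranked, key=lambda row: row[2], reverse=True)


# Toy usage: five "verses" as 2-d vectors, keep the best two.
corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
toy_rerank_scores = {0: 0.2, 1: 0.8}  # made-up values for illustration
results = retrieve_then_rerank(
    [1.0, 0.0], corpus, lambda i: toy_rerank_scores.get(i, 0.0), k=2
)
```

Because the second stage is called once per retrieved candidate, its running time grows in step with k, which is why small values of k keep searches responsive.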