Questions Regarding CoIR Dataset Usage in Code Explanation Retrieval
I’ve been referring to the CoIR paper and codebase. Thank you for making this valuable resource available!
I have a question about how datasets are handled in your work.
For datasets retrieved via Hugging Face (like CodeSearchNet), is any preprocessing (e.g., stripping comments from code) applied before retrieval? I couldn't find related scripts in the repo.
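For reference, below is a minimal sketch of the kind of comment stripping I mean. It is my own illustration, not code from the CoIR repo. It assumes the code is Python and uses only the standard-library `tokenize` module. The helper name `strip_python_comments` is hypothetical. It drops `#` comments but leaves docstrings untouched, because docstrings are string tokens rather than comments.

```python
import io
import tokenize


def strip_python_comments(source: str) -> str:
    """Hypothetical helper: remove '#' comments from Python source.

    Docstrings are STRING tokens, not COMMENT tokens, so they are kept.
    Lines that contained only a comment are left as blank lines.
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    # Keep every token except comments.
    kept = [tok for tok in tokens if tok.type != tokenize.COMMENT]
    stripped = tokenize.untokenize(kept)
    # untokenize pads up to the original column of the next token,
    # so trim the trailing spaces left where a comment used to be.
    return "\n".join(line.rstrip() for line in stripped.splitlines())


print(strip_python_comments("x = 1  # set x\ny = 2\n"))
```

Removing docstrings as well would need extra handling, for example through the `ast` module. That is why I was asking whether any such step exists for the Hugging Face datasets.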
I noticed that comments from the code sometimes appear as queries in all splits (train, valid, test). For example:
qrel: