instructor-embedding
Please accept this pull request ASAP!
First change
Apparently, during the last few pull requests, the capitalization of a few names was changed and the "_" character was omitted. This prevents the library from working. For example...
The class was initially named INSTRUCTOR_Pooling
In a pull request it was changed to lowercase, and the "_" was removed in the same pull request or possibly another; I can't remember.
This pull request corrects the most recent pull request so that the classes are correctly named throughout the script. Here's a picture of the lines changed:
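For illustration, here is a minimal sketch of how the released name could be restored while still accepting the renamed spelling. The alias is my suggestion only, not part of this PR; the class body is a placeholder:

```python
# Sketch only: INSTRUCTOR_Pooling is the name from the released 1.0.1 package
# (https://pypi.org/project/InstructorEmbedding/1.0.1/); the body here is a
# placeholder, not the real implementation.
class INSTRUCTOR_Pooling:
    def __init__(self, word_embedding_dimension: int):
        self.word_embedding_dimension = word_embedding_dimension

# Hypothetical backward-compatibility alias for any code that was written
# against the renamed class from the later pull request.
INSTRUCTORPooling = INSTRUCTOR_Pooling
```

An alias like this would let both spellings resolve to the same class, so neither old nor new callers break.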
Second change
I corrected it to use the local path if one is specified...not sure if this is the PERFECT solution, since the sentence-transformers library has a util.py script that does something similar, but THIS WORKS after many hours. Please merge:
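The idea can be sketched roughly as follows. The helper name `resolve_model_path` is mine for illustration, not the actual code in the PR; the real change would hand the non-local case to the existing download logic:

```python
import os


def resolve_model_path(model_name_or_path: str) -> tuple[str, bool]:
    """Hypothetical helper: report whether the argument is a local directory.

    If it is, the caller can load the model files straight from disk
    instead of downloading them from the Hugging Face Hub.
    """
    if os.path.isdir(model_name_or_path):
        # A local path was specified: use it directly.
        return model_name_or_path, True
    # Otherwise treat the string as a Hugging Face model id and let the
    # normal download logic resolve it.
    return model_name_or_path, False
```

The point is simply to check for a local directory before falling back to a remote model id, which is also roughly what sentence-transformers' util.py does.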
@BBC-Esq did you try to use the load_dir_path function from https://github.com/UKPLab/sentence-transformers/blob/66e0ee30843dd411c64f37f65447bb38c7bf857a/sentence_transformers/util.py#L552-L554 ?
Because it's been there from version 2.3.0 and looks exactly like what is needed here.
Now I understand the problem, the files https://huggingface.co/hkunlp/instructor-large/blob/main/config_sentence_transformers.json etc. need to be downloaded and the https://github.com/UKPLab/sentence-transformers/blob/66e0ee30843dd411c64f37f65447bb38c7bf857a/sentence_transformers/util.py#L559 does not let you match files in the root directory.
I made PR https://github.com/BBC-Esq/instructor-embedding/pull/1 which mimics the logic of load_dir_path while working well with instructor. I managed to make it run for both huggingface and local models.
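To illustrate the matching problem described above: download filters like `allow_patterns` use fnmatch-style rules, so a pattern scoped to a subfolder never matches files that live in the repository root, such as config_sentence_transformers.json. The specific patterns below are illustrative, not the ones in the linked code:

```python
import fnmatch

# Illustrative patterns: a subfolder-scoped pattern misses root-level files.
subfolder_pattern = "1_Pooling/*"
root_pattern = "*.json"

# Matches a file inside the subfolder:
print(fnmatch.fnmatch("1_Pooling/config.json", subfolder_pattern))              # True
# But never a file in the repository root:
print(fnmatch.fnmatch("config_sentence_transformers.json", subfolder_pattern))  # False
# A root-level pattern is needed for that:
print(fnmatch.fnmatch("config_sentence_transformers.json", root_pattern))       # True
```

This is why a loader that only whitelists subfolder patterns fails to fetch the root-level config files the instructor models need.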
@BBC-Esq did you try to use the load_dir_path function from https://github.com/UKPLab/sentence-transformers/blob/66e0ee30843dd411c64f37f65447bb38c7bf857a/sentence_transformers/util.py#L552-L554 ? Because it's been there from version 2.3.0 and looks exactly like what is needed here.
Couldn't figure out how to since I'm not a programmer by trade.
I made PR BBC-Esq#1 which mimics the logic of load_dir_path while working well with instructor. I managed to make it run for both huggingface and local models.
Is this different than my pull request?
It's a pull request to your pull request, extending the logic you added.
Looking at the changes you made: if you want to name the classes as they were in the released version at https://pypi.org/project/InstructorEmbedding/1.0.1/ (which most probably means the commit https://github.com/xlang-ai/instructor-embedding/blob/619dda2bfbab22b3e623cb5893472eacae19f4a1/InstructorEmbedding/instructor.py), then the name should be INSTRUCTOR_Pooling, not INSTRUCTORPooling; see https://github.com/xlang-ai/instructor-embedding/blob/619dda2bfbab22b3e623cb5893472eacae19f4a1/InstructorEmbedding/instructor.py#L23
And just for completeness: the change that renamed these classes, which I think is quite a big change and should not have been merged, is https://github.com/xlang-ai/instructor-embedding/pull/92
Classes should not be renamed so easily without some larger discussion, I think.
And that change to INSTRUCTOR_Pooling is in my PR to this PR.
Please take a few minutes to review this pull request and address @racinmat's questions and modifications as well. If you can't find the time to do this, please transfer ownership of the repo to someone who has the motivation to do the bare minimum in maintaining it and publishing a new package to PyPI. Thanks.
@hongjin-su could you please merge this PR? or #115 ?
Now I understand the problem: the files https://huggingface.co/hkunlp/instructor-large/blob/main/config_sentence_transformers.json etc. need to be downloaded, and https://github.com/UKPLab/sentence-transformers/blob/66e0ee30843dd411c64f37f65447bb38c7bf857a/sentence_transformers/util.py#L559 does not let you match files in the root directory.
I heavily changed your pull request in my fork before accepting it, essentially removing the part you're discussing here. I didn't quite understand the need, but if you feel strongly, it's possible I was mistaken, as this isn't my profession. Let me know!