dual_encoding

Issue about dataset format

Open albat3ross opened this issue 5 years ago • 2 comments

Hello, while trying to re-implement the model on other datasets, we got stuck at generating the feature.bin file. Your team mentioned that we could use txt2bin.py to convert the feature files from txt into binary format, but I'm not sure what the feature files should look like in .txt form. Could you provide a few example lines from a txt feature file? It would be great to have some example files for reference. Thank you for your help!

albat3ross, Oct 16 '20 02:10

Please refer to here. Each line consists of an id followed by a feature vector. P.S. We have already released our feature extraction code.

danieljf24, Oct 16 '20 04:10
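For readers without access to the linked example, here is a minimal sketch of what "an id followed by a feature vector" could look like and how such lines might be checked before conversion. It assumes whitespace-separated fields and float values; the ids (`video0_frame0`, ...), the 4-dimensional vectors, and the helper names are made up for illustration and are not taken from the repository. Check the linked example and txt2bin.py for the delimiter and id naming the project actually expects.

```python
import io


def parse_feature_line(line):
    """Split one text line into (id, feature vector).

    Assumption: fields are separated by whitespace, the first field is the id
    and the remaining fields are floats.
    """
    parts = line.split()
    if len(parts) < 2:
        raise ValueError(f"expected an id and at least one value: {line!r}")
    return parts[0], [float(x) for x in parts[1:]]


def read_feature_txt(handle):
    """Read all lines and verify that every vector has the same length."""
    ids, vectors = [], []
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        item_id, vec = parse_feature_line(line)
        if vectors and len(vec) != len(vectors[0]):
            raise ValueError(
                f"{item_id}: dimension {len(vec)} != {len(vectors[0])}"
            )
        ids.append(item_id)
        vectors.append(vec)
    return ids, vectors


# Hypothetical sample content: made-up ids and 4-dimensional vectors.
SAMPLE = (
    "video0_frame0 0.12 0.00 1.50 0.33\n"
    "video0_frame1 0.10 0.02 1.47 0.31\n"
)

ids, vectors = read_feature_txt(io.StringIO(SAMPLE))
```

Running a check like this over a feature file before passing it to the converter can catch truncated lines or inconsistent dimensions early.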

Thank you for the example! It will be very helpful for us.

albat3ross, Oct 16 '20 21:10