RecBole
Atomic File Creation Problem
Hi guys! I'm trying to create a valid Atomic File for my dataset, but I've run into some problems.
My original dataset is a .csv file containing two columns, Bid and NumberOfPages (it's a sample file for testing). I load this file into a Pandas DataFrame in my code and save it as a .txt file with the code below:
np.savetxt(r'/content/drive/MyDrive/goldoon_data/bd.txt', df, header=''.join(f'{col},' for col in df.columns).rstrip())
The result looks like this:
Then I change the file extension from rectest.txt to rectest.item in the specified data path.
Then I try to make a dataset using the following code:
config_dict = {
    'field_separator': ',',
    'seq_separator': ' ',
    'neg_sampling': {'uniform': 1},
    'data_path': '/content/drive/MyDrive/',
    'load_col': {'item': ['bid', 'NumberOfPages']},
    'ITEM_ID_FIELD': 'bid',
    'save_dataset': True,
    'save_dataloaders': True,
}
config = Config(model='BPR', dataset='rectest', config_dict=config_dict)
dataset = create_dataset(config)
But I get this error:
Can you help me with it? Or do you know a better way to make custom Atomic Files?
@KianaLia Hello, thanks for your attention to RecBole! The error is caused by the format of the file. Please make sure the file is strictly structured. First, remove the trailing , at the end of the first line. Second, the values in the remaining lines should be separated by commas (',').
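To make the first point concrete, here is a minimal sketch of why the header from the question ends with a stray comma and how joining on the separator avoids it. The `:token` and `:float` type suffixes are an assumption taken from the later posts in this thread, not from the original .csv. As far as I know, `np.savetxt` also defaults to a space delimiter and prefixes the header with `# `, so `delimiter=','` and `comments=''` would be needed there as well.

```python
# Column names with RecBole-style type suffixes (assumed from later posts).
columns = ["bid:token", "NumberOfPages:float"]

# Header construction used in the question: str.rstrip() with no argument
# strips only whitespace, so the trailing comma is kept.
broken_header = "".join(f"{col}," for col in columns).rstrip()

# Joining on the separator never leaves a trailing separator.
fixed_header = ",".join(columns)

print(broken_header)
print(fixed_header)
```

Passing `','` to `rstrip` would also remove the comma, but `','.join(...)` states the intent more directly.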
Hi again! @chenyuwuxin Can you help me with creating an Atomic File with the format you mentioned above from a Pandas DataFrame? Here's an example of my dataset:
df = pd.DataFrame({'NumberOfPages:float': {0: 96.0, 1: 96.0, 2: 144.0}, 'bid:token': {0: 3, 1: 3, 2: 5}})
I've shared my attempts in the link below: https://stackoverflow.com/questions/73193618/prevent-newline-rule-to-apply-on-header-np-savetxt
@KianaLia For a DataFrame like the one in your example, you can try the following command to create an Atomic File:
df.to_csv('./test.txt', sep='\t', index=False)
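For reference, the sketch below builds the same tab-separated text with only the standard library, so the expected file layout can be checked without pandas. It uses the three sample rows from the DataFrame above. The in-memory buffer is just for illustration; a real Atomic File would be written to disk with an extension such as .item.

```python
import csv
import io

# Header and rows taken from the sample DataFrame in this thread.
header = ["NumberOfPages:float", "bid:token"]
rows = [[96.0, 3], [96.0, 3], [144.0, 5]]

# Write a typed header row followed by tab-separated values.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)

atomic_text = buffer.getvalue()
print(atomic_text)
```

The first line carries the `name:type` pairs, and every following line has exactly one value per column, separated by a single tab.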
Thanks for your easy solution @chenyuwuxin
But when I feed the .txt file into the create_dataset() function, I get the following error:
Here's my config dict:
config_dict = {
    'seq_separator': '\t',
    'neg_sampling': {'uniform': 1},
    'data_path': '/content/drive/MyDrive/',
    'load_col': {'item': ['bid', 'NumberOfPages']},
    'ITEM_ID_FIELD': 'bid',
    'save_dataset': True,
    'save_dataloaders': True,
}
Hello @KianaLia,
I don't quite understand the specific meaning of the two columns in your original dataset. In our framework, the .inter file containing user and item columns must be loaded, and the USER_ID_FIELD and ITEM_ID_FIELD must be specified. In your configuration, only the item attribute is loaded, so an error will be reported.
Please clarify whether your question is applicable to the recommendation scenario, and refer to the section on atomic files in our documentation. Thanks for your attention to RecBole!
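As a rough illustration of the requirement described above, the sketch below writes a small .inter file next to the .item file and builds a config that names both ID fields. The `uid` column, its values, the temporary directory, and the `write_atomic` helper are all hypothetical. The `<data_path>/<dataset>/<dataset>.<suffix>` layout is an assumption about how RecBole resolves files, so please check it against the documentation.

```python
import csv
import os
import tempfile


def write_atomic(path, header, rows):
    """Write a tab-separated file with a typed header row (hypothetical helper)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


# Assumed layout: <data_path>/<dataset>/<dataset>.<suffix>
data_path = tempfile.mkdtemp()
dataset_name = "rectest"
dataset_dir = os.path.join(data_path, dataset_name)
os.makedirs(dataset_dir, exist_ok=True)

# Made-up user-item interactions; the uid column is illustrative only.
write_atomic(
    os.path.join(dataset_dir, f"{dataset_name}.inter"),
    ["uid:token", "bid:token"],
    [["u1", 3], ["u1", 5], ["u2", 3]],
)

# Item attributes, one row per distinct bid from the sample data.
write_atomic(
    os.path.join(dataset_dir, f"{dataset_name}.item"),
    ["bid:token", "NumberOfPages:float"],
    [[3, 96.0], [5, 144.0]],
)

config_dict = {
    "data_path": data_path,
    "field_separator": "\t",  # separates columns
    "seq_separator": " ",     # separates elements inside a sequence field
    "USER_ID_FIELD": "uid",
    "ITEM_ID_FIELD": "bid",
    "load_col": {"inter": ["uid", "bid"], "item": ["bid", "NumberOfPages"]},
}

# With RecBole installed, the dataset would then be created as in the question:
# config = Config(model="BPR", dataset=dataset_name, config_dict=config_dict)
# dataset = create_dataset(config)
```

Note that `field_separator` and `seq_separator` play different roles here; setting `seq_separator` to a tab, as in the config above, may conflict with a tab-separated file.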