
Atomic File Creation Problem

Open KianaLia opened this issue 2 years ago • 4 comments

Hi guys! I'm trying to make a valid Atomic dataset file, but I've run into some problems.

My original dataset is a .csv file containing two columns: Bid and NumberOfPages (it's a sample file for testing). I load this file into a Pandas DataFrame in my code and save it as a .txt file with the code below:

np.savetxt(r'/content/drive/MyDrive/goldoon_data/bd.txt', df, header=''.join(f'{col},' for col in df.columns).rstrip())

The result looks like this: Screenshot from 2022-08-01 13-06-17

Then I change the file extension from rectest.txt to rectest.item in the specified data path and try to create a dataset using the following code:

config_dict = {
    'field_separator': ',',
    'seq_separator': ' ',
    'neg_sampling': {'uniform': 1},
    'data_path': '/content/drive/MyDrive/',
    'load_col': {'item': ['bid', 'NumberOfPages']},
    'ITEM_ID_FIELD': 'bid',
    'save_dataset': True,
    'save_dataloaders': True
}
config = Config(model='BPR', dataset='rectest', config_dict=config_dict)
dataset = create_dataset(config)

But I get this error: Screenshot from 2022-08-01 13-23-26

Can you help me with it? Or do you know a better way to make custom Atomic files?

KianaLia avatar Aug 01 '22 08:08 KianaLia

@KianaLia Hello, thanks for your attention to RecBole! This is caused by the wrong file format. Please ensure that the file is strictly structured. First, remove the trailing comma (',') at the end of the first line. Second, the fields in the remaining lines should be separated by commas (',').
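Both points can be checked mechanically before handing the file to RecBole. The following is only a rough sketch and not part of RecBole: `check_atomic_text` is a hypothetical helper, and it assumes the comma separator from the config above and header fields written as `name:type` with one of RecBole's field types (token, token_seq, float, float_seq).

```python
# Field types that RecBole atomic-file headers may declare.
VALID_TYPES = {"token", "token_seq", "float", "float_seq"}


def check_atomic_text(text, field_separator=","):
    """Hypothetical helper: return a list of format problems found in the text."""
    lines = text.splitlines()
    if not lines:
        return ["file is empty"]

    problems = []
    header = lines[0].split(field_separator)
    for field in header:
        # Each header field should look like "name:type".
        name, colon, field_type = field.partition(":")
        if not name or not colon or field_type not in VALID_TYPES:
            problems.append(f"bad header field: {field!r}")

    # Every data row must have as many fields as the header.
    for number, line in enumerate(lines[1:], start=2):
        if len(line.split(field_separator)) != len(header):
            problems.append(f"line {number}: expected {len(header)} fields")
    return problems


# A header with a trailing comma and a space-separated row (illustrative sample).
bad = check_atomic_text("bid,NumberOfPages,\n3.0 96.0\n")
# A typed header with comma-separated rows.
good = check_atomic_text("bid:token,NumberOfPages:float\n3,96.0\n5,144.0\n")
```

Here `bad` collects one problem per malformed header field plus one for the row, while `good` comes back empty.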

Ethan-TZ avatar Aug 01 '22 10:08 Ethan-TZ

Hi again! @chenyuwuxin Can you help me with creating an Atomic File with the format you mentioned above from a Pandas DataFrame? Here's an example of my dataset:

df = pd.DataFrame({'NumberOfPages:float': {0: 96.0, 1: 96.0, 2: 144.0}, 'bid:token': {0: 3, 1: 3, 2: 5}})

I've shared my tries in the link below: https://stackoverflow.com/questions/73193618/prevent-newline-rule-to-apply-on-header-np-savetxt

KianaLia avatar Aug 04 '22 15:08 KianaLia

@KianaLia For a DataFrame object like the one in your example, you can try the following command to create an Atomic File:

df.to_csv('./test.txt', sep='\t', index=False)
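As a minimal sketch of what that command produces for the example DataFrame: the column reordering and the in-memory string output are assumptions made here for illustration, and passing a path such as './test.txt' writes the same text to disk.

```python
import pandas as pd

# The example DataFrame from the question; column names already carry the
# name:type suffixes used in atomic-file headers.
df = pd.DataFrame({
    "NumberOfPages:float": {0: 96.0, 1: 96.0, 2: 144.0},
    "bid:token": {0: 3, 1: 3, 2: 5},
})

# Optional: put the ID column first for readability (an assumption, not a requirement).
item_df = df[["bid:token", "NumberOfPages:float"]]

# With no path argument, to_csv returns the file content as a string.
atomic_text = item_df.to_csv(sep="\t", index=False)
atomic_lines = atomic_text.splitlines()

print(atomic_text)
```

The first line is the tab-separated header with no trailing separator, and every following line holds one tab-separated row.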

Ethan-TZ avatar Aug 05 '22 01:08 Ethan-TZ

Thanks for your easy solution, @chenyuwuxin! But when I feed the .txt file into create_dataset(), I get the following error: [image]

Here's my config dict:

config_dict = {
    'seq_separator': '\t',
    'neg_sampling': {'uniform': 1},
    'data_path': '/content/drive/MyDrive/',
    'load_col': {'item': ['bid', 'NumberOfPages']},
    'ITEM_ID_FIELD': 'bid',
    'save_dataset': True,
    'save_dataloaders': True
}

KianaLia avatar Aug 06 '22 09:08 KianaLia

Hello @KianaLia,

I don't quite understand the specific meaning of the two columns in your original dataset. In our framework, the .inter file containing user and item columns must be loaded, and the USER_ID_FIELD and ITEM_ID_FIELD must be specified. In your configuration, only the item attribute is loaded, so an error will be reported.

Please clarify whether your question is applicable to the recommendation scenario, and refer to the section on atomic files in our documentation. Thanks for your attention to RecBole!
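For illustration only, a layout that satisfies this requirement might look like the sketch below. The `uid` column and all interaction rows are invented placeholders rather than data from this thread, and a temporary directory stands in for the real data path. The RecBole calls are left as comments because they need RecBole installed.

```python
import os
import tempfile

import pandas as pd

# RecBole looks for <data_path>/<dataset>/<dataset>.<suffix>.
data_path = tempfile.mkdtemp()  # stand-in for the real data path
dataset_name = "rectest"
dataset_dir = os.path.join(data_path, dataset_name)
os.makedirs(dataset_dir)

# Hypothetical interactions: which user interacted with which item.
inter_df = pd.DataFrame({"uid:token": ["u1", "u1", "u2"], "bid:token": [3, 5, 3]})
# Item attributes, one row per item.
item_df = pd.DataFrame({"bid:token": [3, 5], "NumberOfPages:float": [96.0, 144.0]})

inter_path = os.path.join(dataset_dir, f"{dataset_name}.inter")
item_path = os.path.join(dataset_dir, f"{dataset_name}.item")
inter_df.to_csv(inter_path, sep="\t", index=False)
item_df.to_csv(item_path, sep="\t", index=False)

config_dict = {
    "data_path": data_path,
    "field_separator": "\t",
    "USER_ID_FIELD": "uid",   # must match a column in the .inter file
    "ITEM_ID_FIELD": "bid",   # shared by the .inter and .item files
    "load_col": {
        "inter": ["uid", "bid"],
        "item": ["bid", "NumberOfPages"],
    },
}

# With RecBole installed, the calls from the original post would then be:
# config = Config(model='BPR', dataset=dataset_name, config_dict=config_dict)
# dataset = create_dataset(config)
```

The key difference from the config in the question is that an .inter file exists and both ID fields are declared and loaded.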

Sherry-XLL avatar Feb 09 '23 09:02 Sherry-XLL