Bug: tfds.load split error
What happened?
I'm trying to follow the Colab on developing a recommender system as a test for my company, one of my internship projects. When importing the MovieLens 100K dataset, the issue seems to stem from the split parameter of the tfds.load function.
Relevant code
import os
import pprint
import tempfile
from typing import Dict, Text
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs
# Ratings data.
ratings = tfds.load("movielens/100k-ratings", split="train")
# Features of all the available movies in the dataset.
movies = tfds.load("movielens/100k-movies", split="train")
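The log output below shows one pretty-printed example from each dataset, so the full script presumably also iterates over the loaded splits. A minimal sketch of that step (a reconstruction for context, not the original code):
# Print a single example from each split to inspect the features
# (assumes the standard tf.data as_numpy_iterator API).
for example in ratings.take(1).as_numpy_iterator():
    pprint.pprint(example)
for example in movies.take(1).as_numpy_iterator():
    pprint.pprint(example)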
Relevant log output
PS C:\Users\jleroux\Desktop\ML NASH> & C:/Users/jleroux/AppData/Local/Programs/Python/Python312/python.exe "c:/Users/jleroux/Desktop/ML NASH/recommender/retrieval.py"
2024-06-10 22:04:15.411232: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-10 22:04:16.232990: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-10 22:04:19.547211: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-10 22:04:19.673378: W tensorflow/core/kernels/data/cache_dataset_ops.cc:858] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be
discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
{'bucketized_user_age': 45.0,
'movie_genres': array([7], dtype=int64),
'movie_id': b'357',
'movie_title': b"One Flew Over the Cuckoo's Nest (1975)",
'raw_user_age': 46.0,
'timestamp': 879024327,
'user_gender': True,
'user_id': b'138',
'user_occupation_label': 4,
'user_occupation_text': b'doctor',
'user_rating': 4.0,
'user_zip_code': b'53211'}
2024-06-10 22:04:19.675671: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
2024-06-10 22:04:19.714261: W tensorflow/core/kernels/data/cache_dataset_ops.cc:858] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be
discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
{'movie_genres': array([4], dtype=int64),
'movie_id': b'1681',
'movie_title': b'You So Crazy (1994)'}
2024-06-10 22:04:19.715303: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
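Note that the cache warnings above come from tf.data itself and state their own fix: cache only what will be fully read. A minimal illustration of the ordering the warning recommends (the dataset here is illustrative, not from the script):
import tensorflow as tf

ds = tf.data.Dataset.range(100)
# Recommended: take(k) before cache() so the cached subset is read in full,
# rather than dataset.cache().take(k).repeat(), which truncates the cache.
ds = ds.take(10).cache().repeat(2)
for elem in ds:
    pass  # fully consuming the iterator completes the cache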
tensorflow_hub Version
0.13.0.dev (unstable development build)
TensorFlow Version
2.8 (latest stable release)
Other libraries
No response
Python Version
3.x
OS
Windows
Hi @Answerious ,
I apologize for the delayed response. It seems that you're using Python 3.12, while Google Colab uses Python 3.11 by default. To ensure compatibility, please downgrade your Python version to 3.11. Additionally, Google Colab currently uses TensorFlow Hub version 0.16.1. After adjusting these versions, the code should work correctly. Please refer to this gist for further details.
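To verify the environment after downgrading, a quick sanity check along these lines may help (a minimal sketch; the expected values simply mirror the versions mentioned above):
import sys
import tensorflow as tf
import tensorflow_hub as hub

print(sys.version)       # expect 3.11.x to match Colab
print(tf.__version__)
print(hub.__version__)   # Colab currently ships tensorflow_hub 0.16.1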
Thank you