Bug: tfds.load split error
What happened?
I'm trying to follow the Colab on developing a recommender system as a test for my company, one of my internship projects. When importing the MovieLens 100K dataset, the issue seems to stem from the split parameter of the tfds.load function.
Relevant code
import os
import pprint
import tempfile
from typing import Dict, Text
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs
# Ratings data.
ratings = tfds.load("movielens/100k-ratings", split="train")
# Features of all the available movies in the dataset.
movies = tfds.load("movielens/100k-movies", split="train")
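The log output below shows one pretty-printed example from each dataset, so the full script presumably also iterates over the loaded splits. A minimal sketch of that step (a reconstruction for context, not the original code):
# Print a single example from each split to inspect the features
# (assumes the standard tf.data as_numpy_iterator API).
for example in ratings.take(1).as_numpy_iterator():
    pprint.pprint(example)
for example in movies.take(1).as_numpy_iterator():
    pprint.pprint(example)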
Relevant log output
PS C:\Users\jleroux\Desktop\ML NASH> & C:/Users/jleroux/AppData/Local/Programs/Python/Python312/python.exe "c:/Users/jleroux/Desktop/ML NASH/recommender/retrieval.py"
2024-06-10 22:04:15.411232: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-10 22:04:16.232990: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-10 22:04:19.547211: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-10 22:04:19.673378: W tensorflow/core/kernels/data/cache_dataset_ops.cc:858] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be
discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
{'bucketized_user_age': 45.0,
'movie_genres': array([7], dtype=int64),
'movie_id': b'357',
'movie_title': b"One Flew Over the Cuckoo's Nest (1975)",
'raw_user_age': 46.0,
'timestamp': 879024327,
'user_gender': True,
'user_id': b'138',
'user_occupation_label': 4,
'user_occupation_text': b'doctor',
'user_rating': 4.0,
'user_zip_code': b'53211'}
2024-06-10 22:04:19.675671: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
2024-06-10 22:04:19.714261: W tensorflow/core/kernels/data/cache_dataset_ops.cc:858] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be
discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
{'movie_genres': array([4], dtype=int64),
'movie_id': b'1681',
'movie_title': b'You So Crazy (1994)'}
2024-06-10 22:04:19.715303: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
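Note that the cache warnings above come from tf.data itself and state their own fix: cache only what will be fully read. A minimal illustration of the ordering the warning recommends (the dataset here is illustrative, not from the script):
import tensorflow as tf

ds = tf.data.Dataset.range(100)
# Recommended: take(k) before cache() so the cached subset is read in full,
# rather than dataset.cache().take(k).repeat(), which truncates the cache.
ds = ds.take(10).cache().repeat(2)
for elem in ds:
    pass  # fully consuming the iterator completes the cache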
tensorflow_hub Version
0.13.0.dev (unstable development build)
TensorFlow Version
2.8 (latest stable release)
Other libraries
No response
Python Version
3.x
OS
Windows
Hi @Answerious ,
I apologize for the delayed response. It seems that you're using Python 3.12, while Google Colab uses Python 3.11 by default. To ensure compatibility, please downgrade your Python version to 3.11. Additionally, Google Colab currently uses TensorFlow Hub version 0.16.1. After adjusting these versions, the code should work correctly. Please refer to this gist for further details.
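To verify the environment after downgrading, a quick sanity check along these lines may help (a minimal sketch; the expected values simply mirror the versions mentioned above):
import sys
import tensorflow as tf
import tensorflow_hub as hub

print(sys.version)       # expect 3.11.x to match Colab
print(tf.__version__)
print(hub.__version__)   # Colab currently ships tensorflow_hub 0.16.1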
Thank you