
ERROR: "Dataset does not exist at the specified path..."

Open jimjones26 opened this issue 1 year ago • 4 comments

I get the following error whenever I start the app locally, and again when I try to upload my own documents once it is running.

Failed to build chain for data source 'https://github.com/gustavz/DataChad.git' with error: A dataset does not exist at the specified path, or you do not have sufficient permissions to load or create one. Please check the dataset path and make sure that you have sufficient permissions to the path.

I am guessing this has to do with my Activeloop account, but I cannot figure out where to enable permissions for datasets to be created in Activeloop. Am I missing something obvious?

jimjones26 avatar May 23 '23 23:05 jimjones26

Try the following:

  1. Does it work without providing your Activeloop credentials, i.e. using our database? If it does, the problem is related to your Activeloop account.
  2. Double-check your credentials before submitting them.
  3. Delete any datasets that DataChad created in your account; they may be broken.
  4. Retry.

Let me know if this helped.

gustavz avatar May 24 '23 07:05 gustavz

OK, so I wanted to use Deep Lake online and had a similar initialization issue. I looked around the code and decided to add some separate (unrelated) code to initialize a Deep Lake dataset with the same name. I used DEFAULT_DATA_SOURCE = "brain22", but initialized the dataset (in the separate code) as hub://"org"/brain22-1000-0

This got me to the point where I could actually upload a directory of PDF files.

But in the process, it created a completely separate Deep Lake dataset, and for each additional PDF I added it created yet another dataset instead of adding to the previously created one.

Hope it helps

cnndabbler avatar May 25 '23 22:05 cnndabbler

Creating one dataset per data source is intended. The app always chats with a single data source, like a PDF or a GitHub repo, so that you can ask questions dedicated to that source.

But I see the potential use case here, so I will add the option to store everything in a single dataset.
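To illustrate the difference between the two modes, here is a minimal sketch. It is not DataChad's actual code: `dataset_path` and its `shared_name` parameter are hypothetical names made up for this example, and the only thing taken from the thread is the `hub://<org>/<name>` path shape.

```python
import re
from typing import Optional


def dataset_path(org: str, data_source: str, shared_name: Optional[str] = None) -> str:
    """Build a hub:// dataset path (hypothetical helper, for illustration only)."""
    # Without a shared name, the path is derived from the data source,
    # so every new source ends up in its own dataset.
    name = shared_name if shared_name is not None else data_source
    # Collapse runs of non-alphanumeric characters into single hyphens.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", name).strip("-")
    return f"hub://{org}/{slug}"


# One dataset per data source: two PDFs map to two different paths.
print(dataset_path("my-org", "report.pdf"))
print(dataset_path("my-org", "notes.pdf"))

# Single shared dataset: both PDFs map to the same path.
print(dataset_path("my-org", "report.pdf", shared_name="all-docs"))
print(dataset_path("my-org", "notes.pdf", shared_name="all-docs"))
```

With a path derived from the source name, uploading another PDF naturally produces another dataset, which matches the behavior described above.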

gustavz avatar May 26 '23 05:05 gustavz

@jimjones26, in constants.py, delete ".git" from the DEFAULT_DATA_SOURCE
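As a rough sketch of that change: the URL below is the one from the error message, but I am assuming it is what DEFAULT_DATA_SOURCE is set to in constants.py, and `strip_git_suffix` is a hypothetical helper, not something in the repo. In practice you can simply edit the string by hand.

```python
def strip_git_suffix(url: str) -> str:
    """Return the URL without a trailing '.git', if it has one."""
    suffix = ".git"
    return url[: -len(suffix)] if url.endswith(suffix) else url


# Assumed original value, taken from the error message in this issue.
DEFAULT_DATA_SOURCE = strip_git_suffix("https://github.com/gustavz/DataChad.git")
print(DEFAULT_DATA_SOURCE)  # https://github.com/gustavz/DataChad
```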

fjsikora avatar Jun 06 '23 15:06 fjsikora