Broken Link: Chapter 5.4 Big Data

Open hrh-bbc-rd opened this issue 2 years ago • 4 comments

The link to the PubMed Abstracts Database in Chapter 5, Section 4 (the 'Big Datasets' chapter) is broken.

The broken link is in this line:

data_files = "https://the-eye.eu/public/AI/pile_preliminary_components/PUBMED_title_abstracts_2019_baseline.jsonl.zst"

Chapter here

hrh-bbc-rd • Jul 17 '23 11:07

I have been able to continue with the course by using this link instead:

data_files = "https://the-eye.eu/public/AI/pile_v2/data/NIH_ExPORTER_awarded_grant_text.jsonl.zst"

hrh-bbc-rd • Jul 17 '23 11:07

It looks like this URL has changed and broken the link before (see #324).

tj-cahill • Jul 28 '23 22:07

Note that there is another broken link further down the page, in the data_files line of the following code block:

from datasets import load_dataset

law_dataset_streamed = load_dataset(
    "json",
    # This is the URL reported as broken.
    data_files="https://the-eye.eu/public/AI/pile_preliminary_components/FreeLaw_Opinions.jsonl.zst",
    split="train",
    streaming=True,
)
next(iter(law_dataset_streamed))

tj-cahill • Jul 29 '23 18:07

Same issue here. It looks like the Pile has been taken down for copyright reasons.

Dboee • Jan 24 '24 11:01
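
Since these URLs have moved or disappeared more than once, it may help to probe a list of candidate links before handing one to load_dataset. The following is only a minimal sketch, not something taken from the course: the helper names (url_is_reachable, first_reachable, load_first_available) are hypothetical, the candidate list simply reuses the two URLs quoted in this thread (either may be unavailable), and it assumes the datasets and zstandard packages are installed for the final loading step.

```python
import urllib.request
from typing import Callable, Iterable, Optional

# Candidate URLs quoted in this thread; neither is guaranteed to work today.
CANDIDATE_URLS = [
    "https://the-eye.eu/public/AI/pile_preliminary_components/PUBMED_title_abstracts_2019_baseline.jsonl.zst",
    "https://the-eye.eu/public/AI/pile_v2/data/NIH_ExPORTER_awarded_grant_text.jsonl.zst",
]


def url_is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` succeeds (2xx/3xx after redirects)."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError and timeouts; ValueError covers malformed URLs.
        # Note: some servers reject HEAD requests, so a False here is not conclusive.
        return False


def first_reachable(
    urls: Iterable[str],
    probe: Callable[[str], bool] = url_is_reachable,
) -> Optional[str]:
    """Return the first URL for which `probe` returns True, or None if none do."""
    for url in urls:
        if probe(url):
            return url
    return None


def load_first_available(urls: Iterable[str] = CANDIDATE_URLS):
    """Stream the first reachable JSON Lines file, mirroring the call quoted above."""
    # Imported lazily so the probing helpers work without `datasets` installed.
    from datasets import load_dataset

    url = first_reachable(urls)
    if url is None:
        raise RuntimeError("None of the candidate URLs responded.")
    return load_dataset("json", data_files=url, split="train", streaming=True)
```

Calling next(iter(load_first_available())) would then return the first record of whichever file responded. Keep in mind that the two files hold different datasets, so later steps in the chapter that depend on specific column names may need adjusting.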