Made-With-ML
OSError: [Errno 30] Cannot create directory '/efs'. Detail: [errno 30] Read-only file system
@GokuMohandas can you help me figure this out?
same error here, have you managed to resolve it?
I am having the same issue and have no idea why. Basically, the notebook is unable to import the function from madewithml/data. A hack that worked for me is to create and run the following code cell above the failing cell:
import re
from typing import Dict, List, Tuple

import numpy as np
import pandas as pd
import ray
from ray.data import Dataset
from sklearn.model_selection import train_test_split
from transformers import BertTokenizer


def stratify_split(
    ds: Dataset,
    stratify: str,
    test_size: float,
    shuffle: bool = True,
    seed: int = 1234,
) -> Tuple[Dataset, Dataset]:
    """Split a dataset into train and test splits with equal
    amounts of data points from each class in the column we
    want to stratify on.

    Args:
        ds (Dataset): Input dataset to split.
        stratify (str): Name of column to split on.
        test_size (float): Proportion of dataset to split for test set.
        shuffle (bool, optional): whether to shuffle the dataset. Defaults to True.
        seed (int, optional): seed for shuffling. Defaults to 1234.

    Returns:
        Tuple[Dataset, Dataset]: the stratified train and test datasets.
    """

    def _add_split(df: pd.DataFrame) -> pd.DataFrame:  # pragma: no cover, used in parent function
        """Naively split a dataframe into train and test splits.
        Add a column specifying whether it's the train or test split."""
        train, test = train_test_split(df, test_size=test_size, shuffle=shuffle, random_state=seed)
        train["_split"] = "train"
        test["_split"] = "test"
        return pd.concat([train, test])

    def _filter_split(df: pd.DataFrame, split: str) -> pd.DataFrame:  # pragma: no cover, used in parent function
        """Filter by data points that match the split column's value
        and return the dataframe with the _split column dropped."""
        return df[df["_split"] == split].drop("_split", axis=1)

    # Train, test split with stratify
    grouped = ds.groupby(stratify).map_groups(_add_split, batch_format="pandas")  # group by each unique value in the column we want to stratify on
    train_ds = grouped.map_batches(_filter_split, fn_kwargs={"split": "train"}, batch_format="pandas")  # combine
    test_ds = grouped.map_batches(_filter_split, fn_kwargs={"split": "test"}, batch_format="pandas")  # combine

    # Shuffle each split (required)
    train_ds = train_ds.random_shuffle(seed=seed)
    test_ds = test_ds.random_shuffle(seed=seed)

    return train_ds, test_ds
Basically, instead of importing the function, which fails (no idea why), we define it directly in the notebook.
But the same error would come up again during training. Check this repo: https://github.com/GokuMohandas/mlops-course
As the error message indicates, this error comes from the /efs folder the code is trying to create: the directory would sit at the filesystem root, which is read-only here, so it cannot be created.
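To confirm this on your own machine, a quick check along these lines may help. It is only a sketch: `can_create` is a hypothetical helper name, not something from the course code, and it simply asks the OS whether the nearest existing parent of a path is writable.

```python
import os
from pathlib import Path


def can_create(path: str) -> bool:
    """Return True if the nearest existing ancestor of `path` is writable.

    Hypothetical helper for diagnosing the error above; not part of madewithml.
    """
    p = Path(path).resolve()
    # Walk up until we reach a directory that actually exists (the root always does).
    while not p.exists():
        p = p.parent
    # os.access reports False for a read-only filesystem or missing permissions.
    return os.access(p, os.W_OK)


if __name__ == "__main__":
    print("/efs creatable:", can_create("/efs"))
    print("~/efs creatable:", can_create(str(Path.home() / "efs")))
```

If the first line prints False and the second prints True, moving the directory under your home folder should avoid the error.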
I assume you are running this on your own local machine. I made the edits below, and it worked in my local environment: macOS 14.1.2 and Python 3.10.11. The path will differ depending on where your directory is located. I hope this helps.
- config.py
  Change line 13: EFS_DIR = Path(f"/Users/<your_user_name>/efs/shared_storage/madewithml/{os.environ.get('GITHUB_USERNAME', '')}")
- madewithml.ipynb
  Change the code in the Setup section: EFS_DIR = f"/Users/<your_user_name>/efs/shared_storage/madewithml/{os.environ['GITHUB_USERNAME']}"
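If you would rather not hardcode your user name, something like the following might work as well. This is a sketch under my own assumptions: `resolve_efs_dir` and the `MADEWITHML_EFS_ROOT` environment variable are hypothetical names I made up for illustration, and the fallback to `~/efs` just mirrors the paths above.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_efs_dir(root: Optional[Path] = None) -> Path:
    """Build a writable stand-in for /efs/shared_storage/madewithml/<username>.

    Hypothetical helper; MADEWITHML_EFS_ROOT is an assumed variable name,
    not something the course code reads.
    """
    if root is None:
        # Fall back to ~/efs when no override is given.
        root = Path(os.environ.get("MADEWITHML_EFS_ROOT", Path.home() / "efs"))
    efs_dir = Path(root) / "shared_storage" / "madewithml" / os.environ.get("GITHUB_USERNAME", "")
    # Create the directory tree up front so later writes do not fail.
    efs_dir.mkdir(parents=True, exist_ok=True)
    return efs_dir


if __name__ == "__main__":
    EFS_DIR = resolve_efs_dir()
    print(EFS_DIR)
```

The same `EFS_DIR` value can then be used in both config.py and the notebook's Setup section, so the two stay in sync.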