[Bug] Web2Parquet DataAccessLocal Update needed
Search before asking
- [x] I searched the issues and found no similar issues.
Component
transforms/web2parquet
What happened + What you expected to happen
In web2parquet's `transform.py`, this section:

```python
#############################################################################
## The same transform can also be used to store crawled files to local folder
if self.folder:
    dao=DataAccessLocal(local_config={'output_folder':self.folder,'input_folder':'.'})
    for x in self.docs:
        dao.save_file(self.folder+'/'+x['filename'], x['contents'])
```
Since the `DataAccessLocal` constructor no longer takes `local_config` (the parameter has been renamed to `config`), this code should be modified accordingly.
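The simplest fix is to rename the keyword in the call, i.e. `DataAccessLocal(config={...})`. Below is a minimal sketch of the adjusted block, assuming the dict keys (`output_folder`, `input_folder`) are unchanged and only the keyword was renamed. The helper names `build_local_dao` and `save_crawled_docs` are hypothetical. The DAO class is passed in as a parameter, and the constructor's signature is inspected so the same code would also work against older releases that still expect `local_config`.

```python
import inspect
import os
from typing import Any, Iterable, Mapping


def build_local_dao(dao_cls: type, output_folder: str, input_folder: str = ".") -> Any:
    """Hypothetical helper: build a DataAccessLocal-style object, passing the
    folder dict under whichever keyword the constructor declares."""
    folders = {"output_folder": output_folder, "input_folder": input_folder}
    # inspect.signature on a class reports its __init__ parameters (minus self)
    params = inspect.signature(dao_cls).parameters
    if "config" in params:
        # Newer data-processing-lib (per this issue): keyword renamed to `config`
        return dao_cls(config=folders)
    if "local_config" in params:
        # Older releases still expect `local_config`
        return dao_cls(local_config=folders)
    raise TypeError(f"{dao_cls.__name__} accepts neither 'config' nor 'local_config'")


def save_crawled_docs(dao_cls: type, folder: str, docs: Iterable[Mapping[str, Any]]) -> Any:
    """Hypothetical mirror of the web2parquet block: save each crawled doc
    under `folder` and return the DAO that was used."""
    dao = build_local_dao(dao_cls, output_folder=folder)
    for doc in docs:
        # os.path.join instead of manual '/' concatenation
        dao.save_file(os.path.join(folder, doc["filename"]), doc["contents"])
    return dao
```

Inside `transform.py` this would be called as `save_crawled_docs(DataAccessLocal, self.folder, self.docs)`. Whether the fallback branch for `local_config` is worth keeping depends on which `data-processing-lib` versions web2parquet needs to support; if only the current one matters, the one-word keyword rename is enough.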
Reproduction script
```python
from dpk_web2parquet.transform import Web2Parquet
from utils.config import CONFIG
import os

Web2Parquet(
    urls=['https://thealliance.ai/'],
    folder='dpk_input',
    depth=1,
    downloads=1,
    mime_types=["text/html"],
).transform()

print("Web crawl completed. Downloaded %d files into '%s'" % (len(os.listdir(CONFIG.INPUT_DIR)), CONFIG.INPUT_DIR))
```
Anything else
No response
OS
MacOS
Python
3.12
Are you willing to submit a PR?
- [x] Yes I am willing to submit a PR!
Hi @rajeshsirsikar-bq, thanks for reporting this issue. Yes, it seems you are right:
https://github.com/data-prep-kit/data-prep-kit/blob/80dbab8830ca7ac7ab62131df1dba1ad487df97e/data-processing-lib/python/src/data_processing/data_access/data_access_local.py#L34
https://github.com/data-prep-kit/data-prep-kit/blob/80dbab8830ca7ac7ab62131df1dba1ad487df97e/transforms/universal/web2parquet/dpk_web2parquet/transform.py#L110-L114

Hi @shahrokhDaijavad, could you please assign this issue to me? Thanks.
@touma-I and @swith005, what do you think?