tango
Dictionary as parameters in config files
🐛 Describe the bug
Dictionaries in config files are parsed as Params, so dictionary-type parameters sometimes cannot be used in config files. For example, consider the following datasets::load step:
read_data: {
    type: "datasets::load",
    path: "json",
    data_files: {
        train: "train.jsonl",
        dev: "dev.jsonl"
    },
},
The data_files parameter cannot be used in this case, since dictionaries in config files are Params by default. This can be fixed by using the as_dict function to convert the Params:
def run(self, path: str, **kwargs) -> Union[ds.DatasetDict, ds.Dataset]:  # type: ignore
    """
    Load the HuggingFace dataset specified by ``path``.
    ``path`` is the canonical name or path to the dataset. Additional key word arguments
    are passed as-is to :func:`datasets.load_dataset()`.
    """
    kwargs = {x: y.as_dict() for x, y in kwargs.items()}
    dataset = ds.load_dataset(path, **kwargs)
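One caveat with the conversion above: it calls as_dict() on every keyword argument, so a plain value such as a string would presumably raise an AttributeError. A minimal sketch of a more defensive variant is below. It does not import tango; StandInParams and unwrap_kwargs are hypothetical names used only for illustration, and the sketch assumes nothing about the wrapper beyond it exposing an as_dict() method.

```python
from typing import Any, Dict


class StandInParams:
    """Hypothetical stand-in for the config wrapper; it only mimics ``as_dict``."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = data

    def as_dict(self) -> Dict[str, Any]:
        # Return a plain dict copy of the wrapped mapping.
        return dict(self._data)


def unwrap_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Convert values that expose ``as_dict`` to plain dicts; pass others through."""
    return {
        key: value.as_dict() if hasattr(value, "as_dict") else value
        for key, value in kwargs.items()
    }


kwargs = {
    "data_files": StandInParams({"train": "train.jsonl", "dev": "dev.jsonl"}),
    "split": "train",  # a plain string has no as_dict() and is passed through
}
plain = unwrap_kwargs(kwargs)
print(plain)
```

With this shape, dictionary-valued arguments reach the downstream call as ordinary dicts, while strings and other plain values are left untouched.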
Versions
Python 3.8.13 absl-py==1.2.0 accelerate==0.12.0 ai2-tango==0.14.0 aiohttp==3.8.3 aiohttp-cors==0.7.0 aiosignal==1.2.0 antlr4-python3-runtime==4.9.3 anyio==3.6.1 argon2-cffi @ file:///opt/conda/conda-bld/argon2-cffi_1645000214183/work argon2-cffi-bindings @ file:///tmp/build/80754af9/argon2-cffi-bindings_1644569684262/work asttokens @ file:///opt/conda/conda-bld/asttokens_1646925590279/work astunparse==1.6.3 async-timeout==4.0.2 attrs==22.1.0 autoflake==1.6.1 Babel @ file:///tmp/build/80754af9/babel_1620871417480/work backcall @ file:///home/ktietz/src/ci/backcall_1611930011877/work base58==2.1.1 beaker-py==1.10.3 beautifulsoup4==4.11.1 black==22.10.0 bleach @ file:///opt/conda/conda-bld/bleach_1641577558959/work blessed==1.19.1 blingfire==0.1.8 blis==0.7.8 boto3==1.24.59 botocore==1.27.59 bpemb==0.3.4 brotlipy==0.7.0 bs4==0.0.1 cached-path==1.1.6 cachetools==5.2.0 catalogue==2.0.8 certifi==2022.9.24 cffi @ file:///tmp/abs_98z5h56wf8/croots/recipe/cffi_1659598650955/work charset-normalizer==2.1.1 chex==0.1.5 click==8.0.4 click-help-colors==0.9.1 cloudpickle==2.2.0 colorama==0.4.5 coloredlogs==15.0.1 colorful==0.5.4 commonmark==0.9.1 confection==0.0.3 conllu==4.5.2 contourpy==1.0.5 cryptography @ file:///tmp/build/80754af9/cryptography_1652083738073/work cupy-cuda113==10.6.0 cycler==0.11.0 cymem==2.0.6 Cython==0.29.32 datasets==2.5.2 debugpy @ file:///tmp/build/80754af9/debugpy_1637091796427/work decorator @ file:///opt/conda/conda-bld/decorator_1643638310831/work defusedxml @ file:///tmp/build/80754af9/defusedxml_1615228127516/work Deprecated==1.2.13 dill==0.3.5.1 distlib==0.3.6 dm-tree==0.1.7 docker==6.0.0 docker-pycreds==0.4.0 elasticsearch==7.17.6 elasticsearch-dsl==7.4.0 en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.4.0/en_core_web_sm-3.4.0-py3-none-any.whl entrypoints @ file:///tmp/build/80754af9/entrypoints_1649926445639/work etils==0.8.0 evaluate==0.2.2 executing @ 
file:///opt/conda/conda-bld/executing_1646925071911/work fairscale==0.4.9 faiss==1.7.2 fastjsonschema @ file:///opt/conda/conda-bld/python-fastjsonschema_1661371079312/work fastrlock==0.8 filelock==3.8.0 flair==0.11.3 flatbuffers==22.9.24 flax==0.5.0 fonttools==4.37.4 frozenlist==1.3.1 fsspec==2022.8.2 ftfy==6.1.1 future==0.18.2 gast==0.4.0 gdown==4.4.0 gensim==4.2.0 gevent==22.8.0 gitdb==4.0.9 GitPython==3.1.28 glob2==0.7 google-api-core==2.8.2 google-auth==2.12.0 google-auth-oauthlib==0.4.6 google-cloud-core==2.3.2 google-cloud-storage==2.5.0 google-crc32c==1.5.0 google-pasta==0.2.0 google-resumable-media==2.4.0 googleapis-common-protos==1.56.4 gpustat==1.0.0 greenlet==1.1.3.post0 grpcio==1.43.0 h5py==3.7.0 huggingface-hub==0.9.1 humanfriendly==10.0 hydra-core==1.2.0 hyperopt==0.2.7 idna==3.4 importlib-metadata==4.11.3 importlib-resources==5.9.0 inflect==6.0.0 iniconfig==1.1.1 ipykernel @ file:///tmp/build/80754af9/ipykernel_1646982648551/work/dist/ipykernel-6.9.1-py3-none-any.whl ipython @ file:///opt/conda/conda-bld/ipython_1657652213665/work ipython-genutils @ file:///tmp/build/80754af9/ipython_genutils_1606773439826/work Janome==0.4.2 jaro-winkler==2.0.3 jax==0.3.20 jaxlib==0.3.20 jedi @ file:///tmp/build/80754af9/jedi_1644315233700/work Jinja2==3.1.2 jmespath==1.0.1 joblib==1.2.0 json5 @ file:///tmp/build/80754af9/json5_1624432770122/work jsonlines==3.1.0 jsonnet==0.18.0 jsonschema==4.16.0 jupyter-core @ file:///opt/conda/conda-bld/jupyter_core_1651671229925/work jupyter-server @ file:///tmp/abs_b88b31b8-83b9-476d-a46d-e563c421f38fvsnyi1ur/croots/recipe/jupyter_server_1658754481507/work jupyter_client @ file:///opt/conda/conda-bld/jupyter_client_1661848916004/work jupyterlab @ file:///tmp/abs_12f3h01vmy/croots/recipe/jupyterlab_1658907535764/work jupyterlab-pygments @ file:///tmp/build/80754af9/jupyterlab_pygments_1601490720602/work jupyterlab_server @ file:///opt/conda/conda-bld/jupyterlab_server_1664908231281/work keras==2.10.0 Keras-Preprocessing==1.1.2 
kilt @ git+https://github.com/facebookresearch/KILT.git@2664322b9a994be686e4c3a9e8f75a0b70927f22 kiwisolver==1.4.4 konoha==4.6.5 langcodes==3.3.0 langdetect==1.0.9 libclang==14.0.6 lightgbm==3.3.2 loguru==0.6.0 lxml==4.9.1 Markdown==3.3.4 MarkupSafe==2.1.1 matplotlib==3.6.0 matplotlib-inline @ file:///opt/conda/conda-bld/matplotlib-inline_1662014470464/work mistune==0.8.4 mkl-fft==1.3.1 mkl-random @ file:///tmp/build/80754af9/mkl_random_1626186064646/work mkl-service==2.4.0 more-itertools==8.14.0 mpld3==0.3 mpmath==1.2.1 msgpack==1.0.4 multidict==6.0.2 multiprocess==0.70.13 murmurhash==1.0.8 mypy-extensions==0.4.3 nbclassic @ file:///opt/conda/conda-bld/nbclassic_1644943264176/work nbclient @ file:///tmp/build/80754af9/nbclient_1650308366712/work nbconvert @ file:///opt/conda/conda-bld/nbconvert_1649751911790/work nbformat @ file:///opt/conda/conda-bld/nbformat_1663744952973/work nest-asyncio @ file:///tmp/build/80754af9/nest-asyncio_1649847908682/work networkx==2.8.7 nltk==3.7 nmslib==2.1.1 notebook @ file:///tmp/abs_abf6xa6h6f/croots/recipe/notebook_1659083654985/work numpy==1.23.2 nvidia-ml-py==11.495.46 oauthlib==3.2.1 omegaconf==2.2.3 onnxruntime==1.12.1 opencensus==0.11.0 opencensus-context==0.1.3 opt-einsum==3.3.0 optax==0.1.3 overrides==3.1.0 packaging==21.3 pandas==1.5.0 pandocfilters @ file:///opt/conda/conda-bld/pandocfilters_1643405455980/work parso @ file:///opt/conda/conda-bld/parso_1641458642106/work pastel==0.2.1 pathspec==0.10.1 pathtools==0.1.2 pathy==0.6.2 petname==2.6 pexpect @ file:///tmp/build/80754af9/pexpect_1605563209008/work pickleshare @ file:///tmp/build/80754af9/pickleshare_1606932040724/work Pillow==9.2.0 pkgutil_resolve_name==1.3.10 platformdirs==2.5.2 pluggy==1.0.0 poethepoet==0.16.2 polars==0.13.62 portalocker==2.5.1 pptree==3.1 preshed==3.0.7 prettytable==3.4.1 prometheus-client @ file:///tmp/abs_d3zeliano1/croots/recipe/prometheus_client_1659455100375/work promise==2.3 prompt-toolkit @ 
file:///tmp/build/80754af9/prompt-toolkit_1633440160888/work protobuf==3.19.4 psutil==5.9.2 ptyprocess @ file:///tmp/build/80754af9/ptyprocess_1609355006118/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl pure-eval @ file:///opt/conda/conda-bld/pure_eval_1646925070566/work py==1.11.0 py-spy==0.3.14 py4j==0.10.9.7 pyarrow==9.0.0 pyasn1==0.4.8 pyasn1-modules==0.2.8 pybind11==2.6.1 pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work pydantic==1.9.2 pyDeprecate==0.3.2 pyflakes==2.5.0 Pygments==2.13.0 pyjnius==1.4.2 pymongo==4.2.0 pyOpenSSL @ file:///opt/conda/conda-bld/pyopenssl_1643788558760/work pyparsing==3.0.9 pyrootutils==1.0.4 pyrsistent==0.18.1 pyserini==0.17.1 PySocks==1.7.1 pytest==7.1.3 python-dateutil==2.8.2 python-dotenv==0.21.0 pytorch-lightning==1.7.7 pytz==2022.4 PyYAML==6.0 pyzmq @ file:///opt/conda/conda-bld/pyzmq_1657724186960/work ray==1.13.0 -e git+ssh://[email protected]/minhptx/read.git@983be8afad8e26ecb1c38a9c05eea88c77001802#egg=read regex==2022.9.13 requests==2.28.1 requests-oauthlib==1.3.1 responses==0.18.0 rich==12.6.0 rouge==1.0.1 rsa==4.9 s3transfer==0.6.0 sacrebleu==2.2.1 sacremoses==0.0.53 sarge==0.1.7.post1 scikit-learn==1.1.2 scipy==1.6.1 seaborn==0.12.0 segtok==1.5.11 Send2Trash @ file:///tmp/build/80754af9/send2trash_1632406701022/work sentence-transformers==2.2.2 sentencepiece==0.1.95 sentry-sdk==1.9.10 setproctitle==1.3.2 setuptools-scm==7.0.5 shortuuid==1.0.9 six==1.16.0 smart-open==5.2.1 smmap==5.0.0 sniffio==1.3.0 soupsieve==2.3.2.post1 spacy==3.4.1 spacy-legacy==3.0.10 spacy-loggers==1.0.3 sqlitedict==2.0.0 srsly==2.4.4 stack-data @ file:///opt/conda/conda-bld/stack_data_1646927590127/work sympy==1.11.1 tabulate==0.9.0 tensorboard==2.10.1 tensorboard-data-server==0.6.1 tensorboard-plugin-wit==1.8.1 tensorflow-cpu==2.10.0 tensorflow-estimator==2.10.0 tensorflow-io-gcs-filesystem==0.27.0 termcolor==2.0.1 terminado @ file:///tmp/build/80754af9/terminado_1644322581811/work testpath @ 
file:///opt/conda/conda-bld/testpath_1655908557405/work thinc==8.1.3 threadpoolctl==3.1.0 tokenizers==0.12.1 tomli==2.0.1 toolz==0.12.0 torch @ https://download.pytorch.org/whl/cu113/torch-1.12.1%2Bcu113-cp38-cp38-linux_x86_64.whl torch-scatter==2.0.9 torchmetrics==0.10.0 torchvision @ https://download.pytorch.org/whl/cu113/torchvision-0.13.1%2Bcu113-cp38-cp38-linux_x86_64.whl tornado @ file:///opt/conda/conda-bld/tornado_1662061693373/work tqdm==4.64.1 traitlets==5.4.0 transformers==4.22.2 trectools==0.0.49 typer==0.4.2 typing_extensions==4.4.0 ujson==5.5.0 Unidecode==1.3.6 urllib3==1.26.12 virtualenv==20.16.5 wandb==0.13.4 wasabi==0.10.1 wcwidth==0.2.5 webencodings==0.5.1 websocket==0.2.1 websocket-client==0.58.0 Werkzeug==2.2.2 Wikipedia-API==0.5.4 wikitextparser==0.51.0 wrapt==1.14.1 xxhash==3.0.0 yarl==1.8.1 zipp==3.8.1 zope.event==4.5.0 zope.interface==5.5.0