Ivan Shcheklein
## Action Items

- [ ] We should check why a read-only iterator locks the data for other data chains (it fails on creating tables for UDF for example, or removing...
I was trying to convert a SEC 10-K (PDF) as an example. Running it like this:

```shell
python -m markitdown ~/Downloads/4.General\ Electric\ Company.pdf > ge.md
```

And I see that...
When the path is a "directory" (prefix), we might end up doing a full listing operation just to get info for a single path. We call `ls` and we don't...
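To illustrate the cost difference, here is a toy model, not the actual listing code. It assumes a flat key store and a hypothetical `iter_keys` generator standing in for a paginated listing call. The names `info_full_listing` and `info_first_match` are made up for this sketch.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def iter_keys(keys: Iterable[str], prefix: str, fetched: List[int]) -> Iterator[str]:
    """Hypothetical stand-in for a paginated listing; counts keys it hands out."""
    for key in keys:
        if key.startswith(prefix):
            fetched[0] += 1
            yield key


def info_full_listing(keys: Iterable[str], path: str, fetched: List[int]) -> Optional[Dict[str, str]]:
    """Materialize the whole listing just to learn that the prefix exists."""
    prefix = path.rstrip("/") + "/"
    children = list(iter_keys(keys, prefix, fetched))
    return {"name": path, "type": "directory"} if children else None


def info_first_match(keys: Iterable[str], path: str, fetched: List[int]) -> Optional[Dict[str, str]]:
    """Stop at the first key under the prefix; one hit proves it is a directory."""
    prefix = path.rstrip("/") + "/"
    first = next(iter_keys(keys, prefix, fetched), None)
    return {"name": path, "type": "directory"} if first is not None else None


store = [f"data/{i}.txt" for i in range(1000)]

full_count = [0]
full_info = info_full_listing(store, "data", full_count)

probe_count = [0]
probe_info = info_first_match(store, "data", probe_count)
```

Both calls return the same info dict, but the first walks every key under the prefix while the second stops after one. Whether a real backend can bound the request this way depends on the storage API in use.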
```python import datachain as dc from datachain import File from typing import Tuple, Optional def try_map(file: File) -> Tuple[Optional[File], int, str]: return None, 0, "ok" ( dc.read_storage("") .limit(10) .map(try_map, output={"eeg_data":...
Studio supports multiple clusters, so `jobs run` should expose the cluster ID.
I'm getting something like: ``` + sentry-sdk==2.22.0 + setuptools==80.1.0 + shellingham==1.5.4 + shortuuid==1.0.13 + shtab==1.7.2 + six==1.17.0 + smmap==5.0.2 + sqlalchemy==2.0.40 + sqltrie==0.11.2 + tabulate==0.9.0 + threadpoolctl==3.6.0 + tomlkit==0.13.2 +...
Now we are getting an error that looks like: `datachain.lib.signal_schema.SetupError: cannot setup value 'client': value must be function or callable class` This is confusing and requires a non-obvious workaround,...
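For readers unfamiliar with this kind of check, a minimal sketch of why a plain object is rejected while a zero-argument factory is accepted. This is a toy validator, not datachain's code: `check_setup` and `FakeClient` are hypothetical names, and wrapping the object in a lambda is an assumed pattern, not necessarily the workaround meant above.

```python
from typing import Any, Callable, Dict


def check_setup(**kwargs: Any) -> Dict[str, Callable[[], Any]]:
    """Toy validator (hypothetical): every setup value must be callable."""
    for name, value in kwargs.items():
        if not callable(value):
            raise TypeError(
                f"cannot setup value '{name}': value must be function or callable class"
            )
    return kwargs


class FakeClient:
    """Hypothetical stand-in for an already constructed API client."""

    def ping(self) -> str:
        return "pong"


client = FakeClient()

# Passing the instance directly fails: it has no __call__.
try:
    check_setup(client=client)
    rejected = False
except TypeError:
    rejected = True

# Wrapping it in a zero-argument lambda satisfies the callable check.
factories = check_setup(client=lambda: client)
resolved = factories["client"]()
```

The factory form also defers construction, which matters when the object cannot be pickled or must be created inside each worker process.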
This doesn't work:

```python
(
    read_storage("...")
    .filter(file_stem("file.path") == "file.parquet")
    .save("index")
)
```
More context / discussion is [here](https://iterativeai.slack.com/archives/C04A9RWEZBN/p1746986037772129) ```python from typing import Optional from dotenv import load_dotenv from datachain import read_storage, C, File load_dotenv(".env.test") def process_events(file: File) -> Optional[File]: return file (...
https://github.com/user-attachments/assets/487d420d-febc-4c9f-8a7d-43af3aef38c8 https://github.com/user-attachments/assets/18ab7fb7-d157-407c-8be5-71c10a97d50c