Vladimir Rudnykh

Results 22 issues of Vladimir Rudnykh

Follow-up to https://github.com/iterative/datachain/issues/1031 and https://github.com/iterative/datachain/pull/1035. We need to implement proper support for [Pandas MultiIndex](https://pandas.pydata.org/docs/user_guide/advanced.html#multiindex-advanced-indexing): we should be able to read back pandas data with a MultiIndex and properly reconstruct the DataChain. See...

It would be great if we could parallelize video file processing (splitting into frames/fragments), or at least the file uploads. Example UDF: ```python def get_frames(file: VideoFile) -> Iterator[tuple[VideoFrame, ImageFile]]: for frame in...

enhancement

We do not put [uploaded files](https://github.com/iterative/datachain/blob/ce387858aa3abe39716c2fc1a449ba86c4e28cc9/src/datachain/lib/file.py#L276-L295) in the cache. We should probably have a method like `cache` or a `cache=True` flag in `save`.

[Example script](https://github.com/iterative/datachain/blob/main/examples/computer_vision/ultralytics-bbox.py): ```python from ultralytics import YOLO from datachain import C, DataChain, File from datachain.model.ultralytics import YoloBBoxes def process_bboxes(yolo: YOLO, file: File) -> YoloBBoxes: results = yolo(file.as_image_file().read(), verbose=False) return YoloBBoxes.from_results(results)...

bug

In https://github.com/iterative/datachain/pull/927 a new `to_partial` method was introduced in the `SignalSchema` class. It creates a partial schema by serializing and then deserializing the signal schema, which does not look like an ideal solution. Let's...

housekeeping

In `datachain.toolkit` we have a `train_test_split` function for splitting a DataChain into multiple subsets. See the docs [here](https://docs.datachain.ai/references/toolkit/#datachain.toolkit.train_test_split). We need to be able to specify a `groups` spec and ensure that those...

enhancement
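The idea of a group-aware split can be sketched in plain Python. This is only an illustration, not the DataChain API: the helper `group_split`, its parameters, and the hashing scheme are all assumptions made for the example. Each row's subset is chosen from a hash of its group value, so every row sharing a group value lands in the same subset.

```python
import hashlib


def group_split(rows, group_key, weights, seed=0):
    """Hypothetical helper: split rows into len(weights) subsets,
    keeping all rows with the same group value in one subset."""
    total = float(sum(weights))
    # Cumulative upper bounds in (0, 1], one per subset.
    bounds = []
    acc = 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)

    subsets = [[] for _ in weights]
    for row in rows:
        # Hash the group value (not the row) so a whole group agrees.
        digest = hashlib.sha256(f"{seed}:{row[group_key]}".encode()).digest()
        r = int.from_bytes(digest[:8], "big") / 2**64
        # Fall back to the last subset to absorb float rounding.
        idx = next((i for i, b in enumerate(bounds) if r < b), len(weights) - 1)
        subsets[idx].append(row)
    return subsets


rows = [{"patient": i % 7, "value": i} for i in range(70)]
train, test = group_split(rows, "patient", [0.7, 0.3])
```

Because the assignment is per group, the resulting subset sizes only approximate the requested weights, especially when there are few groups.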

This is a follow-up issue for https://github.com/iterative/datachain/issues/228. We still have to implement the following group_by functions: - [ ] first - [ ] last - [ ] std - [...

enhancement
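For reference, the intended semantics of `first`, `last`, and `std` over groups can be sketched in plain Python. This is not DataChain code: the helper `group_aggregate` is hypothetical, and using the sample standard deviation (with `None` for single-row groups) is an assumption, since the issue excerpt does not say which variant is wanted.

```python
from collections import defaultdict
from statistics import stdev


def group_aggregate(rows, key, value):
    """Hypothetical helper: per-group first, last, and sample std."""
    groups = defaultdict(list)
    for row in rows:
        # Values are collected in input order, so "first" and "last"
        # depend on how the rows were ordered beforehand.
        groups[row[key]].append(row[value])
    return {
        k: {
            "first": v[0],
            "last": v[-1],
            # Sample std is undefined for one value; report None (assumption).
            "std": stdev(v) if len(v) > 1 else None,
        }
        for k, v in groups.items()
    }


rows = [
    {"kind": "a", "size": 1},
    {"kind": "a", "size": 3},
    {"kind": "b", "size": 5},
]
stats = group_aggregate(rows, "kind", "size")
```

Note that `first` and `last` are only well defined when the input has a known order, which is worth keeping in mind when mapping them onto a database backend.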

This is a follow-up issue for https://github.com/iterative/datachain/issues/227. We need to check and implement missing window functions, see [this comment](https://github.com/iterative/datachain/pull/515#discussion_r1806870655). The list of functions that need to be implemented will be...

enhancement

In https://github.com/iterative/datachain/pull/515 we implemented [window functions](https://github.com/iterative/datachain/issues/227). However, some common use cases still require a lot of work, for example, selecting a subset of the records with N records in each...

enhancement
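One reading of the "N records per partition" use case is the familiar row-number pattern: number the rows within each partition by some ordering and keep those numbered at most N. The plain-Python sketch below shows that logic only; `top_n_per_group` and its parameters are hypothetical names, not part of DataChain.

```python
from collections import defaultdict


def top_n_per_group(rows, partition_key, order_key, n):
    """Hypothetical helper: keep the first n rows of each partition,
    ordered by order_key (a row_number() <= n filter)."""
    counts = defaultdict(int)
    selected = []
    # Sorting by (partition, order) visits each partition's rows in order.
    for row in sorted(rows, key=lambda r: (r[partition_key], r[order_key])):
        counts[row[partition_key]] += 1  # row number within the partition
        if counts[row[partition_key]] <= n:
            selected.append(row)
    return selected


rows = [
    {"label": "cat", "score": 3},
    {"label": "cat", "score": 1},
    {"label": "cat", "score": 2},
    {"label": "dog", "score": 9},
]
subset = top_n_per_group(rows, "label", "score", n=2)
```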

For now, in a few cases dataset tables have no `sys` columns (`sys__id` and `sys__rand`); for example, an aggregation result dataset by default comes without these columns. We need to: 1. Check...