[BUG] pf run create - generates snapshot with all files in project

Open sashokbg opened this issue 1 year ago • 7 comments

Describe the bug

In my project I have a couple of very large directories, such as .venv and models. When I run pf run create, it generates a snapshot that contains the entire directory tree of my project, wasting a huge amount of disk space.

Expected behavior

Maybe add an ignore file like .gitignore?

Screenshots

du -d 1 ~/.promptflow/.runs/first_run -h       
240K	/home/alexander/.promptflow/.runs/first_run/node_artifacts
25G	/home/alexander/.promptflow/.runs/first_run/snapshot
96K	/home/alexander/.promptflow/.runs/first_run/flow_artifacts
12K	/home/alexander/.promptflow/.runs/first_run/flow_outputs
25G	/home/alexander/.promptflow/.runs/first_run

Running Information

promptflow 1.6.0

Executable '/home/alexander/Games2/degiro-faq-assistant/.venv/bin/python' Python (Linux) 3.11.7 (main, Jan 29 2024, 16:03:57) [GCC 13.2.1 20230801]

Linux alexander-desktop 6.6.16-2-MANJARO #1 SMP PREEMPT_DYNAMIC Sat Feb 10 09:40:02 UTC 2024 x86_64 GNU/Linux

sashokbg avatar Mar 20 '24 22:03 sashokbg

Hi @sashokbg , thanks for reporting this to us.

I think we should make two enhancements here:

  • set a default size limit to avoid creating such a large snapshot
  • honor the .gitignore file in the user's folder.

We will let you know when we have this worked out. As a workaround, is it possible to move those big directories out of the flow folder?
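The two enhancements proposed above could look roughly like the sketch below. This is not promptflow's implementation: `MAX_SNAPSHOT_BYTES`, `load_patterns`, `is_ignored`, and `plan_snapshot` are hypothetical names, the 100 MiB cap is an arbitrary assumption, and the matching is a simplified `fnmatch` check on path components rather than full .gitignore semantics (no negation, no anchored or `**` patterns).

```python
import fnmatch
import os
from pathlib import Path

# Hypothetical default cap; promptflow does not define this value.
MAX_SNAPSHOT_BYTES = 100 * 1024 * 1024


def load_patterns(ignore_file: Path) -> list[str]:
    """Read non-empty, non-comment lines from a .gitignore-style file."""
    if not ignore_file.is_file():
        return []
    lines = ignore_file.read_text(encoding="utf-8").splitlines()
    return [
        ln.strip().rstrip("/")
        for ln in lines
        if ln.strip() and not ln.strip().startswith("#")
    ]


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Simplified rule: ignore a path if any of its components matches a pattern."""
    parts = Path(rel_path).parts
    return any(fnmatch.fnmatch(part, pat) for part in parts for pat in patterns)


def plan_snapshot(flow_dir: Path, max_bytes: int = MAX_SNAPSHOT_BYTES) -> list[Path]:
    """Return relative file paths to copy, skipping ignored paths and enforcing a size cap."""
    patterns = load_patterns(flow_dir / ".gitignore")
    selected: list[Path] = []
    total = 0
    for root, dirs, files in os.walk(flow_dir):
        rel_root = os.path.relpath(root, flow_dir)
        # Prune ignored directories in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if not is_ignored(os.path.join(rel_root, d), patterns)]
        for name in files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            if is_ignored(rel, patterns):
                continue
            total += (flow_dir / rel).stat().st_size
            if total > max_bytes:
                raise ValueError(f"snapshot would exceed {max_bytes} bytes")
            selected.append(Path(rel))
    return selected
```

With a .gitignore containing `.venv/` and `models`, `plan_snapshot(Path("."))` would skip both directories entirely, and it raises instead of silently writing a multi-gigabyte snapshot.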

wangchao1230 avatar Mar 21 '24 02:03 wangchao1230

Hello @wangchao1230, thank you for your prompt reply. Yes, I can manage with a workaround for now :)

Have a nice day!

sashokbg avatar Mar 21 '24 12:03 sashokbg

@wangchao1230 can you point me to where to look in the code, please? I might have enough time to try adding the gitignore part.

sashokbg avatar Mar 21 '24 16:03 sashokbg

Our existing logic is here: https://github.com/microsoft/promptflow/blob/0ae6a0aa36dd900fcfdd67b1a868a1d88c6fb964/src/promptflow/promptflow/_sdk/_utils.py#L449

wangchao1230 avatar Mar 22 '24 00:03 wangchao1230

Hi, we're sending this friendly reminder because we haven't heard back from you in 30 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 7 days of this comment, the issue will be automatically closed. Thank you!

github-actions[bot] avatar Apr 21 '24 21:04 github-actions[bot]

Hello @wangchao1230, so there is already ignore logic, but it seems it is not working?

sashokbg avatar Apr 23 '24 08:04 sashokbg

Yes. The current implementation might have some limitations, such as only looking at the ignore file in the code folder and not searching up through parent folders.
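One way to address the parent-folder limitation described above is to walk upward from the flow folder and collect every ignore file on the way. The sketch below is only an illustration under stated assumptions: `find_ignore_files` is a hypothetical helper, not a promptflow function, and stopping at the first directory that contains `.git` is an assumed convention for where the project root is.

```python
from pathlib import Path
from typing import List, Optional, Sequence


def find_ignore_files(
    start_dir: Path,
    names: Sequence[str] = (".gitignore",),
    stop_at: Optional[Path] = None,
) -> List[Path]:
    """Collect ignore files from start_dir upward, nearest directory first.

    The search stops after the first directory containing `.git`
    (assumed to be the repository root) or after `stop_at`, if given.
    """
    current = start_dir.resolve()
    boundary = stop_at.resolve() if stop_at is not None else None
    found: List[Path] = []
    for directory in [current, *current.parents]:
        for name in names:
            candidate = directory / name
            if candidate.is_file():
                found.append(candidate)
        if (directory / ".git").exists() or directory == boundary:
            break
    return found
```

For a flow stored in a subfolder of a repository, `find_ignore_files(Path("flows/my_flow"))` would return the flow-level .gitignore (if any) followed by the ones in each parent up to the repository root, so patterns such as .venv declared at the root can still be applied to the snapshot.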

wangchao1230 avatar Apr 23 '24 10:04 wangchao1230

Hi, we're sending this friendly reminder because we haven't heard back from you in 30 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 7 days of this comment, the issue will be automatically closed. Thank you!

github-actions[bot] avatar May 23 '24 21:05 github-actions[bot]