
Support multiple selection in "Select and Edit"

Open radrad opened this issue 1 year ago • 5 comments

Describe the bug

I am using VS Code Insiders in admin mode.

In the backend `.env` I entered my API keys: `OPENAI_API_KEY=sk-2siLny...` and `ANTHROPIC_API_KEY=sk-ant-api0...`

When I drag and drop the .mp4 video below (https://github.com/user-attachments/assets/22713a47-4d23-44e9-b83f-dcb774ebbcc8), I get a notification dialog: "Error assembling prompt. Contact support at [email protected]"

How can I use the latest models, and what code should I change? I want to use the latest OpenAI model, o1-preview (which points to o1-preview-2024-09-12), and the latest Anthropic model, claude-3-5-sonnet-latest (which points to claude-3-5-sonnet-20241022).

I am confused about where in the code I can designate the latest versions.

When I drag/drop a .png image (screenshot1), I cannot see Option 1 rendering. What is Option 1 supposed to show? OpenAI-based generation?

How can I use the latest o1 model?

frontend\src\lib\models.ts

```ts
// Keep in sync with backend (llm.py)
// Order here matches dropdown order
export enum CodeGenerationModel {
  CLAUDE_3_5_SONNET_2024_06_20 = "claude-3-5-sonnet-20240620",
  GPT_4O_2024_05_13 = "gpt-4o-2024-05-13",
  GPT_4_TURBO_2024_04_09 = "gpt-4-turbo-2024-04-09",
  GPT_4_VISION = "gpt_4_vision",
  CLAUDE_3_SONNET = "claude_3_sonnet",
}

// Will generate a static error if a model in the enum above is not in the descriptions
export const CODE_GENERATION_MODEL_DESCRIPTIONS: {
  [key in CodeGenerationModel]: { name: string; inBeta: boolean };
} = {
  "gpt-4o-2024-05-13": { name: "GPT-4o", inBeta: false },
  "claude-3-5-sonnet-20240620": { name: "Claude 3.5 Sonnet", inBeta: false },
  "gpt-4-turbo-2024-04-09": { name: "GPT-4 Turbo (deprecated)", inBeta: false },
  gpt_4_vision: { name: "GPT-4 Vision (deprecated)", inBeta: false },
  claude_3_sonnet: { name: "Claude 3 (deprecated)", inBeta: false },
};
```
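The frontend enum above has to stay in sync with the backend's `Llm` enum in `llm.py`. As a hedged Python sketch of what adding the newer model IDs from this thread might look like (member names are my assumptions, not necessarily the repo's actual code):

```python
# Hypothetical sketch of extending the backend model enum (llm.py) with the
# newer model IDs mentioned in this thread. Member names are assumptions;
# check the actual enum in backend/llm.py before applying anything like this.
from enum import Enum

class Llm(Enum):
    CLAUDE_3_5_SONNET_2024_06_20 = "claude-3-5-sonnet-20240620"
    CLAUDE_3_5_SONNET_2024_10_22 = "claude-3-5-sonnet-20241022"  # newer Sonnet
    GPT_4O_2024_05_13 = "gpt-4o-2024-05-13"
    GPT_4_TURBO_2024_04_09 = "gpt-4-turbo-2024-04-09"

# The frontend's CodeGenerationModel must list the same string values,
# or requests selecting a new model will not match on the backend.
print(Llm.CLAUDE_3_5_SONNET_2024_10_22.value)
```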

Console log for backend:

```
Using openAiApiKey from client-side settings dialog
Using anthropicApiKey from client-side settings dialog
Using official OpenAI URL
Generating react_tailwind code in video mode using Llm.CLAUDE_3_5_SONNET_2024_06_20...
Status (variant 0): Generating code...
Status (variant 1): Generating code...
C:\Users\Greg\AppData\Local\Temp\tmpkwxwbc8s.mp4
Error assembling prompt. Contact support at [email protected]
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\moviepy\video\io\ffmpeg_reader.py", line 285, in ffmpeg_parse_infos
    line = [l for l in lines if keyword in l][index]
IndexError: list index out of range
```

During handling of the above exception, another exception occurred:

```
Traceback (most recent call last):
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\uvicorn\protocols\websockets\websockets_impl.py", line 250, in run_asgi
    result = await self.app(self.scope, self.asgi_receive, self.asgi_send)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\uvicorn\middleware\proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\fastapi\applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\starlette\applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\starlette\middleware\errors.py", line 149, in __call__
    await self.app(scope, receive, send)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\starlette\middleware\cors.py", line 75, in __call__
    await self.app(scope, receive, send)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\starlette\middleware\exceptions.py", line 79, in __call__
    raise exc
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\starlette\middleware\exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\fastapi\middleware\asyncexitstack.py", line 21, in __call__
    raise e
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\fastapi\middleware\asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\starlette\routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\starlette\routing.py", line 341, in handle
    await self.app(scope, receive, send)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\starlette\routing.py", line 82, in app
    await func(session)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\fastapi\routing.py", line 289, in app
    await dependant.call(**values)
  File "J:\k8s\ArgoCD\Git\Maui\The Path to Self-Transformation\Automation\screenshot-to-code\backend\routes\generate_code.py", line 234, in stream_code
    prompt_messages, image_cache = await create_prompt(params, stack, input_mode)
  File "J:\k8s\ArgoCD\Git\Maui\The Path to Self-Transformation\Automation\screenshot-to-code\backend\prompts\__init__.py", line 72, in create_prompt
    prompt_messages = await assemble_claude_prompt_video(video_data_url)
  File "J:\k8s\ArgoCD\Git\Maui\The Path to Self-Transformation\Automation\screenshot-to-code\backend\video\utils.py", line 21, in assemble_claude_prompt_video
    images = split_video_into_screenshots(video_data_url)
  File "J:\k8s\ArgoCD\Git\Maui\The Path to Self-Transformation\Automation\screenshot-to-code\backend\video\utils.py", line 79, in split_video_into_screenshots
    clip = VideoFileClip(temp_video_file.name)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\moviepy\video\io\VideoFileClip.py", line 88, in __init__
    self.reader = FFMPEG_VideoReader(filename, pix_fmt=pix_fmt,
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\moviepy\video\io\ffmpeg_reader.py", line 35, in __init__
    infos = ffmpeg_parse_infos(filename, print_infos, check_duration,
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\moviepy\video\io\ffmpeg_reader.py", line 289, in ffmpeg_parse_infos
    raise IOError(("MoviePy error: failed to read the duration of file %s.\n"
OSError: MoviePy error: failed to read the duration of file C:\Users\Greg\AppData\Local\Temp\tmpkwxwbc8s.mp4.
Here are the file infos returned by ffmpeg:
```

```
ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 9.2.1 (GCC) 20200122
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
C:\Users\Greg\AppData\Local\Temp\tmpkwxwbc8s.mp4: Permission denied
```
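One plausible cause for the `Permission denied` at the end of the log: on Windows, a `tempfile.NamedTemporaryFile` that is still held open cannot be opened again by another process (here, the ffmpeg subprocess that MoviePy spawns). A minimal sketch of the usual workaround, assuming that is indeed the cause (the function name is illustrative, not the project's actual code in `video/utils.py`):

```python
# On Windows, a NamedTemporaryFile that is still open cannot be reopened by
# another process (e.g. ffmpeg), which can surface as "Permission denied".
# Sketch of a common workaround: create with delete=False, close the handle
# before handing the path off, and clean up manually afterwards.
import os
import tempfile

def write_temp_video(video_bytes: bytes) -> str:
    tmp = tempfile.NamedTemporaryFile(suffix=".mp4", delete=False)
    try:
        tmp.write(video_bytes)
    finally:
        tmp.close()  # release the Windows file lock before ffmpeg opens it
    return tmp.name  # caller is responsible for os.remove() when done

path = write_temp_video(b"\x00" * 16)
print(os.path.exists(path))
os.remove(path)
```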

radrad avatar Oct 24 '24 08:10 radrad

Re: images, the newest change that supports multiple options made it a little trickier to set the model in code.

If you pull the latest and have both Anthropic and OpenAI keys set, it will use the latest Claude (claude-3-5-sonnet-20241022). See https://github.com/abi/screenshot-to-code/blob/8ee26ff566e5f4502f142e94fe992832d52ea0db/backend/routes/generate_code.py#L312

We currently use GPT_4O_2024_05_13, which is one update behind: https://github.com/abi/screenshot-to-code/blob/8ee26ff566e5f4502f142e94fe992832d52ea0db/backend/routes/generate_code.py#L299 You can update that to the latest if you want.
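Roughly, the key-based selection described in the two links above could be sketched like this (illustrative only; the function name and exact fallbacks are my assumptions, not the actual logic in `generate_code.py`):

```python
# Hedged sketch of key-based model selection for the two generated variants:
# prefer Claude when an Anthropic key is set, otherwise fall back to OpenAI.
# Names and model choices are illustrative, not the repo's actual code.
from typing import Optional

def pick_models(openai_key: Optional[str], anthropic_key: Optional[str]) -> list:
    if anthropic_key and openai_key:
        # With both keys, variants can come from both providers
        return ["claude-3-5-sonnet-20241022", "gpt-4o-2024-05-13"]
    if anthropic_key:
        return ["claude-3-5-sonnet-20241022"] * 2
    if openai_key:
        return ["gpt-4o-2024-05-13"] * 2
    raise ValueError("No API keys configured")

print(pick_models("sk-example", None))
```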

You can't use o1-preview-2024-09-12 because that doesn't support image input as far as I know.

I'll make it easier to choose models in the future.

Re: video, this is a known issue. If you convert the format of the video using a video converter, it should work. Some browsers don't set the duration correctly when capturing video, and so it doesn't work. You could also try a different browser.
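One way to do the conversion programmatically, assuming ffmpeg is installed and on the PATH (a sketch, not something the project ships):

```python
# Sketch: build an ffmpeg command that re-encodes a video. Re-encoding
# rewrites the container headers, which usually fixes missing or garbled
# duration metadata that trips up MoviePy. Assumes ffmpeg is on the PATH.
import subprocess

def reencode_cmd(src: str, dst: str) -> list:
    # -y overwrites the destination; libx264/aac are widely supported codecs
    return ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-c:a", "aac", dst]

cmd = reencode_cmd("capture.mp4", "capture_fixed.mp4")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```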

abi avatar Oct 24 '24 16:10 abi

What about having both Option 1 (which I don't get, even though I have both Anthropic and OpenAI keys in .env) and Option 2 (which I do get)?

I find that hard-coding models in multiple places makes the model handling very hard to maintain for future updates. I understand there are differences across the codebase in how models from different providers are treated (to enable or disable some features). This is what I changed; feel free to apply this patch if my changes are appropriate for bringing in the latest models from both Anthropic and OpenAI: my_changes.patch

Re: "If you can convert the format of the video using a video converter, it should work." What exactly should I convert in the video?

The video (which you can check at https://github.com/user-attachments/assets/22713a47-4d23-44e9-b83f-dcb774ebbcc8) was captured with the Snagit tool, and you can see from the file's properties that it does have a Length and other video metadata.

Where exactly in the code am I getting this error?

Next, it would be good if fine-grained changes could accumulate more than one selection. I find that after selecting an HTML element and providing a desired update, code re-generation starts immediately. I would prefer to make multiple selections and update prompts before re-generating.

radrad avatar Oct 24 '24 18:10 radrad

Yeah, I will look into supporting the newest GPT-4o model. I need to do some testing to ensure quality before switching to it.

I think a good fix is to just re-encode it as MP4. If that's confusing, you could convert MP4 to WebM, or MP4 -> WebM -> MP4. As far as I know, when the duration error shows up it's an encoding issue with the video.

Good suggestion re: more than one selection. Also, I'm exploring newer models like Llama on Groq so the change is instant.

abi avatar Oct 25 '24 20:10 abi

Snagit is a professional screen video capture tool, and I cannot see any problem with the .mp4 it creates. I did what you suggested (MP4 -> WebM -> MP4), and the resulting file behaves the same after these conversion steps: https://github.com/user-attachments/assets/fe15a9de-ab1a-4423-ac6b-9fa7da46e239

Can you try with my original video and this one?

Can you provide a couple of videos that do work, so I can see if there is some other problem?

What about producing both Option 1 and Option 2? Right now I only get Option 2. When would Option 1 generate code?

radrad avatar Oct 25 '24 22:10 radrad

(Screenshot attached: 2024-10-28 at 6:30:59 PM)

The video you provided works for me. What error do you get?

If Option 1 isn't working, please share the backend logs. But even without that, my guess would be that your Anthropic key isn't right.

abi avatar Oct 28 '24 22:10 abi