For something like this, you can just have an AI write it.

Open the web page in the attachment, point it at the kimi2.5 API with temperature 1, and it can translate directly. It takes a while, though, and it's best to open the console to confirm the request actually went out.

Speech bubbles and translations correspond one-to-one; with debug mode on, hovering over a translation shows the extracted original text.

If you want to write it yourself, here is the prompt. It's in English because some models are unstable with requirements written in Chinese; if you want to change the requirements, Chinese should work too. Recommended model: gemini3.1. Among domestic models glm5 does best, but it still sometimes mismatches images and translations.
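For reference, the request the generated page sends is just a chat-completion call with the image inlined as a data URL. A minimal sketch of the request builder, assuming an OpenAI-compatible vision API (the message shape, endpoint path, and model name below are assumptions; adjust them to your provider):

```javascript
// Build the body of an OpenAI-compatible vision chat-completion request.
// `prompt` is the OCR/translation prompt, `imageBase64` the image already
// base64-encoded; `model` and `temperature` come from the settings UI.
function buildVisionRequest(model, temperature, prompt, imageBase64, mime = "image/png") {
  return {
    model,
    temperature,
    messages: [{
      role: "user",
      content: [
        { type: "text", text: prompt },
        { type: "image_url", image_url: { url: `data:${mime};base64,${imageBase64}` } },
      ],
    }],
  };
}

// Hypothetical usage -- endpoint and key are placeholders:
// fetch(endpoint + "/v1/chat/completions", {
//   method: "POST",
//   headers: { "Content-Type": "application/json", Authorization: `Bearer ${key}` },
//   body: JSON.stringify(buildVisionRequest("kimi2.5", 1, PROMPT, b64)),
// }).then(r => r.json());
```

Watching the Network/Console tab for this POST is exactly the "confirm the request went out" step mentioned above.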
I want ONE HTML FILE that serves as a local frontend for an LLM-based image translator, with the following features:

- Upon opening, it displays an "open folder/image" button and a "drag image here" region; the user can also paste an image directly from the clipboard. When the "open folder" button is used, it displays the images in the folder one by one.
- Otherwise, it displays the image, with a "translate" button underneath it. When the user presses "translate", it sends the currently displayed image to the provided LLM with VL ability.
- After getting the reply, render each translation at the original text's box position. Auto-resize the font based on the box size so the translation fits within the box, but no smaller than 8pt; use whatever layout the original text uses (horizontal or vertical), with a white background for every box. When the content is onomatopoeia, use a fixed 20pt font centered within the box. The translation should fade on mouseover to reveal the image content underneath.
- The user can "toggle debug". In debug mode, hovering over a translation still fades it, but a "debug box" is also rendered in horizontal layout, pretty-printing that translation's JSON, with the mouse position as its corner.
- The image should default to "fit view" so the user can see the full image without scrolling, but provide a zoom bar and allow scrolling/panning.
- Make the UI sleek and minimalistic.
- The user should be able to set the endpoint address, API key, model name, temperature, and prompt directly.
- The user should be able to see the status of the current LLM call, see the LLM response (and modify it), and resend the page to the LLM to generate another response. Instead of having the new response overwrite the old one, store all responses and let the user compare them in a left/right panel interface, then select an "active" one to be displayed on the image. The user should be able to send multiple calls to the LLM.
- Make sure there is no CORS error.
I will provide this script/plugin with the endpoint address and key of an LLM with VL ability. I will give it the prompt below:

'''
You are a professional manga OCR extraction and translation expert. Carefully examine this manga image, extract all of its text, translate it into Chinese, and provide the exact position coordinates of each text region.

[Extraction requirements]
- Complete coverage: you must include all text inside speech bubbles (dialogue), narration boxes (narration), and sound effects / stylized text scattered in the background or outside bubbles (onomatopoeia).
- Coordinate system: use relative normalized coordinates in the range [0, 1000]. The top-left corner of the image is [0, 0] and the bottom-right corner is [1000, 1000].
- Coordinate format: each bounding box must strictly be an array of length 4: [ymin, xmin, ymax, xmax].
- Reading order: output results following standard manga reading order (right to left, top to bottom).

[Output format requirements]
Output only pure, valid JSON. Do not include any extra explanations, greetings, or Markdown markers (do not output the ```json opening and closing). The JSON data structure must be strictly as follows:

{
  "panels": [
    {
      "type": "dialogue",
      "box": [ymin, xmin, ymax, xmax],
      "original_text": "原文",
      "translation": "中文翻译"
    },
    {
      "type": "onomatopoeia",
      "box": [ymin, xmin, ymax, xmax],
      "original_text": "ドドド",
      "translation": "轰隆隆"
    }
  ]
}

Note: the "type" field may only be one of "dialogue", "narration", or "onomatopoeia". Make sure the JSON is absolutely valid so the program can parse it directly.
'''
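If you write the renderer yourself, the two fiddly parts are mapping the model's [0, 1000]-normalized [ymin, xmin, ymax, xmax] boxes onto pixel positions, and shrinking the font until the translation fits the box (floored at 8pt, per the spec above). A rough sketch — the fitting heuristic below is an assumption for illustration; the generated page should measure real rendered text instead:

```javascript
// Map a [ymin, xmin, ymax, xmax] box on the [0, 1000] normalized grid
// to pixel coordinates for an image of imgW x imgH pixels.
function boxToPixels(box, imgW, imgH) {
  const [ymin, xmin, ymax, xmax] = box;
  return {
    left:   (xmin / 1000) * imgW,
    top:    (ymin / 1000) * imgH,
    width:  ((xmax - xmin) / 1000) * imgW,
    height: ((ymax - ymin) / 1000) * imgH,
  };
}

// Crude fit: treat each CJK glyph as a square of `size` px that wraps at
// the box width, and shrink until the lines fit the box height, never
// going below 8. (Heuristic only -- real code should measure the text.)
function fitFontSize(text, width, height, max = 24) {
  for (let size = max; size > 8; size--) {
    const perLine = Math.max(1, Math.floor(width / size));
    const lines = Math.ceil(text.length / perLine);
    if (lines * size * 1.2 <= height) return size;
  }
  return 8;
}
```

Hovering over the rendered box can then just lower its opacity via CSS to reveal the original image, which is all the "fade on mouseover" behaviour requires.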