An Interview with the Sora Team: How Was It Built? How Long Does Generation Take? When Can We Use It?

0 評(píng)論 1351 瀏覽 2 收藏 36 分鐘
🔗 产品经理的不可取代的价值是能够准确发现和满足用户需求,把需求转化为产品,并协调资源推动产品落地,创造商业价值。

前段時(shí)間,Sora 的核心團(tuán)隊(duì)接受了一個(gè)采訪,透露了很多未說(shuō)的信息。我把采訪記錄回聽(tīng)了 4 遍,整理下了英文逐字稿,并翻譯成了中文。

Host:

First of all, thank you guys for joining me. I imagine you're super busy, so this is much appreciated. If you don't mind, could you go one more time and give me your names and your roles at OpenAI?

Bill Peebles:

My name is Bill Peebles. I'm a lead on Sora here at OpenAI.

Tim Brooks:

My name is Tim Brooks. I'm also a research lead on Sora.

Aditya Ramesh:

I'm Aditya. I lead the Sora team.

Host:

Okay, so I've reacted to Sora. I saw the announcement and the website and all those prompts and example videos you shared, and it was super impressive. Can you give me a super concise breakdown of how exactly it works? We've explained DALL-E and diffusion before, but how does Sora make videos?

Bill Peebles:

Yeah, at a high level, Sora is a generative model. There have been a lot of very cool generative models over the past few years, ranging from language models like the GPT family to image-generation models like DALL-E.

Sora is a video-generation model, and what that means is that it looks at a lot of video data and learns to generate photorealistic videos. The exact way it does that draws on techniques from both diffusion-based models like DALL-E and large language models like the GPT family. It sits somewhere in between: it's trained like DALL-E, but architecturally it looks more like the GPT family. At a high level, though, it's simply trained to generate videos of the real world, of digital worlds, and of all kinds of content.
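To make "trained like DALL-E, shaped like the GPT family" concrete, here is a minimal, hypothetical sketch of a diffusion-transformer training step in PyTorch. The sizes, the linear noising schedule, and the omitted timestep conditioning are all illustrative assumptions, not Sora's actual design.

```python
import torch
import torch.nn as nn

# Toy diffusion transformer: a GPT-like transformer backbone trained
# with a DALL-E-like denoising (diffusion) objective on patch tokens.
class DenoisingTransformer(nn.Module):
    def __init__(self, dim=256, depth=4, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, dim)  # predict the noise per token

    def forward(self, noisy_tokens):
        return self.head(self.backbone(noisy_tokens))

model = DenoisingTransformer()
clean = torch.randn(2, 64, 256)              # 2 videos, 64 patch tokens each
noise = torch.randn_like(clean)
t = torch.rand(2, 1, 1)                      # random diffusion time in [0, 1]
noisy = (1 - t) * clean + t * noise          # toy linear noising schedule
loss = ((model(noisy) - noise) ** 2).mean()  # learn to predict the noise
loss.backward()
```

Timestep conditioning, text conditioning, and the video encoder/decoder are all omitted here for brevity.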

Host:

It creates a huge variety of stuff, kind of the same way the other models do, based on what it's trained on. What is Sora trained on?

Tim Brooks:

We can't go into much detail on it, but it's trained on a combination of data that's publicly available and data that OpenAI has licensed.

One innovation we had in creating Sora was enabling it to train on videos of different durations, aspect ratios, and resolutions, and that's really new. Previously, when people trained an image- or video-generation model, they would typically train it at a very fixed size, for example at only one resolution.

What we do is take images as well as videos of all kinds (wide, tall, long, short, high resolution, low resolution) and turn them all into small pieces we call patches.

We're then able to train on videos with different numbers of patches, depending on the size of the input. That makes our model versatile enough to train on a much wider variety of data, and to generate content at different resolutions and sizes.
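As a concrete picture of the patch idea, here is a hypothetical spacetime-patchify step in PyTorch: clips of any shape become fixed-size patches, and only the patch count varies per video. The patch sizes and tensor layout are illustrative assumptions, not Sora's actual values.

```python
import torch

def patchify(video, pt=4, ph=16, pw=16):
    """Cut a (T, H, W, C) clip into (pt x ph x pw) spacetime patches.

    Assumes T, H, W are divisible by the patch sizes; a real pipeline
    would pad or resize first.
    """
    T, H, W, C = video.shape
    return (video
            .reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
            .permute(0, 2, 4, 1, 3, 5, 6)    # gather the patch-grid dims
            .reshape(-1, pt * ph * pw * C))  # one flat row per patch

wide = torch.randn(16, 256, 448, 3)   # a short, wide clip
tall = torch.randn(64, 448, 256, 3)   # a longer, vertical clip
print(patchify(wide).shape)           # torch.Size([1792, 3072])
print(patchify(tall).shape)           # torch.Size([7168, 3072])
# Same patch size everywhere; only the number of tokens differs,
# so one model can train on both clips.
```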

Host:

You've had access to it, building it and developing it, for some time now. And obviously there are a ton of variables in video. I make videos, so I know: lighting, reflections, all kinds of physics and moving objects. What have you found that Sora, in its current state, is good at? And are there specific weaknesses? I'll show the video I asked for in a second, where there are six fingers on one hand. What have you seen as its particular strengths and weaknesses?

Tim Brooks:

It definitely excels at photorealism, which is a big step forward, and the fact that the videos can be so long, up to a minute, is a real leap from what was previously possible. But it still struggles with some things. Hands in general are a pain point, as you mentioned, and so are some aspects of physics: in one of the examples, with the 3D printer, you can see it doesn't quite get that right. And if you ask for something really specific, like a particular camera trajectory over time, it has trouble with that. So some aspects of physics, and of motion and trajectories that unfold over time, are still weak spots.

Host:

It's really interesting to see the stuff it does well, because, like you said, there are those examples of really good photorealism, with lighting and reflections and even close-ups and textures. And just like with DALL-E, you can give it styles, like "shot on 35mm film" or "shot on a DSLR with a blurry background." There's no sound in these videos, though. I'm super curious whether adding sound would be a gigantic extra lift, or whether it's more complicated than I realize. How far do you feel you are from having AI-generated sound in an AI-generated video?

Bill Peebles:

It's hard to give exact timelines on these kinds of things. For this first version, we were really focused on pushing the capabilities of video-generation models forward, because before this, a lot of AI-generated video was something like four seconds long, at a pretty low frame rate, and the quality wasn't great. That's where most of our effort has gone so far. We definitely agree that adding these other kinds of content would make videos much more immersive, so it's something we're thinking about. But right now, Sora is mainly a video-generation model, and we've been focused on pushing the capabilities in that domain.

Host:

Okay, so DALL-E has improved a lot over time, and you're constantly developing Sora and working to make it better. First of all, how did you get to the point where it was good enough that you knew it was ready to share with the world, and you could have this mic-drop moment? And then, how do you decide what to keep improving?

Tim Brooks:

A big motivation for us, really the motivation for putting Sora out in this form, as a blog post, before it's ready as a product, is to get feedback: to understand how it could be useful to people and what safety work needs to be done. That will really set our research roadmap going forward. It's not currently a product, it's not available in ChatGPT or anywhere else, and we don't have any timeline for turning it into one. Right now we're in the feedback-gathering stage. We'll definitely be improving it, but how we should improve it is still an open question. We wanted to show the world this technology that's on the horizon and start hearing from people: from safety experts, how we can make it safe for the world; from artists, how it could be useful in their workflows. That's really going to set our agenda moving forward.

Host:

What have you heard so far?

Tim Brooks:

One piece of feedback we've definitely heard is that people want more detailed controls. Right now you give it a fairly short prompt, but people are really interested in having finer control over exactly what gets generated. That will be an interesting direction, and it's definitely something we'll be looking into.

Host:

Interesting. I can imagine just wanting to make sure a video is widescreen, or vertical, or well lit, without having to worry about prompt engineering. Okay, so, since you've been working on this for a long time: is there a future where you can generate a video that's indistinguishable from a real one? That's how DALL-E has evolved; you can ask it for a photorealistic picture and it can make one. Is that something you could imagine actually being possible? I'd guess yes, because we've already seen it do so much.

Aditya Ramesh:

Eventually I think it's going to be possible. Of course, as we approach that point, we want to be careful about releasing these capabilities, so that people on social media are aware of when a video they see could be real or fake, and of whether it comes from a trusted source. We want to make sure these capabilities aren't used in a way that could spread misinformation.

Host:

I saw there's a watermark in the bottom corner of Sora-generated videos, which is obviously important, but a watermark like that can be cropped out. I'm curious whether there are other ways you think about easily identifying AI-generated videos, especially from a tool like Sora.

Aditya Ramesh:

For DALL·E 3, we trained provenance classifiers that can tell whether or not an image was generated by the model. We're working on adapting that technology to videos as well. It won't be a complete solution in and of itself, but it's a first step.

Host:

Got it. Kind of like metadata, or a sort of embedded flag, so that if you handle that file, you know it's AI-generated.

Aditya Ramesh:

C2PA does that, but the classifier we trained can be run directly on any image or video, and it tells you whether it thinks the media was generated by one of our models.
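To make the distinction concrete: C2PA travels as signed metadata inside the file, while a provenance classifier judges the pixels themselves. The sketch below contrasts the two paths; both helpers are hypothetical stubs, not real C2PA or OpenAI APIs.

```python
def read_c2pa_manifest(path):
    """Stub: would parse the file's signed C2PA metadata; None if absent."""
    return None  # metadata is lost if the file is re-encoded or screenshotted

def classifier_score(path):
    """Stub: would run a pixel-level provenance classifier on the media."""
    return 0.93  # probability the media came from one of "our" models

def check_provenance(path):
    manifest = read_c2pa_manifest(path)
    if manifest is not None:
        # Metadata route: definitive when present, but easy to strip.
        return f"C2PA manifest present, signed by {manifest['issuer']}"
    # Pixel route: survives metadata stripping, but is probabilistic.
    return f"no manifest; P(model-generated) = {classifier_score(path):.2f}"

print(check_provenance("clip.mp4"))  # no manifest; P(model-generated) = 0.93
```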

Host:

Got it. What I'm also curious about is your own reaction. You obviously had to get to the point where Sora comes out and you think it's ready for the world to see what it's capable of. What's been your reaction to other people's reactions? There's a lot of "this is super cool, this is amazing," but also a lot of "oh my God, my job is in danger." How do you digest all the different ways people react to this thing?

Aditya Ramesh:

A lot of the reception definitely included some anxiety about what's going to happen next, and we feel that in terms of our mission: making sure this technology is deployed in a safe way, one that's responsible toward everyone already doing work involving video. But I also saw a lot of opportunity. Right now, for example, if someone has an idea for a movie they want to produce, it can be really difficult to get funding, because the budgets are so large and production companies have to weigh the risk of their investment. One cool way AI could help is by drastically lowering the cost of going from idea to finished video.

Host:

Yeah, there are a lot of parallels with DALL·E in how I think people are going to use it. When DALL·E got really good, I started using it as a brainstorming tool, to visualize a thumbnail for a video, for example. I could see a lot of the same use cases being particularly awesome with Sora. I know you're not giving timelines, but you're in the testing phase now. Do you think it will be available for public use anytime soon?

Aditya Ramesh:

Not any time soon, I think.

Host:

I guess my last question is: way down the road, when Sora is making five-minute YouTube videos with sound and perfect photorealism, what medium makes sense to dive into next? Photos are one thing, but video adds a whole dimension of time and physics, plus new variables like reflections and sound. You jumped into this faster than I expected. What's next on the horizon for AI-generated media in general?

Tim Brooks:

Something I'm really excited about is how the use of AI tools will evolve toward creating completely new kinds of content, and I think a lot of that will be us learning from how people use these tools to do new things. It's easy to think about how they could be used to recreate things that already exist, but I actually think they'll enable completely new types of content. It's hard to know what that is until the tools are in the hands of the most creative people, but really creative people with new tools do amazing things; they make things that weren't previously possible. That's what motivates me. Long term, it's about how this could turn into completely new media experiences that we aren't even thinking about today. It's hard to picture exactly what those are, but letting really creative people push the creative boundaries with completely new tools will be really exciting.

Host:

Yeah, it's interesting. I feel like, since it's trained on existing content, it can only produce things based on what already exists. The only way to get it to be creative is with your prompt, I imagine: you have to get clever with prompt engineering and figure out exactly what to say to it. Is that accurate?

Bill Peebles:

The model has other cool capabilities beyond just text-based prompting. In the research post we released with Sora, we showed an example of blending between two input videos. There was one really cool case where the video on the left starts as a drone flying through the Colosseum, and it gradually transitions into the video on the right, a butterfly swimming underwater. At one point the Colosseum gradually begins decaying, looking as if it's covered in coral and partially underwater. These kinds of generated videos really do start to feel new relative to what's been possible with older technology, so beyond just prompting, we're excited about capabilities like these as new experiences people can create with technology like Sora.
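One way to picture the blending capability is as interpolation between the two videos' latents, with a weight that ramps over time so the clip opens as one video and ends as the other. The sketch below shows only that schedule; the shapes are made up, random tensors stand in for real per-frame latents, and the actual model would re-generate video from the blended latents.

```python
import torch

def blend_over_time(latents_a, latents_b):
    """latents_*: (T, D) per-frame latents of the two source videos."""
    T = latents_a.shape[0]
    w = torch.linspace(0.0, 1.0, T).unsqueeze(1)  # 0 -> 1 across frames
    # Frame 0 is pure video A, the final frame pure video B; the middle
    # frames give the gradual Colosseum-to-coral style transition.
    return (1 - w) * latents_a + w * latents_b

a = torch.randn(48, 512)         # stand-in: drone flight over the Colosseum
b = torch.randn(48, 512)         # stand-in: butterfly swimming underwater
blended = blend_over_time(a, b)  # (48, 512), decoded back into a video
```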

Aditya Ramesh:

In some ways, we really see modeling reality as the first step toward being able to transcend it.

Host:

Wow, I like that; it's really interesting. The better it can model reality, the faster you can build on top of it, and ideally that unlocks new creative possibilities as a tool, and all kinds of other things. Super cool! Well, I'll leave it open: is there anything else you want people to know? Obviously, you've been working on this longer than anyone else who's gotten to see it or play with it. What else do you want the world to know about Sora and OpenAI?

Tim Brooks:

Another thing we're excited about is how learning from video data will make AI more broadly useful, beyond just creating videos. We experience the world a lot like a video we're watching, and there's a great deal of information about it that isn't in text. Models like GPT are really intelligent and understand a lot about the world, but they miss information when they can't see the visual world the way we do. So one thing we're excited about, for Sora and for future AI models built on top of it, is that by learning from visual data they'll have a better understanding of the world we live in, and in the future they'll be able to help us better because they understand things better.

Host:

That is super cool. I imagine a lot of compute and a lot of talented engineering go into that, so I wish you guys the best of luck. Eventually, when I can plug more of my stuff into Sora, I'll be very excited for that moment too. Keep me posted.

Bill Peebles:

沒(méi)問(wèn)題

We’ll do

Host:

Thank you.

OpenAI Team:

Thanks.

A thousand years later…

Host:

One more fun fact I forgot to ask about during the recording, but everyone wants to know: how long does it take Sora to generate a video from a single prompt? I asked them off camera, and the answer was: it depends, but you could go get a coffee, come back, and it would still be working on the video. So "a while" seems to be the answer.

Author: 賽博禪心

WeChat official account: 賽博禪心

Translated and published by @賽博禪心 on 人人都是產品經理. Reproduction without the author's permission is prohibited.

Header image from Unsplash, under the CC0 license.

The views expressed are the author's own; 人人都是產品經理 provides information-storage services only.

更多精彩內(nèi)容,請(qǐng)關(guān)注人人都是產(chǎn)品經(jīng)理微信公眾號(hào)或下載App
評(píng)論
評(píng)論請(qǐng)登錄
  1. 目前還沒(méi)評(píng)論,等你發(fā)揮!
专题
15315人已学习12篇文章
运费是电商的基础功能模块之一,承担着商品运费计算的作用。本专题的文章分享了如何设计运费规则。
专题
12243人已学习12篇文章
本专题的文章分享了营销案例解析。
专题
12324人已学习14篇文章
数字营销有着精准度高、成本较低、效果可量化等优点,很多企业都尝试了数字营销。本专题的文章分享了数字营销的相关内容。
专题
15302人已学习12篇文章
本专题的文章分享了交互设计文档的撰写指南。
专题
37217人已学习20篇文章
“搜索功能”拆解:小功能,大细节。
专题
13622人已学习11篇文章
抽奖作为一种活跃用户的运营手段之一,在产品运营的工作里是一项大家必须掌握的技能。本专题的文章分享了抽奖类活动的设计指南。