Creating Media Without an App (Chrome Dev Summit 2017)

[MUSIC PLAYING] MAT SCALES: Hey, everyone. So as you heard, my
name is Mat Scales. I’m with the Web Developer
Relations team at Google. Today, I’m going to talk
to you about creating media on the web. So you heard earlier
from [? Tal ?] about how there are billions
of users coming online, and how the mobile web is
a great way to reach them. One of the things those
users are going to want to do is to create and share media. Now, the web has always been
a great platform for sharing– sharing what we’re doing,
things that we’ve found, things that we’ve made. And you just post a
link to whatever it is, and anyone can just click
it and go straight to it. But until very
recently, creating photos, recording
videos, editing and filtering these
results was pretty tough, if not impossible. Now, thanks to some
pretty good progress over the last
couple of years, you can do all of this on the
mobile web right on the device. New APIs have landed that let
you create rich media content. So I’m going to talk today a
little bit about the challenges that are still being worked
on, and I’ll look ahead to some even more
exciting things that are coming in the future. But first, I’d like to introduce
Jenny and Peter from Instagram, who are going to
tell us how Instagram are using these features
on the mobile web today. [APPLAUSE] JENNIFER LIN: Thanks, Mat. Instagram’s mission is to
strengthen relationships through shared experiences. As we continue to connect
more of the world, our biggest opportunities are
emerging markets, countries where more and more
people are starting to use the internet
through mobile devices. To this end, Instagram is
investing in mobile web support, to make it
easier for people to use Instagram where
devices have limited storage or connections are unreliable. As a media-heavy
application, how do we deliver the
Instagram experience within the limitations of web? Today, we are going to share
some of the best practices. Most of you probably
recognize Instagram as a native application. What most of you
probably didn’t realize until this conference is
that this year, we started building out our mobile web experience. This is thanks to the existence of new APIs. Chrome will prompt you, when certain criteria are met, to install a progressive web application like ours to the homescreen in the background. Now, I'm going to show you a
demo of our progressive web application. Here’s the native app
that you’re familiar with. Like the native app, you can
see that the progressive web app gets its own icon. When you click it, it loads the same experience, but without looking like it's in a browser. This allows users, particularly
in emerging markets, whose phones or
connections may limit them from downloading, using, or
wanting to use the native app, to get a true app-like
experience using our web product. You have stories, and you
have the content you follow, with the people in the feed. And you can take photos as well. So I’d like to take a photo
of all you lovely people. Can we get the house
lights on, please? All right. And then you can see
that we have the filters. So let’s post this. I’m not the best at Swype. And then share it out. And there you go. So– [APPLAUSE] Back to slides, please. Oh, thanks. Now we’re going to talk about
the technical details about how we implemented several
of these features. Specifically, we’re
going to deep-dive into how to add
the progressive web app to the homescreen,
video playback using adaptive streaming, image
capture and filters on the web, as well as offline support. One thing to note is that the majority of the features we're talking about today are Android-only. However, because our target audience is in emerging markets, and emerging markets are almost exclusively Android markets, this didn't really limit our reach. So let's talk about how
to get the phone to prompt us to add to homescreen. There are a few requirements
to make this work. First, you need a web app
manifest, like this one. The required fields
are Name, which is used in the prompt,
Short Name, which is used on the homescreen,
Start URL to Load, and the icon that’s
used on the homescreen. In addition, you need to have
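A minimal manifest with those required fields might look like this (the values here are illustrative, not Instagram's actual manifest):

```json
{
  "name": "Instagram",
  "short_name": "Instagram",
  "start_url": "/",
  "display": "standalone",
  "icons": [
    { "src": "/static/icons/icon-192.png", "sizes": "192x192", "type": "image/png" }
  ]
}
```

The display member is what makes the installed app launch without browser chrome.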
In addition, you need to have a service worker registered on your site. For this, we use Workbox, a set of libraries and tools for service workers that Jeff talked about yesterday, and [? Eva ?] mentioned as well. And this is our router.registerRoute call.
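The page-side registration can be sketched like this (the /sw.js path is illustrative; the commented Workbox call mirrors the v2-era router.registerRoute shape mentioned above):

```javascript
// Register the service worker from the page. The nav parameter just makes
// the function easy to test; in the browser it defaults to navigator.
function registerServiceWorker(nav = navigator) {
  if ('serviceWorker' in nav) {
    return nav.serviceWorker.register('/sw.js');
  }
  return Promise.resolve(null); // unsupported browser: do nothing
}

// Inside sw.js, a Workbox v2-era route registration looked roughly like:
//   const workboxSW = new WorkboxSW();
//   workboxSW.router.registerRoute(/\/static\/.*/, workboxSW.strategies.cacheFirst());
```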
As a prerequisite for using a service worker, your site must be served over HTTPS, something we are already doing. And finally, the person coming to the site needs to trigger an engagement metric. Right now, Google has it set to 20 to 30 seconds on the site for the prompt to trigger. For testing and
development, though, you don’t want to be waiting 20
or 30 seconds every time, so you can get the prompt to
trigger without the engagement metric by going to Chrome Flags
and then turning on Bypass Engagement Check Mode. This is what the
prompt looks like. Unfortunately, it doesn’t
allow for much customization, and doesn’t display information
about what you are adding. Now, Owen did mention
yesterday that there is a new add-to-home
modal flow coming, but since that
isn’t available yet, and despite requiring
an additional click, we implemented our own modal
to give more information to the users before
having Chrome prompt them. To accomplish this, we added an event listener before registering the service worker. This listener listens for the beforeinstallprompt event, prevents the default prompt from triggering, and instead saves the event off to be triggered later. So we deliver a modal, and when the user clicks Add, we then show the previously deferred Chrome prompt.
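That pattern can be sketched like this; beforeinstallprompt is the real event, while showModal stands in for Instagram's custom modal:

```javascript
// Sketch of the deferred-prompt pattern described above.
function createInstallPromptController(showModal) {
  let deferredEvent = null;
  return {
    // Wire this to: window.addEventListener('beforeinstallprompt', ...)
    onBeforeInstallPrompt(event) {
      event.preventDefault();   // stop Chrome from prompting immediately
      deferredEvent = event;    // save the event to trigger later
      showModal();              // show our own, more informative modal first
    },
    // Wire this to the "Add" button in the custom modal.
    onAddClicked() {
      if (deferredEvent) {
        deferredEvent.prompt(); // now show the real Chrome prompt
        deferredEvent = null;
      }
    },
  };
}
```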
And that's how you can give your mobile web users the feeling of being in the native app. Now, Peter will talk about how we made this experience more engaging with optimized video performance. [APPLAUSE] PETER SHIN: Thanks, Jenny. In areas where network
resilience and reliability issues are commonplace,
video playback is generally a poor experience. You have a choice between
a low-quality video or a higher-quality video
that buffers and stalls. One solution for this experience
is adaptive bitrate streaming. With ABR, the video is
broken up into a sequence of segments, where each segment
is encoded in multiple bit rates. Here we have high,
medium, and low. A manifest is then
created which contains detailed information of each
segment and how to fetch it. Using this information, the client is then able to switch to a higher or lower bit rate depending on current network conditions.
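The core decision can be sketched as a small function; this is an illustration of the idea, not Shaka's actual algorithm:

```javascript
// Illustrative ABR decision: pick the highest rendition whose bit rate fits
// within the measured bandwidth, with a safety margin for variance.
function chooseRendition(renditions, measuredBandwidthBps, safetyFactor = 0.8) {
  const budget = measuredBandwidthBps * safetyFactor; // leave some headroom
  const sorted = [...renditions].sort((a, b) => a.bitrate - b.bitrate);
  let choice = sorted[0];                             // never go below the lowest rendition
  for (const rendition of sorted) {
    if (rendition.bitrate <= budget) choice = rendition;
  }
  return choice;
}
```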
Now, our first step of integration was to choose a client-side player. We were looking for something
that was open source, supported open standards,
and was extensible. We found Shaka Player was a great out-of-the-box solution for our initial experiments. This video is a comparison between our existing video player on the left and Shaka Player on the right. They're actually playing
at the same speed, but you can see the video on
the left buffering and stalling as we transition to
2G network conditions. The adaptive video,
on the other hand, is able to continue
with smooth playback. Now let’s look at
how we integrated. So this is a standard integration. You'd create a new Shaka Player instance and pass it a video element. You'd then load the manifest file that I described earlier.
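As a sketch, the standard integration described above looks roughly like this (the function wrapper and manifest URL are ours, for illustration):

```javascript
// Standard Shaka Player setup: attach a player to a <video> element and
// load a DASH manifest, which starts adaptive playback.
async function initPlayer(videoElement, manifestUri) {
  const player = new shaka.Player(videoElement);
  await player.load(manifestUri);
  return player;
}

// Usage in the page (sketch):
//   initPlayer(document.querySelector('video'), 'https://cdn.example.com/video.mpd');
```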
So with this approach, there's actually another round trip just to fetch the manifest file. But we wanted to avoid this. In our case, before even
creating the Shaka Player, we already had the
manifest content. So we had to approach
this a little differently. Fortunately, Shaka has a
plugin system available. We create a custom networking
plugin, in this case. So we came up with a scheme– here it’s IGW– where the
video ID is in the path. We then get the manifest
content from an existing store using the video ID. We then create a response
with the manifest content and return it through a promise. Finally, we register
our custom scheme with the networking engine. Going back to the integration, we simply load the player with a URI using the custom scheme that we created earlier, and we're done.
So we obviously wanted to measure the impact of ABR, so we tested with three different variants. First, there are the
default Shaka settings. This is just vanilla Shaka. And then we added
our custom settings. So we added some overrides. And finally, we added
a custom ABR manager. So in the second variant, one
of the properties we changed was the switch interval. The switch interval is
the minimum amount of time before switching bit rates. We wanted to test how a more
aggressive switching strategy would impact user experience. In the third variant, we
added our custom ABR manager. So this gave us a
lot more flexibility. With the custom manager,
we had more control over how we measured bandwidth
and when we switched bit rates. In the feed page, we actually
have multiple instances of Shaka Player
at any given time. Now that we had control over
how we measured bandwidth, we also could keep track
of the latest measurement. Using a feedback loop, we can then pass that back in to newly created Shaka instances, ensuring we have an accurate default bandwidth estimate.
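As a sketch, both ideas can be expressed through Shaka's abr configuration; the field names below are from Shaka's documented config, but the values are illustrative, not Instagram's production settings:

```javascript
// Tune Shaka's ABR behavior for a newly created player instance.
function configureAbr(player, lastMeasuredBandwidthBps) {
  player.configure({
    abr: {
      // Minimum seconds between bit-rate switches; lower = more aggressive.
      switchInterval: 4,
      // Seed this instance with the bandwidth we measured in an earlier one.
      defaultBandwidthEstimate: lastMeasuredBandwidthBps,
    },
  });
}
```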
We try to follow some best practices during these experiments. In our feed, we made sure to only instantiate players when needed, and to actively use the destroy API when we didn't. We understood that it
would take some time to find that sweet spot for
how frequently we’d switch or how much we’d buffer. Either being too
aggressive or too passive would change results. And finally, we were mindful
of the types of videos that should be supported. Instagram has videos as
short as three seconds. So depending on the
video, it might not make sense to support ABR. While we expect to continue
to iterate on our experiments, in general we have
high hopes for ABR and its positive impact
on user experience. Now Jenny is going to talk
about our experience adding image capture and filters. [APPLAUSE] JENNIFER LIN: When we started
working on the Instagram mobile web initiative at
the beginning of the year, it was important to us
to bring the mobile web users into the
ecosystem of Instagram. As you heard in our
mission statement earlier, we’re looking to
strengthen relationships through shared experiences. It is difficult to
share experiences simply by watching other
people’s experiences. You need to be able to
create media that captures your own experiences as well. Since this was one of the
first features we added, we’re actually not
using the newest APIs, which Mat will be talking
about in a little bit. We’re simply using the
image capture tag– ah, I'm sorry, the input tag. This is what it looks like. Now, it's also possible to add a capture attribute, but we purposely left this out so that the user can either upload a photo or take their own.
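For reference, the element in question looks roughly like this; leaving off the capture attribute is what lets the user choose between the camera and their photo library:

```html
<input type="file" accept="image/*">
```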
We originally launched the creation flow without filters so that our mobile users could start sharing their experiences as soon as possible. But since filters are an important part of the Instagram brand, it was important for us to implement them as well. We used WebGL to
implement our filters. Since our native
app uses OpenGL, we were actually able to
reuse the same shaders, which let us bring over the same
exact filters as the app. As you can see, though,
the filter previews are done differently. This is because it is too
slow to calculate and load all the filters. So we use a standard balloon
image, no matter the photo. At Instagram, like with
Peter’s video experiment, we A/B test everything
before launching. So we put out filters to
some percentage of users, and we actually found that
several of our key metrics dropped significantly. When investigating why, we
considered a few different UI flows, including seeing
if there was a way to test whether or
not the balloon images themselves were the problem. But then we took a
step back, and we decided to test an even
more basic assumption. We took our control,
which was the creation flow without filters at all,
and then we tested a variation where we did all the WebGL
processing in the background with no user-facing changes. This test taught us a lot,
because this variation took the same hit in the
metrics as the variation with the user-facing filters UI. The performance hit
of WebGL was what caused the metrics to drop. So our next step was
improving the performance. We started with logging
the timing of everything. Instagram always crops
photos into a square, and we learned that
creating the initial crop was a significant bottleneck. Specifically, we were doing
it with a computationally expensive data URL
and blob conversion. Since the WebGL
rendering context.text image 2D accepts an
HTML canvas element, we’re able to return
a canvas directly. And WebGL will read that
canvas as the source pixels of the texture, like so. When the user is done
selecting the image, then we can render
a canvas.toBlob, which we found to be two times
faster than canvas.toDataURL, and then generate the
data URL from the blob. This reduced the time to
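A sketch of both techniques; the function names are ours, and the gl and canvas objects come from the page:

```javascript
// Upload a source <canvas> directly as a WebGL texture, skipping any
// data-URL or blob round trip.
function uploadCanvasAsTexture(gl, sourceCanvas) {
  const texture = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, texture);
  // texImage2D accepts an HTMLCanvasElement directly as the pixel source.
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, sourceCanvas);
  return texture;
}

// Export via the asynchronous toBlob, then derive the data URL from the blob
// only when it's actually needed.
function exportCanvas(canvas, onDataUrl) {
  canvas.toBlob((blob) => {
    const reader = new FileReader();
    reader.onload = () => onDataUrl(reader.result);
    reader.readAsDataURL(blob);
  }, 'image/jpeg');
}
```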
This reduced the time to first WebGL draw by 35% and reduced the time to transition to the next step of the flow by about 85%. And this was just one of our
performance improvements. Another performance improvement
we made was to lazy-load all of our shaders, which are our filters, instead of compiling them all up front. This was our original code. As you can see, we would create all the filters on init, and then return them when
we needed a filter program. We refactored the code to create a helper function that initializes a filter only if it doesn't already exist, compiling each one the first time it's needed.
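A sketch of that helper, with compileFilter standing in for the actual WebGL shader compilation so the caching logic is easy to see:

```javascript
// Lazy filter-program cache: compile each shader program the first time the
// corresponding filter is used, then reuse it.
function createFilterCache(compileFilter) {
  const programs = new Map();
  return function getFilterProgram(name) {
    if (!programs.has(name)) {
      programs.set(name, compileFilter(name)); // first use only
    }
    return programs.get(name);
  };
}
```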
This reduced load time by about 68%. These performance
improvements really improved our filters
experience, bringing filters to our mobile web users. Next, we made sharing
work even while offline. Peter will talk about why
and how we made this happen. [APPLAUSE] PETER SHIN: So I know this
is the second-to-last talk, but for a moment,
let’s imagine you just arrived at Chrome Dev Summit. You’re super excited
to hear all the talks and see all the demos, and you
want to share this experience. So you take out your phone,
take a photo, maybe a selfie, and you hit Share. Unfortunately, the
request times out, because all the
access points near you are completely overloaded. But it’s OK. You can turn off Wi-Fi,
and then you use your 4G, and you’re good to go. But what if you
couldn’t simply do that? What if that wasn’t
even an option? What if every day
was like being stuck at a conference with
really bad Wi-Fi? When we think of
offline support, we don’t think of someone
getting on an airplane. We think of someone who deals
with poor network conditions as a part of daily life. In those moments,
when they really want to share their
experience, there is so much friction right now. So let’s look at a demo of how
we’re trying to solve this. So here we have the PWA. You saw it earlier. And let’s go offline
with the Airplane mode. So we get a toast
telling us we’re offline, but we can still post. So let’s post a photo. [LAUGHTER] You add some caption,
maybe “offline demo.” [LAUGHTER] And we get a toast telling us
that when we connect again, it will be posted. So let’s go back online again. Let’s go back to the slides
and see how we did this. So first, we wanted to notify
the user that they’re offline. So we listen to
the Offline event. The Offline event actually
is a little bit unreliable. On some phones, battery-save mode is actually considered an offline state, so the event will fire even though you still have connectivity. To guard against this, we added a lightweight GET request to ensure that we're truly offline, and then, on error, we show the toast.
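A sketch of that guard; the /ping endpoint and toast function are placeholders, and fetchFn is injected for clarity:

```javascript
// Confirm we are really offline before telling the user. A lightweight GET
// guards against false positives (e.g. battery-save mode being reported as
// offline on some phones).
async function confirmOffline(fetchFn, showOfflineToast) {
  try {
    await fetchFn('/ping', { method: 'GET', cache: 'no-store' });
  } catch (err) {
    showOfflineToast(); // the request failed, so we are genuinely offline
  }
}

// Wire-up (sketch):
//   window.addEventListener('offline', () => confirmOffline(fetch, showToast));
```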
With Workbox, as Jenny mentioned earlier, we then register a POST request route. Service workers can't cache POST requests, so you actually need to store the request in a client-side store. Here, we use IndexedDB, where we
break down our request object and store it. We create an offline
post helper function that will reconstruct
the request that we just stored earlier and send it. We then use the
Background Sync API that was mentioned in earlier talks. Now, earlier in the
demo, if it worked, we were planning to send
the sync event manually through dev tools, which would
have then triggered the sync event. In practice, it depends on the device to actually trigger this event, and it's based on a number of conditions. So in the callback, we then check if there are any items in the queue, and then we call our offline post helper and show the notification.
So as you can see, recent advances in the web have enabled us to create a feature-rich experience for mobile web users, especially
those in emerging markets. We’re actively
testing the features that we cover today, and are
continuing to iterate and learn along the way. It’s been incredibly
challenging, exciting, and humbling to work on
features and handle cases that we might not think
about in our day-to-day. We mentioned shared experiences
several times during this talk, so it was really great to be
able to share our experience building out these features. I wanted to give a shout
out to our teammates who aren’t on stage with us. They’ve all done amazing
work to get us to this point, and we hope to continue the
momentum we’ve already built. And now back to Mat, who
will describe the latest APIs for media creation. [MUSIC PLAYING] [APPLAUSE] MAT SCALES: Good job. Cool. Thank you, Jenny and Peter. So we’ve seen some of
what the web can do, and it looks pretty great. But we can do more
with new APIs. So one of the things
that Jenny mentioned is that Instagram are delegating
image capture to the input element. You can actually do
this inside your app. Now, it used to be a
little bit limited. We’ve had access to the
camera for quite a long time through getUserMedia,
which is part of WebRTC. It allows you to get a stream
from a camera or the microphone, or both, and it's a pretty simple API.
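A minimal sketch of that call (the constraints are illustrative; mediaDevices is a parameter only to make the sketch easy to test):

```javascript
// Request a camera stream with getUserMedia: video only, preferring the
// rear camera when one is available.
async function getCameraStream(mediaDevices = navigator.mediaDevices) {
  return mediaDevices.getUserMedia({
    video: { facingMode: 'environment' },
    audio: false,
  });
}
```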
But you couldn't really do too much with it before. You could use it
for WebRTC, which is what it was designed for. But other than that,
you could present it back to the user
using a video tag, or you could grab
an image from it. The way you'd do that is: you'd take the stream and put it into a video element as the source, then draw one frame of the video into a canvas, then turn the canvas into a blob, and then turn the blob into an image. That's pretty long-winded, and it was also just limited by the APIs, because getUserMedia streams are limited to 1080p HD, regardless of what
your camera can do. But we now have a new API
called the Image Capture API, which makes this a little
bit easier and much better. So it takes the stream that
you get from getUserMedia and gives you a new object
back, an ImageCapture object. And it gives you a takePhoto method. When you call this, it tells the physical device, the camera, to take a full-resolution photo and give you a blob straight back. So none of this drawing
canvas nonsense in the middle. You also get some extra options. So you can see here that in this example, I'm passing through a fillLightMode setting. This sets what the flash should do, so here I'm saying that the flash should be set to auto. You can also set automatic red-eye reduction through this method.
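Putting that together, a sketch of the flow (ImageCapture, takePhoto, fillLightMode, and redEyeReduction are from the spec; the wrapper and the chosen values are ours):

```javascript
// Capture a full-resolution photo from a getUserMedia stream.
async function capturePhoto(stream) {
  const track = stream.getVideoTracks()[0];     // the camera's video track
  const imageCapture = new ImageCapture(track); // wrap it in an ImageCapture object
  // Ask the camera hardware for a full-resolution photo, returned as a Blob.
  return imageCapture.takePhoto({
    fillLightMode: 'auto',  // let the device decide whether to fire the flash
    redEyeReduction: true,
  });
}
```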
there’s the MediaRecorder API. Again, you take a getUserMedia
stream and pass it in, and you say what MIME
type of output you want. Then you get a [? picture ?]
available event every time that the recorder has buffered
up enough data to give to you. And at the end, you can
reassemble all of this into a blob, for either the video or audio file that you're creating.
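A sketch of that flow; the MIME type and the onComplete wiring are illustrative:

```javascript
// Record a stream with MediaRecorder, collecting chunks as they arrive and
// reassembling them into a single Blob when recording stops.
function recordStream(stream, onComplete) {
  const recorder = new MediaRecorder(stream, { mimeType: 'video/webm' });
  const chunks = [];
  // Fired each time the recorder has buffered enough data to hand over.
  recorder.addEventListener('dataavailable', (event) => chunks.push(event.data));
  // At the end, reassemble the chunks into the output file.
  recorder.addEventListener('stop', () => {
    onComplete(new Blob(chunks, { type: 'video/webm' }));
  });
  recorder.start();
  return recorder;
}
```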
And as well as using getUserMedia to get these streams, you can also get them straight from a canvas, or from Web Audio. So this is how you'd do
things like live filters. You could take a video, draw each frame into a canvas, apply your filters, and then use a stream from the canvas to create a new video, which is the output. And if your canvas was applying, say, Instagram's filters, you could get a filtered video out [? of the site. ?]
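A sketch of that pipeline, with drawFiltered standing in for the WebGL filter draw:

```javascript
// Live-filter pipeline: frames go video -> canvas (where the filter is
// applied) -> captureStream -> MediaRecorder.
function startFilteredRecording(video, canvas, drawFiltered) {
  const ctx = canvas.getContext('2d');
  function drawFrame() {
    drawFiltered(ctx, video);         // draw and filter the current frame
    requestAnimationFrame(drawFrame); // keep going for every frame
  }
  drawFrame();
  const filteredStream = canvas.captureStream(30); // 30fps stream of the canvas
  const recorder = new MediaRecorder(filteredStream);
  recorder.start();
  return recorder;
}
```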
Now, not everything here is perfect. There are still things that we need to work on. One of the issues that you'll have trying to do these things is simply device performance. I mean, as I said,
[? Tal ?] has been talking about how we have to– many of the users
that are coming online have devices that aren’t
the same as the devices that we use. Many of them are
not as powerful. And things like drawing
Instagram’s filters are extremely
computationally intensive. There are also limitations
in the APIs themselves. So as an example, let’s
talk about something that I tried to do. So I wanted to create
a boomerang effect. What I wanted to do was
take a recorded video with MediaRecorder,
straight from the camera, and I wanted to then create
an output video which played the video forwards and
then played it backwards again. And then it would loop. It’s called a boomerang effect. So I tried to do something
that was pretty simple. I’d play the video,
and on each frame, I would set where in the video
I wanted to be in that frame. And I would have it– as soon as it got to the end,
it would set the direction to minus one, and it
would come back again. And then the idea was to
record this with MediaRecorder. And it’s awful. It’s trash. This is a video that I took,
and this is the full quality that I got. It’s extremely jerky. It’s difficult to tell exactly
when it’s going forwards and when it’s going backwards. Why did this happen? There are a couple of reasons. One of them is that
at the moment, when you use MediaRecorder
to record a video, the output is in a WebM, which
is optimized for streaming. And this means that it doesn’t
put in the index of where in the file each frame appears. It just assumes
that you’re going to play it through right
from the beginning, and then it would just
iterate through the file. So if you want to
seek, then it has to go right back to the
beginning of the file and work its way through until
it finds the correct location. Now, there are
some optimizations. If you’re playing forwards,
then it can make a rough guess. Oh, you got this far,
and it was this time, so it’s somewhere after that. Going backwards, you have to
start again from the beginning. It also means that you can’t do
the even simpler trick of just saying set the play
rate to minus one, because that also doesn’t work. It would have the same issue. You can fix this
with a library, which will take the video
that you’ve created and actually manipulate the
bytes to put in that index information so that you
can then make it seekable. But it's pretty low-level stuff, and a pretty chunky library. It would be better if the web
platform did this for you. Another issue is that
recording with MediaRecorder is always real time. So I thought I could fix this
issue by taking the video, lining up exactly
where I wanted to be, and then saying to the
MediaRecorder API, hey, I want one frame. Just record one
frame, and then wait until I’m ready
for the next one, and then say, OK,
take another frame. That doesn’t work like that. It always records at
exactly real time. So if it’s janky when it
goes into your canvas, then it will be janky
when you record it. It also means that if I tried
to take a one-minute video and do a boomerang of it, it
would always take exactly two minutes to create. I can’t do it faster
than real time either. So it’s not a perfect
solution right now for some of the things
that you might want to do. But of course, we want to
see the web, the mobile web, as a great platform
for media creation. So we’re working hard to
address all these things. Another new thing that’s coming
that we’re looking forward to is WebAssembly. You heard about this
earlier from Alex. One of the things that
this allows you to do is take native libraries,
recompile them for the web, and then use them in your page. And people have already been
experimenting with native video libraries, like FFmpeg, to do media manipulation on the web. Video doesn't– oh well. We're also excited about
the Shape Detection API. This lets you detect things
like text, bar codes, and faces inside an image. This is currently behind a
flag, but Francois [INAUDIBLE] demonstrated this
at I/O earlier, and actually had a demo
out in the forum area, which you might have seen. [INAUDIBLE] taking pictures,
so just wait a minute. If you’d like to know
more about the things that we’ve been
talking about here, I’ve been creating
a sample application that I called Snapshot. The source code is
available on GitHub, and I’ve been documenting my
experiences in a video diary, which is available on YouTube. In summary, I think that
what companies like Instagram are doing with media on
the web is incredibly cool. I hope you’re all as excited
about the future for this as I am. And thank you very
much for coming. [MUSIC PLAYING]
