The general problem of completing images and videos based on a mask is called inpainting, and numerous methods exist to tackle this problem in ingenious ways.
However, most approaches are relatively costly to run, especially without a graphics card, so I wanted to see what results we could get with simple and fast methods.
To start, we need to find the mask, which corresponds to the location of the pixels to remove.
Using ffmpeg, we can extract the timestamps of the key frames in the video. Getting only the timing is fast, and we can later cap the number of frames we actually extract. We could also take random frames, but the key frames are more likely to be diverse (they can't be too close in time), and faster to extract, as other frames need to be reconstructed from the closest key frame.
With ffprobe, which ships with ffmpeg, we just need to run:
ffprobe -select_streams v -skip_frame nokey -show_frames -show_entries frame=pkt_pts_time video.mp4
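To use these timestamps from a script, the output has to be parsed. Here is a minimal sketch, assuming ffprobe prints its default `key=value` lines wrapped in `[FRAME]` blocks. The function names `parse_keyframe_times` and `keyframe_times` are made up for illustration, `-v error` is added only to silence the banner, and matching any key ending in `pts_time` is an assumption meant to tolerate builds that name the field differently.

```python
import subprocess


def parse_keyframe_times(ffprobe_output):
    """Collect timestamps (in seconds) from ffprobe's key=value output."""
    times = []
    for line in ffprobe_output.splitlines():
        key, sep, value = line.strip().partition("=")
        # Assumption: the timestamp key ends in "pts_time".
        if sep and key.endswith("pts_time"):
            try:
                times.append(float(value))
            except ValueError:
                # Values such as "N/A" are skipped.
                continue
    return times


def keyframe_times(video_path):
    """Run the ffprobe command shown above and parse its output."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v", "-skip_frame", "nokey",
        "-show_frames", "-show_entries", "frame=pkt_pts_time",
        video_path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_keyframe_times(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it easy to check against a saved ffprobe output without touching a video file.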
Then for each TIMESTAMP obtained, we can extract the frame:
ffmpeg -ss TIMESTAMP -i video.mp4 -vframes 1 frame.png
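Capping the number of frames and looping over the timestamps could look like the sketch below. It is only one possible approach: `cap_timestamps` and `extract_frame` are hypothetical helpers, picking evenly spaced timestamps is my assumption about how to cap, and `-y` is added so that ffmpeg overwrites existing files without prompting.

```python
import subprocess


def cap_timestamps(timestamps, max_frames):
    """Keep at most max_frames timestamps, spread evenly over the list."""
    if max_frames <= 0:
        return []
    if len(timestamps) <= max_frames:
        return list(timestamps)
    step = len(timestamps) / max_frames
    return [timestamps[int(i * step)] for i in range(max_frames)]


def extract_frame(video_path, timestamp, out_path):
    """Grab a single frame at the given timestamp, as in the command above."""
    cmd = [
        "ffmpeg", "-y",
        "-ss", str(timestamp),
        "-i", video_path,
        "-vframes", "1",
        out_path,
    ]
    subprocess.run(cmd, check=True, capture_output=True)


# Example usage (file names are placeholders):
# for i, t in enumerate(cap_timestamps(times, 50)):
#     extract_frame("video.mp4", t, f"frame_{i:03d}.png")
```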
And lastly we can aggregate the results of a simple image filter over all frames, to create a mask:
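import numpy as np
from scipy.ndimage import gaussian_filter

def normalize(x):
    # Assumed helper (not shown in the original): rescale values to [0, 1]
    return (x - x.min()) / (x.max() - x.min() + 1e-8)

# images: stacked frames, assumed shape (N, H, W, C)
# Compute the gradients per image
dx = np.gradient(images, axis=1).mean(axis=3)
dy = np.gradient(images, axis=2).mean(axis=3)

# Average globally
mean_dx = np.abs(np.mean(dx, axis=0))
mean_dy = np.abs(np.mean(dy, axis=0))

# Filter on both axes at a hand-picked threshold
threshold = 10
salient = ((mean_dx > threshold) | (mean_dy > threshold)).astype(float)
salient = normalize(gaussian_filter(salient, sigma=3))
mask = ((salient > 0.2) * 255).astype(np.uint8)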
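To see why averaging the gradients over all frames singles out a static overlay, here is a small synthetic check using only NumPy. Everything in it is made up for illustration: the frames are random noise, and the "logo" is a bright square pasted at the same position in every frame. The gradients of the changing background cancel out on average, while the edges of the fixed square add up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic frames of random noise, shape (N, H, W, C).
images = rng.uniform(0, 100, size=(200, 32, 32, 3))
# Hypothetical overlay: a bright 8x8 square at the same place in every frame.
images[:, 12:20, 12:20, :] = 255.0

# Same per-frame gradients and global average as in the snippet above.
dx = np.gradient(images, axis=1).mean(axis=3)
dy = np.gradient(images, axis=2).mean(axis=3)
mean_dx = np.abs(dx.mean(axis=0))
mean_dy = np.abs(dy.mean(axis=0))

threshold = 10
edges = (mean_dx > threshold) | (mean_dy > threshold)

print("edge of the square flagged:", bool(edges[12, 15]))
print("centre of the square flagged:", bool(edges[15, 15]))
print("background pixel flagged:", bool(edges[4, 4]))
```

Only the outline of the square passes the threshold, since its interior is flat. That is presumably why the snippet above blurs the result and thresholds it a second time, so that the mask also covers the inside of the overlay.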
Bonus: if one wants to do it without Python, it should be doable using ImageMagick's convert with the existing Sobel filter and -evaluate-sequence mean *.png mean.png.
We now have a global mask to inpaint, so we can simply use ffmpeg's removelogo filter to obtain our cleaned-up video:
ffmpeg -i video.mp4 -vf "removelogo=mask.png" cleaned.mp4