Upload videos or sets of images. Download videos from YouTube URLs automatically. Browse and annotate uploaded videos. Import pre-indexed datasets.
Perform scene detection and frame extraction on videos. Annotate frames and detections with bounding boxes, labels and metadata.
Extracted objects, along with entire frames and crops, are indexed using deep features. The resulting feature vectors are used for visual search and retrieval.
Deploy on a variety of machines, with or without GPUs, locally or in the cloud. Docker Compose automates the setup of Postgres and RabbitMQ.
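To make the idea of retrieval over feature vectors concrete, here is a minimal sketch in plain Python. It is not the project's indexing code: the choice of cosine similarity, the function names `cosine_similarity` and `search`, and the three-dimensional toy vectors are all assumptions for illustration. Real deep features (for example from Inception V3) have far more dimensions and would normally be handled with a numerical library.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def search(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k index entries most similar to the query, best first."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy index: keys name a frame or crop, values are feature vectors.
toy_index = {
    "video1/frame_10": [0.9, 0.1, 0.0],
    "video1/frame_55": [0.0, 1.0, 0.0],
    "video2/frame_03/crop_1": [0.8, 0.2, 0.1],
}

results = search([1.0, 0.0, 0.0], toy_index, top_k=2)
for key, score in results:
    print(key, round(score, 3))
```

A brute-force scan like this is only practical for small indexes; larger collections usually rely on an approximate nearest-neighbor library.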
Visual Search as a primary interface
Upload videos or multiple images.
Provide a YouTube URL to be downloaded automatically.
Pre-trained recognition, detection and face recognition models.
Metadata stored in Postgres, all operations performed asynchronously.
Celery allows video & query flows to be easily modified.
Videos, frames, indexes, etc. stored in media directory, served through nginx.
Manually run code & tasks without UI using a Jupyter notebook.
Indexing using Google Inception V3 trained on ImageNet
AlexNet using PyTorch
Labeled Faces in the Wild
Deep Video Analytics is implemented using Docker and works on Mac, Windows and Linux. Make sure you have the latest version of Docker installed.
git clone https://github.com/AKSHAYUBHAT/DeepVideoAnalytics
cd DeepVideoAnalytics/docker && docker-compose up
You need the latest versions of Docker and nvidia-docker installed. The GPU Dockerfile differs slightly from the CPU Dockerfile.
pip install --upgrade nvidia-docker-compose
git clone https://github.com/AKSHAYUBHAT/DeepVideoAnalytics
cd DeepVideoAnalytics/docker && ./rebuild_gpu.sh
nvidia-docker-compose -f docker-compose-gpu.yml up
We provide an AMI with all dependencies, such as Docker and the NVIDIA drivers, pre-installed. To use it, start a p2.xlarge instance with AMI ID ami-848f3d92 (N. Virginia) and ports 8000, 6006 and 8888 open (preferably only to your IP). Run the following commands after logging into the machine via SSH. After a few minutes the user interface will appear on port 8000 of the instance IP. AMI creation is documented here.
cd deepvideoanalytics/docker && git pull
./rebuild_gpu.sh
nvidia-docker-compose -f docker-compose-gpu.yml up
Security warning: The current GPU container uses an nginx <-> uwsgi <-> django setup to ensure smooth playback of videos. However, it runs nginx as root (within the container). Since you can modify AWS security rules on the fly, allow inbound traffic only from your own IP address.
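Because the user interface takes a while to come up on port 8000, it can be convenient to poll the port rather than refresh the browser. The helper below is a small sketch using only the Python standard library. The function name `wait_for_port`, the default timeout and the polling interval are assumptions, not part of the project.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll host:port until a TCP connection succeeds or timeout seconds pass.

    Returns True once the port accepts connections, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if the port is closed or unreachable
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example (not run here): replace the placeholder with your instance IP.
# if wait_for_port("<instance-ip>", 8000):
#     print("UI is accepting connections on port 8000")
```

An open port only shows that something is listening. It does not guarantee that the web application has finished starting.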
The following options can be specified in docker-compose.yml or in your environment to selectively enable or disable algorithms.
ALEX_ENABLE=1 (to use Alexnet with PyTorch. disabled by default)
YOLO_ENABLE=1 (to use YOLO 9000. disabled by default)
SCENEDETECT_DISABLE=1 (to disable scene detection. enabled by default)
RESCALE_DISABLE=1 (to disable rescaling of frames extracted from videos. enabled by default)
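The options above mix two conventions: `*_ENABLE` flags turn on something that is off by default, while `*_DISABLE` flags turn off something that is on by default. The sketch below shows one way such flags could be read from the environment. It illustrates the defaults listed above and is not the project's actual code. The function name `algorithm_settings` and the rule that only the value `1` counts as set are assumptions.

```python
import os
from typing import Dict, Mapping, Optional


def algorithm_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Map the four environment flags to on/off settings, applying the defaults."""
    env = os.environ if env is None else env

    def is_set(name: str) -> bool:
        # Assumption: a flag counts as set only when its value is exactly "1".
        return env.get(name, "").strip() == "1"

    return {
        "alexnet": is_set("ALEX_ENABLE"),                      # off unless enabled
        "yolo": is_set("YOLO_ENABLE"),                         # off unless enabled
        "scene_detection": not is_set("SCENEDETECT_DISABLE"),  # on unless disabled
        "rescale": not is_set("RESCALE_DISABLE"),              # on unless disabled
    }


# With no flags set, only the default-on features are active.
print(algorithm_settings({}))
# Enabling YOLO and disabling rescaling.
print(algorithm_settings({"YOLO_ENABLE": "1", "RESCALE_DISABLE": "1"}))
```

Passing a plain dictionary instead of reading `os.environ` directly keeps the function easy to check without changing the real environment.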
Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "Facenet: A unified embedding for face recognition and clustering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
Zhang, Kaipeng, et al. "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks." IEEE Signal Processing Letters 23.10 (2016): 1499-1503.
Liu, Wei, et al. "SSD: Single shot multibox detector." European Conference on Computer Vision. Springer International Publishing, 2016.
Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
Johnson, Jeff, Matthijs Douze, and Hervé Jégou. "Billion-scale similarity search with GPUs." arXiv preprint arXiv:1702.08734 (2017).