python /home/admin/mtr/script_for_cron.py -j datou_current3 -m 20 -a ' -a 3318 ' -s datou_3318 -M 0 -S 0 -U 95,95,120
import MySQLdb succeeded
Import error (python version)
['/Users/moilerat/Documents/Fotonower/install/caffe/distribute/python', '/home/admin/workarea/git/Velours/python/prod', '/home/admin/workarea/install/caffe_cuda8_python3/python', '/home/admin/workarea/install/darknet', '/home/admin/workarea/git/Velours/python', '/home/admin/workarea/install/caffe_frcnn_python3/py-faster-rcnn/caffe-fast-rcnn/python', '/home/admin/mtr/.credentials', '/home/admin/workarea/install/caffe/python', '/home/admin/workarea/install/caffe_frcnn/py-faster-rcnn/tools', '/home/admin/workarea/git/fotonowerpip', '/home/admin/workarea/install/segment-anything', '/home/admin/workarea/git/pyfvs', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '/home/admin/.local/lib/python3.8/site-packages', '/usr/local/lib/python3.8/dist-packages', '/usr/lib/python3/dist-packages']
process id : 1329704
load datou : 3318
# VR 17-11-17 : to create in DB !
Here we check the datou graph and we reorder steps !
Tree built and cycle checked, now we need to re-order the steps !
We currently have an error because there is no dependence between the last step for the case tile - detect - glue
We could keep the dependence as it is, but it is better to keep an order compatible with the id of the steps if we do not have sons, so a lexical order : (number_son, step_id)
All sons are already in current list !
All sons are already in current list !
All sons are already in current list !
All sons are already in current list !
All sons are already in current list !
All sons are already in current list !
All sons are already in current list !
All sons are already in current list !
All sons are already in current list !
DONE and to test : checkNoCycle !
Here we check the consistency of the inputs/outputs number between the given ones and the db !
eke 1-6-18 : checkConsistencyNbInputNbOutput should be processed after step reordering !
WARNING : number of outputs for step 7928 mask_detect is not consistent : 3 used against 2 in the step definition !
Step 8092 crop_condition has fewer inputs used (1) than in the step definition (2) : maybe we manage optional inputs !
WARNING : number of outputs for step 8092 crop_condition is not consistent : 4 used against 3 in the step definition !
WARNING : number of inputs for step 7933 rle_unique_nms_with_priority is not consistent : 2 used against 1 in the step definition !
WARNING : number of outputs for step 7933 rle_unique_nms_with_priority is not consistent : 2 used against 1 in the step definition !
WARNING : number of outputs for step 7935 ventilate_hashtags_in_portfolio is not consistent : 2 used against 1 in the step definition !
Step 7934 final has fewer inputs used (2) than in the step definition (3) : maybe we manage optional inputs !
Step 7934 final has fewer outputs used (1) than in the step definition (2) : some outputs may not be used !
WARNING : number of outputs for step 13649 velours_tree is not consistent : 2 used against 1 in the step definition !
Step 9283 split_time_score has fewer inputs used (1) than in the step definition (2) : maybe we manage optional inputs !
Number of inputs / outputs for each step checked !
Here we check the consistency of output/input types across step connections
eke 1-6-18 : checkConsistencyTypeOutputInput should be processed after checkConsistencyNbInputNbOutput !
We ignore checkConsistencyTypeOutputInput for datou_step final !
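The re-ordering and cycle check logged above sort the datou steps so that every step comes after its parents, breaking ties lexically on (number_son, step_id). Below is a minimal, hedged sketch of that idea; the step records ({step_id: [parent_step_ids]}) are illustrative and not the real datou schema.

    # Hypothetical step records: {step_id: [parent_step_ids]}; not the actual datou tables.
    def order_steps(parents_by_step):
        """Kahn-style ordering: parents first; ties broken by (number of sons, step_id)."""
        sons = {step: [] for step in parents_by_step}
        indegree = {step: len(parents) for step, parents in parents_by_step.items()}
        for step, parents in parents_by_step.items():
            for parent in parents:
                sons[parent].append(step)

        ready = sorted((s for s, d in indegree.items() if d == 0),
                       key=lambda s: (len(sons[s]), s))
        ordered = []
        while ready:
            step = ready.pop(0)
            ordered.append(step)
            for son in sons[step]:
                indegree[son] -= 1
                if indegree[son] == 0:
                    ready.append(son)
            ready.sort(key=lambda s: (len(sons[s]), s))

        if len(ordered) != len(parents_by_step):  # some steps never became ready
            raise ValueError("cycle detected in the datou step graph")
        return ordered

    # Example: 7928 feeds 8092, which feeds 7933.
    print(order_steps({7928: [], 8092: [7928], 7933: [8092]}))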
WARNING : type of output 1 of step 7935 doesn't seem to be defined in the database
WARNING : type of input 3 of step 7934 doesn't seem to be defined in the database
We ignore checkConsistencyTypeOutputInput for datou_step final !
WARNING : type of input 1 of step 7935 doesn't seem to be defined in the database
WARNING : output 1 of step 7933 has datatype=7 whereas input 1 of step 7935 has datatype=None
WARNING : type of output 2 of step 7928 doesn't seem to be defined in the database
WARNING : type of input 2 of step 8092 doesn't seem to be defined in the database
WARNING : type of output 3 of step 8092 doesn't seem to be defined in the database
WARNING : type of input 1 of step 7933 doesn't seem to be defined in the database
WARNING : type of output 2 of step 7928 doesn't seem to be defined in the database
WARNING : type of input 1 of step 10917 doesn't seem to be defined in the database
WARNING : type of output 2 of step 7928 doesn't seem to be defined in the database
WARNING : type of input 1 of step 10918 doesn't seem to be defined in the database
We ignore checkConsistencyTypeOutputInput for datou_step final !
WARNING : output 0 of step 7935 has datatype=10 whereas input 3 of step 10916 has datatype=6
WARNING : output 0 of step 7935 has datatype=10 whereas input 0 of step 13649 has datatype=18
WARNING : type of output 1 of step 13649 doesn't seem to be defined in the database
WARNING : type of input 5 of step 10916 doesn't seem to be defined in the database
DataTypes for each output/input checked !
Unexpected type for variable list_input_json
ERROR or WARNING : can't parse json string Expecting value: line 1 column 1 (char 0)
Tried to parse :
path of the photo was removed, should we ?
(photo_id, hashtag_id, score_max) was removed, should we ?
[(photo_id, hashtag_id, hashtag_type, x0, x1, y0, y1, score, seg_temp, polygons), ...] was removed, should we ?
path of the photo was removed, should we ?
[ (photo_id_loc, hashtag_id, hashtag_type, x0, x1, y0, y1, score, None), ...] was removed, should we ?
path of the photo was removed, should we ?
id of the photo (can be local or global) was removed, should we ?
path of the photo was removed, should we ?
(x0, y0, x1, y1) was removed, should we ?
path of the photo was removed, should we ?
data as text was removed, should we ?
[ (photo_id, photo_id_loc, hashtag_type, x0, x1, y0, y1, score), ...] was removed, should we ?
None was removed, should we ?
data as text was removed, should we ?
(photo_id, hashtag_id, score_max) was removed, should we ?
id of the photo (can be local or global) was removed, should we ?
data as text was removed, should we ?
data as text was removed, should we ?
data as text was removed, should we ?
path of the photo was removed, should we ?
(photo_id, hashtag_id, score_max) was removed, should we ?
path of the photo was removed, should we ?
(photo_id, hashtag_id, score_max) was removed, should we ?
None was removed, should we ?
data as a number was removed, should we ?
(photo_id, hashtag_id, score_max) was removed, should we ?
(photo_id, hashtag_id, score_max) was removed, should we ?
(photo_id, hashtag_id, score_max) was removed, should we ?
(photo_id, hashtag_id, score_max) was removed, should we ?
(photo_id, hashtag_id, score_max) was removed, should we ?
data as text was removed, should we ?
None was removed, should we ?
data as text was removed, should we ?
[ptf_id0,ptf_id1...] was removed, should we ?
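The "can't parse json string Expecting value: line 1 column 1 (char 0)" message above is what json.loads raises on an empty string, which is why list_input_json falls back to []. A small, hedged sketch of that defensive parse; parse_list_input_json is an illustrative name, not the actual helper.

    import json

    def parse_list_input_json(raw):
        """Return a list parsed from `raw`, or [] when the string is empty or not valid JSON."""
        if not raw:
            return []
        try:
            value = json.loads(raw)
        except json.JSONDecodeError as exc:
            print("ERROR or WARNING : can't parse json string", exc)
            print("Tried to parse :", raw)
            return []
        if not isinstance(value, list):
            print("Unexpected type for variable list_input_json")
            return []
        return value

    print(parse_list_input_json(""))        # [] -- the case logged above
    print(parse_list_input_json("[1, 2]"))  # [1, 2]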
FOUND : 1
Here is data_from_sql_as_vec to set the ParamDescriptorType : (5275, 'learn_RUBBIA_REFUS_AMIENS_23', 16384, 25088, 'learn_RUBBIA_REFUS_AMIENS_23', 'pool5', 10.0, None, None, 256, None, 0, None, 8, None, None, -1000.0, 1, datetime.datetime(2021, 4, 23, 14, 19, 39), datetime.datetime(2021, 4, 23, 14, 19, 39))
load thcls
load THCL from format json or kwargs
add thcl : 2847 in CacheModelConfig
load pdts
add pdt : 5275 in CacheModelConfig
Running datou job : batch_current
TODO datou_current to load, maybe to take outside batchDatouExec
updating current state to 1
list_input_json: []
Current got : datou_id : 3318, datou_cur_ids : ['2732571'] with mtr_portfolio_ids : ['22142991'] and first list_photo_ids : []
new path : /proc/1329704/
Inside batchDatouExec : verbose : 0
# VR 17-11-17 : to create in DB !
Here we check the datou graph and we reorder steps !
Tree built and cycle checked, now we need to re-order the steps !
We currently have an error because there is no dependence between the last step for the case tile - detect - glue
We could keep the dependence as it is, but it is better to keep an order compatible with the id of the steps if we do not have sons, so a lexical order : (number_son, step_id)
All sons are already in current list !
All sons are already in current list !
All sons are already in current list !
All sons are already in current list !
All sons are already in current list !
All sons are already in current list !
All sons are already in current list !
All sons are already in current list !
All sons are already in current list !
DONE and to test : checkNoCycle !
Here we check the consistency of the inputs/outputs number between the given ones and the db !
eke 1-6-18 : checkConsistencyNbInputNbOutput should be processed after step reordering !
WARNING : number of outputs for step 7928 mask_detect is not consistent : 3 used against 2 in the step definition !
Step 8092 crop_condition has fewer inputs used (1) than in the step definition (2) : maybe we manage optional inputs !
WARNING : number of outputs for step 8092 crop_condition is not consistent : 4 used against 3 in the step definition !
WARNING : number of inputs for step 7933 rle_unique_nms_with_priority is not consistent : 2 used against 1 in the step definition !
WARNING : number of outputs for step 7933 rle_unique_nms_with_priority is not consistent : 2 used against 1 in the step definition !
WARNING : number of outputs for step 7935 ventilate_hashtags_in_portfolio is not consistent : 2 used against 1 in the step definition !
Step 7934 final has fewer inputs used (2) than in the step definition (3) : maybe we manage optional inputs !
Step 7934 final has fewer outputs used (1) than in the step definition (2) : some outputs may not be used !
WARNING : number of outputs for step 13649 velours_tree is not consistent : 2 used against 1 in the step definition !
Step 9283 split_time_score has fewer inputs used (1) than in the step definition (2) : maybe we manage optional inputs !
Number of inputs / outputs for each step checked !
Here we check the consistency of output/input types across step connections
eke 1-6-18 : checkConsistencyTypeOutputInput should be processed after checkConsistencyNbInputNbOutput !
We ignore checkConsistencyTypeOutputInput for datou_step final !
WARNING : type of output 1 of step 7935 doesn't seem to be defined in the database
WARNING : type of input 3 of step 7934 doesn't seem to be defined in the database
We ignore checkConsistencyTypeOutputInput for datou_step final !
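checkConsistencyNbInputNbOutput and checkConsistencyTypeOutputInput, whose warnings appear above and below, compare the inputs/outputs actually wired in the datou against the step definitions stored in the database. A hedged sketch of both checks, with illustrative function signatures rather than the real ones.

    # Illustrative signatures; the real checks read the step definitions from the datou tables.
    def check_nb_inputs_outputs(step_id, name, used_in, used_out, def_in, def_out):
        """Mirror the count checks logged above: fewer inputs may be optional ones,
        fewer outputs may simply be unused, anything else is a WARNING."""
        if used_in < def_in:
            print(f"Step {step_id} {name} has fewer inputs used ({used_in}) than in the "
                  f"step definition ({def_in}) : maybe we manage optional inputs")
        elif used_in != def_in:
            print(f"WARNING : number of inputs for step {step_id} {name} is not consistent : "
                  f"{used_in} used against {def_in} in the step definition")
        if used_out < def_out:
            print(f"Step {step_id} {name} has fewer outputs used ({used_out}) than in the "
                  f"step definition ({def_out}) : some outputs may not be used")
        elif used_out != def_out:
            print(f"WARNING : number of outputs for step {step_id} {name} is not consistent : "
                  f"{used_out} used against {def_out} in the step definition")

    def check_type_output_input(out_type, in_type):
        """Mirror the datatype check: warn when either side is undefined or they differ."""
        if out_type is None or in_type is None:
            print("WARNING : datatype doesn't seem to be defined in the database")
        elif out_type != in_type:
            print(f"WARNING : output has datatype={out_type} whereas input has datatype={in_type}")

    check_nb_inputs_outputs(7928, "mask_detect", used_in=1, used_out=3, def_in=1, def_out=2)
    check_type_output_input(10, 6)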
WARNING : type of input 1 of step 7935 doesn't seem to be defined in the database
WARNING : output 1 of step 7933 has datatype=7 whereas input 1 of step 7935 has datatype=None
WARNING : type of output 2 of step 7928 doesn't seem to be defined in the database
WARNING : type of input 2 of step 8092 doesn't seem to be defined in the database
WARNING : type of output 3 of step 8092 doesn't seem to be defined in the database
WARNING : type of input 1 of step 7933 doesn't seem to be defined in the database
WARNING : type of output 2 of step 7928 doesn't seem to be defined in the database
WARNING : type of input 1 of step 10917 doesn't seem to be defined in the database
WARNING : type of output 2 of step 7928 doesn't seem to be defined in the database
WARNING : type of input 1 of step 10918 doesn't seem to be defined in the database
We ignore checkConsistencyTypeOutputInput for datou_step final !
WARNING : output 0 of step 7935 has datatype=10 whereas input 3 of step 10916 has datatype=6
WARNING : output 0 of step 7935 has datatype=10 whereas input 0 of step 13649 has datatype=18
WARNING : type of output 1 of step 13649 doesn't seem to be defined in the database
WARNING : type of input 5 of step 10916 doesn't seem to be defined in the database
DataTypes for each output/input checked !
List Step Type Loaded in datou : mask_detect, crop_condition, rle_unique_nms_with_priority, ventilate_hashtags_in_portfolio, final, blur_detection, brightness, velours_tree, send_mail_cod, split_time_score
over limit max, limiting to limit_max 40
list_input_json : [] origin
We have 1 , BFBFBFBFBFBFBFBFBFBFBFBFBFBFBF
we have 0 missing photos in the step downloads : photos missing : []
try to delete the photos missing in DB
length of list_filenames : 15 ; length of list_pids : 15 ; length of list_args : 15
time to download the photos : 2.4280552864074707
About to test input to load
we should then remove the video here, and this would fix the bug of datou_current !
Calling datou_exec
Inside datou_exec : verbose : 0
number of steps : 10
step1:mask_detect Tue Apr 8 13:00:31 2025
VR 17-11-17 : now, only for a linear exec dependencies tree, some output goes to fill the input of the next
VR 22-3-18 : now we test the dependencies tree, but keep two separate code paths for datou_prepare_output_input until the code is correctly tested, clean and works in both cases
VR 22-3-18 : but we use the first code for the first step id = -1, built in the code of datou_exec
VR 22-3-18 : we should manage here the case when we are at the first step instead of building this step before datou_exec
Beginning of datou step mask_detect !
save_polygon : True
begin detect
begin to check gpu status
inside check gpu memory l 3637
free memory gpu now : 5399
max_wait_temp : 1
max_wait : 0
gpu_flag : 0
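The "check gpu memory" block just above reports roughly 5.4 GB free before the detector starts. One hedged way to obtain that figure is to query nvidia-smi, as sketched below; the real check (around line 3637 of the step code) may do it differently.

    import subprocess

    def free_gpu_memory_mib(gpu_index=0):
        """Return the free memory of one GPU in MiB, as reported by nvidia-smi."""
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.free",
             "--format=csv,noheader,nounits", "-i", str(gpu_index)],
            text=True,
        )
        return int(out.strip())

    if __name__ == "__main__":
        print("free memory gpu now :", free_gpu_memory_mib())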
2025-04-08 13:00:34.119484: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2025-04-08 13:00:34.147170: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3493065000 Hz
2025-04-08 13:00:34.149367: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f39f0000b60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2025-04-08 13:00:34.149427: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2025-04-08 13:00:34.153388: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2025-04-08 13:00:34.318921: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1e019220 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2025-04-08 13:00:34.318974: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce RTX 2080 Ti, Compute Capability 7.5
2025-04-08 13:00:34.319863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2025-04-08 13:00:34.320207: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2025-04-08 13:00:34.322328: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2025-04-08 13:00:34.324459: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2025-04-08 13:00:34.324786: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2025-04-08 13:00:34.327039: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2025-04-08 13:00:34.328177: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2025-04-08 13:00:34.332878: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2025-04-08 13:00:34.334080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2025-04-08 13:00:34.334172: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2025-04-08 13:00:34.334806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2025-04-08 13:00:34.334822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2025-04-08 13:00:34.334831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2025-04-08 13:00:34.336931: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4789 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:41:00.0, compute capability: 7.5)
WARNING:tensorflow:From /home/admin/workarea/git/Velours/python/mtr/mask_rcnn/mask_detection.py:69: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2025-04-08 13:00:34.654400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2025-04-08 13:00:34.654522: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2025-04-08 13:00:34.654556: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2025-04-08 13:00:34.654587: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2025-04-08 13:00:34.654615: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2025-04-08 13:00:34.654644: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2025-04-08 13:00:34.654671: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2025-04-08 13:00:34.654699: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2025-04-08 13:00:34.656217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2025-04-08 13:00:34.657460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2025-04-08 13:00:34.657531: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2025-04-08 13:00:34.657561: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2025-04-08 13:00:34.657589: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2025-04-08 13:00:34.657614: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2025-04-08 13:00:34.657639: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2025-04-08 13:00:34.657664: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2025-04-08 13:00:34.657691: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2025-04-08 13:00:34.658992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2025-04-08 13:00:34.659031: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2025-04-08 13:00:34.659044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2025-04-08 13:00:34.659055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2025-04-08 13:00:34.660363: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4789 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:41:00.0, compute capability: 7.5)
Using TensorFlow backend.
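The deprecation warning above shows mask_detection.py binding a Keras session via tf.keras.backend.set_session, and the device is created with only 4789 MB of a 10.76 GiB card, consistent with a fixed memory budget; the CUDNN_STATUS_INTERNAL_ERROR and CUDA_ERROR_OUT_OF_MEMORY failures further down are the usual symptom of that budget being too tight. A hedged sketch of the common TF 1.x-style mitigation, not necessarily what mask_detection.py actually does:

    import tensorflow as tf

    # TF 1.x-style session configuration through the compat layer:
    # grow GPU memory on demand instead of grabbing a fixed budget up front.
    config = tf.compat.v1.ConfigProto()
    config.gpu_options.allow_growth = True
    # Alternatively, cap the budget explicitly:
    # config.gpu_options.per_process_gpu_memory_fraction = 0.8

    session = tf.compat.v1.Session(config=config)
    tf.compat.v1.keras.backend.set_session(session)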
WARNING:tensorflow:From /home/admin/workarea/install/Mask_RCNN/model.py:396: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
WARNING:tensorflow:From /home/admin/workarea/install/Mask_RCNN/model.py:703: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From /home/admin/workarea/install/Mask_RCNN/model.py:729: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
Inside mask_sub_process
Inside mask_detect
About to load cache.load_thcl_param
To do loadFromThcl(), then load ParamDescType : thcl2847
thcls : [{'id': 2847, 'mtr_user_id': 31, 'name': 'learn_RUBBIA_REFUS_AMIENS_23', 'pb_hashtag_id': 0, 'live': b'\x00', 'list_hashtags': 'background,papier,carton,metal,pet_clair,autre,pehd,pet_fonce,environnement', 'svm_portfolios_learning': '0,0,0,0,0,0,0,0,0', 'photo_hashtag_type': 3594, 'photo_desc_type': 5275, 'type_classification': 'mask_rcnn', 'hashtag_id_list': '0,0,0,0,0,0,0,0,0'}]
thcl {'id': 2847, 'mtr_user_id': 31, 'name': 'learn_RUBBIA_REFUS_AMIENS_23', 'pb_hashtag_id': 0, 'live': b'\x00', 'list_hashtags': 'background,papier,carton,metal,pet_clair,autre,pehd,pet_fonce,environnement', 'svm_portfolios_learning': '0,0,0,0,0,0,0,0,0', 'photo_hashtag_type': 3594, 'photo_desc_type': 5275, 'type_classification': 'mask_rcnn', 'hashtag_id_list': '0,0,0,0,0,0,0,0,0'}
Update svm_hashtag_type_desc : 5275
FOUND : 1
Here is data_from_sql_as_vec to set the ParamDescriptorType : (5275, 'learn_RUBBIA_REFUS_AMIENS_23', 16384, 25088, 'learn_RUBBIA_REFUS_AMIENS_23', 'pool5', 10.0, None, None, 256, None, 0, None, 8, None, None, -1000.0, 1, datetime.datetime(2021, 4, 23, 14, 19, 39), datetime.datetime(2021, 4, 23, 14, 19, 39))
{'thcl': {'id': 2847, 'mtr_user_id': 31, 'name': 'learn_RUBBIA_REFUS_AMIENS_23', 'pb_hashtag_id': 0, 'live': b'\x00', 'list_hashtags': 'background,papier,carton,metal,pet_clair,autre,pehd,pet_fonce,environnement', 'svm_portfolios_learning': '0,0,0,0,0,0,0,0,0', 'photo_hashtag_type': 3594, 'photo_desc_type': 5275, 'type_classification': 'mask_rcnn', 'hashtag_id_list': '0,0,0,0,0,0,0,0,0'}, 'list_hashtags': ['background', 'papier', 'carton', 'metal', 'pet_clair', 'autre', 'pehd', 'pet_fonce', 'environnement'], 'list_hashtags_csv': 'background,papier,carton,metal,pet_clair,autre,pehd,pet_fonce,environnement', 'svm_portfolios_learning': '0,0,0,0,0,0,0,0,0', 'photo_hashtag_type': 3594, 'svm_hashtag_type_desc': 5275, 'photo_desc_type': 5275, 'pb_hashtag_id_or_classifier': 0}
list_class_names : ['background', 'papier', 'carton', 'metal', 'pet_clair', 'autre', 'pehd', 'pet_fonce', 'environnement']
Configurations:
BACKBONE resnet101
BACKBONE_SHAPES [[160 160] [ 80 80] [ 40 40] [ 20 20] [ 10 10]]
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 1
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.3
DETECTION_NMS_THRESHOLD 0.3
GPU_COUNT 1
IMAGES_PER_GPU 1
IMAGE_MAX_DIM 640
IMAGE_MIN_DIM 640
IMAGE_PADDING True
IMAGE_SHAPE [640 640 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MEAN_PIXEL [123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME learn_RUBBIA_REFUS_AMIENS_23
NUM_CLASSES 9
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (16, 32, 64, 128, 256)
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
STEPS_PER_EPOCH 1000
TRAIN_ROIS_PER_IMAGE 200
USE_MINI_MASK True
USE_RPN_ROIS True
VALIDATION_STEPS 50
WEIGHT_DECAY 0.0001
model_param file didn't exist
model_name : learn_RUBBIA_REFUS_AMIENS_23
model_type : mask_rcnn
list of files needed : ['mask_model.h5']
files existing in s3 : ['mask_model.h5']
files missing in s3 : []
2025-04-08 13:00:48.717631: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2025-04-08 13:00:48.722672: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2025-04-08 13:00:48.725954: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2025-04-08 13:00:48.737361: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2025-04-08 13:00:48.745503: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2025-04-08 13:00:48.747661: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2025-04-08 13:00:49.579687: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2025-04-08 13:00:49.582050: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2025-04-08 13:00:50.487374: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2025-04-08 13:00:50.489411: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2025-04-08 13:00:51.470765: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2025-04-08 13:00:51.473842: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2025-04-08 13:00:52.877968: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2025-04-08 13:00:52.880489: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2025-04-08 13:00:54.815926: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2025-04-08 13:00:54.818060: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2025-04-08 13:00:56.077416: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2025-04-08 13:00:56.079531: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
local folder : /data/models_weight/learn_RUBBIA_REFUS_AMIENS_23
/data/models_weight/learn_RUBBIA_REFUS_AMIENS_23/mask_model.h5
size_local : 256009536
size in s3 : 256009536
create time local : 2021-08-09 09:43:22
create time in s3 : 2021-08-06 18:54:04
mask_model.h5 already exists and didn't need to be updated
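The size_local / size in s3 comparison above decides whether mask_model.h5 has to be downloaded again. A hedged boto3 sketch of that kind of freshness check; the bucket and key names are placeholders, not the real storage layout.

    import os
    import boto3

    def needs_update(local_path, bucket, key):
        """Re-download only when the local file is absent or its size differs from the S3 object."""
        s3 = boto3.client("s3")
        size_in_s3 = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
        if not os.path.exists(local_path):
            return True
        size_local = os.path.getsize(local_path)
        print("size_local :", size_local, "size in s3 :", size_in_s3)
        return size_local != size_in_s3

    # Placeholder bucket/key for illustration only.
    if needs_update("/data/models_weight/learn_RUBBIA_REFUS_AMIENS_23/mask_model.h5",
                    "models-bucket", "learn_RUBBIA_REFUS_AMIENS_23/mask_model.h5"):
        print("mask_model.h5 needs to be downloaded again")
    else:
        print("mask_model.h5 already exists and didn't need to be updated")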
list_images length : 15
NEW PHOTO
Processing 1 images
image shape: (2160, 3264, 3) min: 0.00000 max: 255.00000
molded_images shape: (1, 640, 640, 3) min: -123.70000 max: 151.10000
image_metas shape: (1, 17) min: 0.00000 max: 3264.00000
error while detecting the image : temp/1744110028_1329704_1350595105_092b2dbb95baba16145abbd783380051.jpg
2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:3007) ]]
[[mrcnn_detection/map/while/LoopCond/_22/_118]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:3007) ]]
0 successful operations.
0 derived errors ignored.
[Op:__inference_keras_scratch_graph_13584]
Function call stack:
keras_scratch_graph -> keras_scratch_graph
NEW PHOTO
Processing 1 images
image shape: (2160, 3264, 3) min: 0.00000 max: 255.00000
molded_images shape: (1, 640, 640, 3) min: -123.70000 max: 151.10000
image_metas shape: (1, 17) min: 0.00000 max: 3264.00000
error while detecting the image : temp/1744110028_1329704_1350595100_20e735cca2195940e00682fa2f3df0d7.jpg
2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:3007) ]]
[[mrcnn_detection/map/while/LoopCond/_22/_118]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:3007) ]]
0 successful operations.
0 derived errors ignored.
[Op:__inference_keras_scratch_graph_13584]
Function call stack:
keras_scratch_graph -> keras_scratch_graph
NEW PHOTO
Processing 1 images
image shape: (2160, 3264, 3) min: 0.00000 max: 255.00000
molded_images shape: (1, 640, 640, 3) min: -123.70000 max: 151.10000
image_metas shape: (1, 17) min: 0.00000 max: 3264.00000
error while detecting the image : temp/1744110028_1329704_1350595098_3ddbd4ead9d6eb6efe98ccd8da6c9119.jpg
2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:3007) ]]
[[mrcnn_detection/map/while/LoopCond/_22/_118]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:3007) ]]
0 successful operations.
0 derived errors ignored.
[Op:__inference_keras_scratch_graph_13584]
Function call stack:
keras_scratch_graph -> keras_scratch_graph
NEW PHOTO
Processing 1 images
image shape: (2160, 3264, 3) min: 0.00000 max: 255.00000
molded_images shape: (1, 640, 640, 3) min: -123.70000 max: 151.10000
image_metas shape: (1, 17) min: 0.00000 max: 3264.00000
error while detecting the image : temp/1744110028_1329704_1350595092_f59fedfb4d4b01a2c75b956a84583997.jpg
2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:3007) ]]
[[mrcnn_detection/map/while/LoopCond/_22/_118]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:3007) ]]
0 successful operations.
0 derived errors ignored.
[Op:__inference_keras_scratch_graph_13584]
Function call stack:
keras_scratch_graph -> keras_scratch_graph
NEW PHOTO
Processing 1 images
image shape: (2160, 3264, 3) min: 0.00000 max: 255.00000
molded_images shape: (1, 640, 640, 3) min: -123.70000 max: 151.10000
image_metas shape: (1, 17) min: 0.00000 max: 3264.00000
error while detecting the image : temp/1744110028_1329704_1350595000_87b8b89c12eb7f5c6294c1ef0ed4b618.jpg
2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:3007) ]]
[[mrcnn_detection/map/while/LoopCond/_22/_118]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:3007) ]]
0 successful operations.
0 derived errors ignored.
[Op:__inference_keras_scratch_graph_13584]
Function call stack:
keras_scratch_graph -> keras_scratch_graph
NEW PHOTO
Processing 1 images
image shape: (2160, 3264, 3) min: 0.00000 max: 255.00000
molded_images shape: (1, 640, 640, 3) min: -123.70000 max: 151.10000
image_metas shape: (1, 17) min: 0.00000 max: 3264.00000
error while detecting the image : temp/1744110028_1329704_1350594997_d37e34b05205e7c13471f9c52daefc16.jpg
2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:3007) ]]
[[mrcnn_detection/map/while/LoopCond/_22/_118]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:3007) ]]
0 successful operations.
0 derived errors ignored.
[Op:__inference_keras_scratch_graph_13584]
Function call stack:
keras_scratch_graph -> keras_scratch_graph
NEW PHOTO
Processing 1 images
image shape: (2160, 3264, 3) min: 0.00000 max: 255.00000
molded_images shape: (1, 640, 640, 3) min: -123.70000 max: 151.10000
image_metas shape: (1, 17) min: 0.00000 max: 3264.00000
error while detecting the image : temp/1744110028_1329704_1350594994_fdff88ef7f3abc016e8dca3dc6361c11.jpg
2025-04-08 13:00:57.413337: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2025-04-08 13:00:57.416321: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2025-04-08 13:00:58.469780: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2025-04-08 13:00:58.472598: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2025-04-08 13:01:02.165727: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.166242: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 3.60G (3865470464 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.166700: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 3.24G (3478923264 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.167199: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.92G (3131030784 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.167690: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.62G (2817927680 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.168178: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.36G (2536134912 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.168790: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.12G (2282521344 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.168823: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2025-04-08 13:01:02.169557: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.169580: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2025-04-08 13:01:02.177032: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.177060: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
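Every photo above fails with the same "Failed to get convolution algorithm" error, yet the step keeps going and logs the offending path each time, so one repeatedly failing image cannot abort the whole batch. A hedged sketch of that per-image guard; model.detect follows the matterport Mask R-CNN API, while the image loading and result handling are illustrative.

    import skimage.io

    def detect_images(model, image_paths):
        """Run detection image by image; a failure on one photo must not abort the whole step."""
        results = {}
        for path in image_paths:
            print("NEW PHOTO")
            try:
                image = skimage.io.imread(path)
                # matterport Mask R-CNN API: detect() takes a list of images.
                results[path] = model.detect([image], verbose=1)[0]
            except Exception as exc:  # keep the batch alive and record the failure
                print("error while detecting the image :", path)
                print(exc)
                results[path] = None
        return results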
2025-04-08 13:01:02.177746: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.177768: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2025-04-08 13:01:02.184239: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.184259: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 466.56MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2025-04-08 13:01:02.221544: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.221563: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 243.25MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2025-04-08 13:01:02.571106: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.571125: W tensorflow/core/kernels/gpu_utils.cc:49] Failed to allocate memory for convolution redzone checking; skipping this check. This is benign and only means that we won't check cudnn for out-of-bounds reads and writes. This message will only be printed once.
2025-04-08 13:01:02.581254: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.581738: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.590644: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.591161: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.591646: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.592110: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.592583: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:02.593080: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2025-04-08 13:01:26.184786: E tensorflow/stream_executor/dnn.cc:613] CUDNN_STATUS_EXECUTION_FAILED in tensorflow/stream_executor/cuda/cuda_dnn.cc(3158): 'cudnnConvolutionForward( cudnn.handle(), alpha, input_nd.handle(), input_data.opaque(), filter_nd.handle(), filter_data.opaque(), conv.handle(), ToConvForwardAlgo(algorithm_desc), scratch_memory.opaque(), scratch_memory.size(), beta, output_nd.handle(), output_data.opaque())'
2025-04-08 13:01:27.820563: I tensorflow/stream_executor/stream.cc:1990] [stream=0x1f548050,impl=0x1f546ce0] did not wait for [stream=0x1f547dd0,impl=0x1f546dc0]
2025-04-08 13:01:27.820638: I tensorflow/stream_executor/stream.cc:4938] [stream=0x1f548050,impl=0x1f546ce0] did not memcpy host-to-device; source: 0x37244680
2025-04-08 13:01:27.820746: I tensorflow/stream_executor/stream.cc:1990] [stream=0x1f548050,impl=0x1f546ce0] did not wait for [stream=0x1f547dd0,impl=0x1f546dc0]
2025-04-08 13:01:27.820719: F tensorflow/core/common_runtime/gpu/gpu_util.cc:340] CPU->GPU Memcpy failed
2025-04-08 13:01:27.820771: I tensorflow/stream_executor/stream.cc:4938] [stream=0x1f548050,impl=0x1f546ce0] did not memcpy host-to-device; source: 0x443b2e40
max_time_sub_proc : 3600
Caught exception !
Connect or reconnect !
in case -12
caffe_path_current :
About to save ! 2
After save, about to update current !
30.30user 34.11system 1:00:08elapsed 1%CPU (0avgtext+0avgdata 4864884maxresident)k 649984inputs+26936outputs (10912major+3573017minor)pagefaults 0swaps
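"Inside mask_sub_process" together with "max_time_sub_proc : 3600" suggests the detection step runs in a child process with a one-hour budget, and the closing "Caught exception ! Connect or reconnect !" lines are the parent recovering and reconnecting to the database afterwards. A hedged sketch of that wrapper pattern; the command line and return codes are illustrative, not the actual implementation.

    import subprocess

    MAX_TIME_SUB_PROC = 3600  # seconds, as logged above

    def run_sub_process(cmd):
        """Run the detection step in a child process and give up after max_time_sub_proc."""
        try:
            completed = subprocess.run(cmd, timeout=MAX_TIME_SUB_PROC, check=True)
            return completed.returncode
        except subprocess.TimeoutExpired:
            print("Caught exception ! sub process exceeded", MAX_TIME_SUB_PROC, "seconds")
            return -1
        except subprocess.CalledProcessError as exc:
            print("Caught exception ! sub process failed with code", exc.returncode)
            return exc.returncode

    # Illustrative command; the real wrapper builds its own argument list.
    run_sub_process(["python3", "-c", "print('mask_sub_process placeholder')"])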