
How to Upload large files to Google Colab and remote Jupyter notebooks

by Bharath Raj

Photo by Thomas Kelley on Unsplash

If you haven't heard about it, Google Colab is a platform that is widely used for testing out ML prototypes on its free K80 GPU. If you have heard about it, chances are that you gave it a shot. But you might have become exasperated because of the complexity involved in transferring large datasets.

This blog compiles some of the methods that I've found useful for uploading and downloading large files from your local system to Google Colab. I've also included additional methods that can be useful for transferring smaller files with less effort. Some of the methods can be extended to other remote Jupyter notebook services, like Paperspace Gradient.

Transferring Large Files

The most efficient method to transfer large files is to use a cloud storage system such as Dropbox or Google Drive.

1. Dropbox

Dropbox offers up to 2 GB of free storage space per account. This sets an upper limit on the amount of data that you can transfer at any moment. Transferring via Dropbox is relatively easy. You can also follow the same steps for other notebook services, such as Paperspace Gradient.

Step 1: Archive and Upload

Uploading a large number of images (or files) individually will take a very long time, since Dropbox (or Google Drive) has to individually assign IDs and attributes to every image. Therefore, I recommend that you archive your dataset first.

One possible method of archiving is to convert the folder containing your dataset into a '.tar' file. The code snippet below shows how to convert a folder named "Dataset" in the home directory to a "dataset.tar" file, from your Linux terminal.

                tar -cvf dataset.tar ~/Dataset              

Alternatively, you could use WinRar or 7zip, whichever is more convenient for you. Upload the archived dataset to Dropbox.
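If upload bandwidth is a bottleneck, a gzip-compressed archive can be noticeably smaller than a plain tar. A minimal sketch, using a throwaway sample folder in place of your real dataset:

```shell
# Stand-in for your real dataset folder
mkdir -p Dataset
echo "sample image data" > Dataset/img_001.txt

# -z adds gzip compression on top of the tar archive
tar -czvf dataset.tar.gz Dataset
```

Later, extract it with tar -xzvf dataset.tar.gz instead of the plain -xvf form.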

Step 2: Clone the Repository

Open Google Colab and start a new notebook.

Clone this GitHub repository. I've modified the original code so that it can add the Dropbox access token from the notebook. Execute the following commands one by one.

                !git clone https://github.com/thatbrguy/Dropbox-Uploader.git
                cd Dropbox-Uploader
                !chmod +x dropbox_uploader.sh

Step 3: Create an Access Token

Execute the following command to see the initial setup instructions.

                !bash dropbox_uploader.sh

It will display instructions on how to obtain the access token, and will ask you to execute the following command. Replace the bold letters with your access token, then execute:

                !echo "INPUT_YOUR_ACCESS_TOKEN_HERE" > token.txt

Execute !bash dropbox_uploader.sh again to link your Dropbox account to Google Colab. Now you can download and upload files from the notebook.

Step 4: Transfer Contents

Download to Colab from Dropbox:

Execute the following command. The argument is the name of the file on Dropbox.

                !bash dropbox_uploader.sh download YOUR_FILE.tar              
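After the download finishes, unpack the archive before using it. A sketch of the round trip, with a locally built stand-in archive in place of the YOUR_FILE.tar you pulled from Dropbox:

```shell
# Build a stand-in archive (your real one comes from the download above)
mkdir -p Dataset && echo "sample" > Dataset/img_001.txt
tar -cvf YOUR_FILE.tar Dataset && rm -rf Dataset

# Unpack the downloaded archive into the working directory
tar -xvf YOUR_FILE.tar
ls Dataset
```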

Upload to Dropbox from Colab:

Execute the following command. The first argument (result_on_colab.txt) is the name of the file you want to upload. The second argument (dropbox.txt) is the name you want to save the file as on Dropbox.

                !bash dropbox_uploader.sh upload result_on_colab.txt dropbox.txt              

2. Google Drive

Google Drive offers up to 15 GB of free storage for every Google account. This sets an upper limit on the amount of data that you can transfer at any moment. You can always expand this limit to larger amounts. Colab simplifies the authentication process for Google Drive.

That being said, I've also included the necessary modifications you can perform, so that you can access Google Drive from other Python notebook services as well.

Step 1: Archive and Upload

Just as with Dropbox, uploading a large number of images (or files) individually will take a very long time, since Google Drive has to individually assign IDs and attributes to every image. So I recommend that you archive your dataset first.

One possible method of archiving is to convert the folder containing your dataset into a '.tar' file. The code snippet below shows how to convert a folder named "Dataset" in the home directory to a "dataset.tar" file, from your Linux terminal.

                tar -cvf dataset.tar ~/Dataset              

And again, you can use WinRar or 7zip if you prefer. Upload the archived dataset to Google Drive.

Step 2: Install dependencies

Open Google Colab and start a new notebook. Install PyDrive using the following command:

                !pip install PyDrive              

Import the necessary libraries and methods (the bold imports are only required for Google Colab; do not import them if you're not using Colab).

                import os
                from pydrive.auth import GoogleAuth
                from pydrive.drive import GoogleDrive
                from google.colab import auth
                from oauth2client.client import GoogleCredentials

Step 3: Authorize Google SDK

For Google Colab:

Now, you have to authorize the Google SDK to access Google Drive from Colab. First, execute the following commands:

                auth.authenticate_user()
                gauth = GoogleAuth()
                gauth.credentials = GoogleCredentials.get_application_default()
                drive = GoogleDrive(gauth)

You will get a prompt as shown below. Follow the link to obtain the key. Copy and paste it in the input box and press enter.

Prompt to authenticate user

For other Jupyter notebook services (Ex: Paperspace Gradient):

Some of the following steps are obtained from PyDrive's quickstart guide.

Go to the APIs Console and make your own project. Then, search for 'Google Drive API', select the entry, and click 'Enable'. Select 'Credentials' from the left menu, click 'Create Credentials', and select 'OAuth client ID'. You should see a menu such as the image shown below:


Set "Application Type" to "Other". Give an appropriate name and click "Save".

Download the OAuth 2.0 client ID you just created. Rename it to client_secrets.json.

Upload this JSON file to your notebook. You can do this by clicking the "Upload" button from the homepage of the notebook (shown below). (Note: Do not use this button to upload your dataset, as it will be extremely time consuming.)

Upload button shown in red

Now, execute the following commands:

                gauth = GoogleAuth()
                gauth.CommandLineAuth()
                drive = GoogleDrive(gauth)

The remainder of the procedure is similar to that of Google Colab.

Step 4: Obtain your File's ID

Enable link sharing for the file you want to transfer. Copy the link. You may get a link such as this:

                https://drive.google.com/open?id=YOUR_FILE_ID

Copy only the ID portion (the part after "id=") of the above link.
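If you prefer to extract the ID programmatically instead of copying it by hand, the sharing link's query string can be parsed with the Python standard library. A small sketch (the link below uses the same placeholder as above, not a real file):

```python
from urllib.parse import urlparse, parse_qs

def drive_file_id(share_link):
    """Return the value of the 'id' query parameter from a Drive sharing link."""
    return parse_qs(urlparse(share_link).query)["id"][0]

link = "https://drive.google.com/open?id=YOUR_FILE_ID"
print(drive_file_id(link))  # prints YOUR_FILE_ID
```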

Step 5: Transfer contents

Download to Colab from Google Drive:

Execute the following commands. Here, YOUR_FILE_ID is obtained in the previous step, and DOWNLOAD.tar is the name (or path) you want to save the file as.

                download = drive.CreateFile({'id': 'YOUR_FILE_ID'})
                download.GetContentFile('DOWNLOAD.tar')

Upload to Google Drive from Colab:

Execute the following commands. Here, FILE_ON_COLAB.txt is the name (or path) of the file on Colab, and DRIVE.txt is the name (or path) you want to save the file as (on Google Drive).

                upload = drive.CreateFile({'title': 'DRIVE.txt'})
                upload.SetContentFile('FILE_ON_COLAB.txt')
                upload.Upload()
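The same three PyDrive calls can be wrapped in a loop to push a whole folder of results in one go. A sketch under two assumptions: `drive` is the authenticated GoogleDrive object from Step 3, and a local folder such as 'results' holds the files to upload:

```python
import os

def upload_folder(drive, folder):
    """Upload every file in `folder` to Google Drive, keeping the same names."""
    for name in sorted(os.listdir(folder)):
        f = drive.CreateFile({'title': name})
        f.SetContentFile(os.path.join(folder, name))
        f.Upload()

# In Colab, after the authentication from Step 3:
#   upload_folder(drive, 'results')
```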

Transferring Smaller Files

Occasionally, you may want to pass just one csv file and don't want to go through this entire hassle. No worries, there are much simpler methods for that.

1. Google Colab files module

Google Colab has its inbuilt files module, with which you can upload or download files. You can import it by executing the following:

                from google.colab import files              

To Upload:

Use the following command to upload files to Google Colab:

                files.upload()              

You will be presented with a GUI in which you can select the files you want to upload. This method is not recommended for large files. It is very slow.
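In Colab, files.upload() returns a dictionary mapping each chosen filename to its raw bytes. A small sketch of writing those to the working directory (the `uploaded` dict shown in the comment is a stand-in for the real return value):

```python
def save_uploads(uploaded):
    """Write each uploaded file (name -> bytes) to the current directory."""
    for name, data in uploaded.items():
        with open(name, 'wb') as f:
            f.write(data)

# In Colab you would call:
#   uploaded = files.upload()
#   save_uploads(uploaded)
```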

To Download:

Use the following command to download a file from Google Colab:

                files.download('example.txt')              

This feature works best in Google Chrome. In my experience, it only worked once on Firefox, out of about 10 tries.

2. GitHub

This is a "hack-ish" way to transfer files. You can create a GitHub repository with the small files that you want to transfer.

Once you create the repository, you can simply clone it in Google Colab. You can then push your changes to the remote repository and pull the updates onto your local system.

But do note that GitHub has a hard limit of 25 MB per file, and a soft limit of 1 GB per repository.
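The round trip described above looks like this from a terminal. A sketch only: a local bare repository stands in for your GitHub remote, and the repository and file names are made up for illustration:

```shell
# A local bare repo stands in for github.com/YOU/YOUR_REPO
git init --bare remote_repo.git

# Clone it (in Colab you would clone the GitHub URL instead)
git clone remote_repo.git workdir
cd workdir
git config user.email "you@example.com" && git config user.name "You"

# Commit a small result file and push it back to the remote
echo "accuracy,0.92" > results.csv
git add results.csv
git commit -m "Add results from Colab"
git push origin HEAD
```

On your local system, a plain git pull then brings the new file down.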

Thank you for reading this article! Leave some claps if you found it interesting! If you have any questions, you can hit me up on social media or send me an email (bharathrajn98[at]gmail[dot]com).

Learn to code for free. freeCodeCamp's open source curriculum has helped more than 40,000 people get jobs as developers. Get started.


Source: https://www.freecodecamp.org/news/how-to-transfer-large-files-to-google-colab-and-remote-jupyter-notebooks-26ca252892fa/
