
wsl --install
/mnt/c/:
ln -s "$(wslpath "$(powershell.exe '[environment]::getfolderpath("MyDocuments")' | tr -d '\r')")" ~/Documents
ln -s "$(wslpath "$(powershell.exe '[environment]::getfolderpath("Desktop")' | tr -d '\r')")" ~/Desktop
ln -s "$(wslpath "$(powershell.exe '[environment]::getfolderpath("UserProfile")' | tr -d '\r')")/Downloads" ~/Downloads

Shell commands generally take the form command -options argument. For example, ls -l Documents would list the contents of the Documents directory in a detailed long format.
pwd - Print current directory path
ls - List files and directories; options: -l (detailed), -a (all files including hidden)
cd - Change directory; special shortcuts: .. (up one level), ~ (home directory)
Use the --help option to display usage information, e.g., ls --help
cd /mnt/c/Users/hp/Downloads
The file system starts at the root directory, written / (slash). This root is the top-most directory from which all other directories branch.
To see where you currently are, use the pwd command:
pwd
/home/ubuntu
This path, /home/ubuntu, indicates your home directory, which is your default location when opening a new terminal. "ubuntu" in this case is the username.
The / at the start denotes the root directory
home is a folder within the root
The remaining / characters act as separators between folders
ubuntu is the final folder in this specific path
The / character has two meanings: it represents the root directory when at the beginning of a path, and it acts as a separator within a path.
To see the contents of the current directory, use the ls (listing) command:
ls
Documents Downloads Music Public
Desktop Movies Pictures Templates
The cd ("change directory") command changes your current working directory. You can specify a directory using an absolute path (starting from the root /), or a relative path (relative to your current directory). Paths that contain spaces must be quoted:
cd "/home/JNLab_Repo/hands-on/1a/Introduction to the Unix Shell/data-shell/"
You can confirm the new location with pwd. To move up one directory (to the parent directory), use ..:
cd ..
The shell interprets ~ (tilde) at the start of a path as your user's home directory (e.g., /home/ubuntu), making it a quick shortcut to return home.
Go to /home/ubuntu/Desktop/data-shell and type:
ls mol
Pressing the Tab key then completes the name:
ls molecules/
mkdir my_project
cd my_project
touch report.txt
cp report.txt final_report.txt
mv final_report.txt /home/user/documents/
rm report.txt
Move to the data-shell directory and use ls to see what it contains:
cd "/home/JNLab_Repo/hands-on/1a/Introduction to the Unix Shell/data-shell"
ls
README.txt coronavirus molecules sequencing
Create a directory called thesis_notes using the mkdir ("make directory") command:
mkdir thesis_notes
Check that it was created with ls:
ls
README.txt coronavirus molecules sequencing thesis_notes things.txt
The mkdir command is your primary tool for creating organized directory structures. You can also create nested directories using the -p flag: mkdir -p parent/child/grandchild
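The behaviour of the -p flag can be tried out in a scratch directory. This is only a sketch: the scratch location comes from mktemp, and the directory names (parent, child, grandchild, missing) are placeholders.

```shell
# Work in a throwaway directory so nothing in your real folders is touched.
scratch=$(mktemp -d)

# -p creates every missing level of the path in one go.
mkdir -p "$scratch/parent/child/grandchild"
if [ -d "$scratch/parent/child/grandchild" ]; then nested="created"; else nested="missing"; fi

# Running it again is harmless: -p does not complain about existing directories.
if mkdir -p "$scratch/parent/child/grandchild"; then rerun="ok"; else rerun="error"; fi

# Without -p, mkdir fails when the parent directory does not exist yet.
if mkdir "$scratch/missing/child" 2>/dev/null; then plain="ok"; else plain="error"; fi

echo "nested: $nested, rerun: $rerun, without -p: $plain"
rm -r "$scratch"
```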
Note the .txt extension in README.txt, which indicates this is a plain text file.
The directory also contains a file called things.txt, which contains a note of books to read for our thesis. Let's move this file to the thesis_notes directory we created earlier, using the mv ("move") command:
mv things.txt thesis_notes/
The first argument tells mv what we're "moving", while the second is where it's to go. In this case, we're moving things.txt to thesis_notes/. We can check the file has moved there:
ls thesis_notes
things.txt
We can also use the mv command to change a file's name. Here's how we would do it:
mv thesis_notes/things.txt thesis_notes/books.txt
Be careful: mv will silently overwrite any existing file with the same name, which could lead to data loss.
mv also works with directories, and you can use it to move or rename an entire directory just as you use it to move an individual file.
To delete files, we use rm ("remove"). For example, let's remove one of the files we copied earlier:
rm backup/cubane.pdb
We can confirm the file is gone using ls backup/.
If we try to remove the directory itself:
rm backup
rm: cannot remove `backup': Is a directory
This happens because rm by default only works on files, not directories.
The rm command can remove a directory and all its contents if we use the recursive option -r, and it will do so without any confirmation prompts:
rm -r backup
For this reason, rm -r should be used with great caution (you might consider adding the interactive option rm -r -i).
To remove an empty directory, we can use the rmdir command. This is a safer option than rm -r, because it will never delete the directory if it contains files, giving us a chance to check whether we really want to delete all its contents.
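The difference between rmdir and rm -r can be checked safely in a throwaway directory. The sketch below is illustrative only: the scratch location comes from mktemp, and demo_dir and note.txt are made-up names.

```shell
# Work inside a scratch directory so nothing real is deleted.
scratch=$(mktemp -d)
mkdir "$scratch/demo_dir"
touch "$scratch/demo_dir/note.txt"

# rmdir refuses to delete a directory that still contains files.
if rmdir "$scratch/demo_dir" 2>/dev/null; then
    rmdir_result="removed"
else
    rmdir_result="refused"
fi

# rm -r removes the directory and everything inside it, without asking.
rm -r "$scratch/demo_dir"
if [ -d "$scratch/demo_dir" ]; then
    rm_result="still there"
else
    rm_result="removed"
fi

echo "rmdir: $rmdir_result, rm -r: $rm_result"
rmdir "$scratch"   # the scratch directory is empty now, so rmdir succeeds
```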
The most commonly used wildcard is *, which is used to match zero or more characters.
For example, p*.pdb matches pentane.pdb and propane.pdb, because the 'p' at the front only matches filenames that begin with the letter 'p'.
Another wildcard is ?, which matches any character exactly once. For example:
?ethane.pdb would only match methane.pdb (whereas *ethane.pdb matches both ethane.pdb and methane.pdb)
???ane.pdb matches three characters followed by ane.pdb, giving cubane.pdb ethane.pdb octane.pdb
If a wildcard matches nothing, the pattern is passed to the command unchanged. For example, running ls *.pdf in the molecules directory (which does not contain any PDF files) results in an error message that there is no file called *.pdf.
Starting from /home/amanda/data, which of the following commands could Amanda use to navigate to her home directory (/home/amanda)?
cd .
cd /
cd /home/amanda
cd ../..
cd ~
cd home
cd ~/data/..
cd
cd ..
~ stands for the user's home directory, in this case /home/amanda
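One way to check the candidates is to try each of them from inside a stand-in home directory. This is a rough sketch under stated assumptions: the temporary directory plays the role of /home/amanda, HOME is overridden only inside a subshell, and the helper name check is made up. The absolute path cd /home/amanda is left out because that directory would not exist on your machine; on Amanda's machine it works as an ordinary absolute path.

```shell
# Build a scratch "home" containing a data folder (stand-in for /home/amanda/data).
fake_home=$(mktemp -d)
mkdir "$fake_home/data"
fake_home=$(cd "$fake_home" && pwd -P)   # resolve symlinks so comparisons are exact

# Run one candidate command from inside data/ and report whether it ends up in home.
# The subshell keeps the HOME override and the cd away from your real session.
check() {
    (
        HOME="$fake_home"
        cd "$fake_home/data" || exit 1
        eval "$1" 2>/dev/null || true
        if [ "$(pwd -P)" = "$fake_home" ]; then echo "yes"; else echo "no"; fi
    )
}

works=""
for candidate in "cd ." "cd /" "cd ../.." "cd ~" "cd home" "cd ~/data/.." "cd" "cd .."
do
    if [ "$(check "$candidate")" = "yes" ]; then
        works="$works[$candidate]"
    fi
done

echo "Commands that reach home: $works"
rm -r "$fake_home"
```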

cat <file> - Display entire file content
head <file> / tail <file> - Show first/last 10 lines by default
more <file> / less <file> - Paginate file content for easier reading
grep <pattern> <file> - Search for text patterns inside files
Common terminal text editors include nano (simple and beginner-friendly) and the more powerful, advanced options like vim and emacs, which are staples for experienced users.
Let's look at the cubane.pdb file in the molecules directory.
To print the whole file, use the cat command, which stands for "concatenate" (we will see why it's called this way in a little while):
cd molecules
cat cubane.pdb
To see only the beginning of a file, use the head command:
head cubane.pdb
By default, head prints the first 10 lines of the file. We can change this using the -n option, followed by a number, for example:
head -n 2 cubane.pdb
COMPND CUBANE
AUTHOR DAVE WOODCOCK 95 12 06
Similarly, to see the end of a file, use the tail command:
tail -n 2 cubane.pdb
TER 17 1
END
To browse a longer file, use the less command:
less cubane.pdb
less will open the file in a viewer where you can use ↑ and ↓ to move line-by-line or the Page Up and Page Down keys to move page-by-page. You can exit less by pressing Q (for "quit"). This will bring you back to the console.
less is an improved version of the more command, allowing backward navigation through files.
The wc (word count) command is a powerful tool for analyzing text files. It can count lines, words, and characters in one or more files.
wc *.pdb
20 156 1158 cubane.pdb
12 84 622 ethane.pdb
9 57 422 methane.pdb
30 246 1828 octane.pdb
21 165 1226 pentane.pdb
15 111 825 propane.pdb
107 819 6081 total
Here we used the * wildcard to count lines, words, and characters (in that order, left-to-right) of all our PDB files. The output shows three columns: lines, words, and characters for each file, with a total at the bottom.
To show only one of these counts, wc has options for all of them:
wc -l *.pdb
20 cubane.pdb
12 ethane.pdb
9 methane.pdb
30 octane.pdb
21 pentane.pdb
15 propane.pdb
107 total
The -l option is particularly useful when working with data files where each line represents a record or observation.
The cat command stands for "concatenate". This is because this command can be used to concatenate (combine) several files together. For example, if we wanted to combine all PDB files into one:
cat *.pdb
To save the output of a command to a file instead of printing it on the screen, use the > operator.
wc -l *.pdb > number_lines.txt
We can confirm the new file exists using ls.
The > operator will create a new file or overwrite an existing file. If you want to append to an existing file instead, use >>.

chmod: Modify read, write, and execute permissions for files and directories, controlling who can access and modify your data
chown: Assign new owners and groups to files, controlling access at the user and group level
rwxr-xr--: Explained as permissions for owner, group, and others. The first three characters represent owner permissions, the next three represent group permissions, and the last three represent permissions for all other users
chmod 755 script.sh sets the script to be executable by all users, with the owner having full read, write, and execute permissions (7), while the group and others have read and execute permissions only (5).
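The link between the numeric codes and the rwx strings can be seen by setting a mode and reading it back. The sketch below uses a throwaway file as a stand-in for script.sh and reads the mode from the first ten characters of ls -l; note that the string rwxr-xr-- corresponds to 754, while 755 gives rwxr-xr-x.

```shell
# Create a throwaway file standing in for script.sh.
scratch=$(mktemp -d)
touch "$scratch/script.sh"

# Each digit is a sum: read = 4, write = 2, execute = 1.
# 7 = rwx (4+2+1), 5 = r-x (4+1), 4 = r-- (4)
chmod 755 "$scratch/script.sh"
mode_755=$(ls -l "$scratch/script.sh" | cut -c1-10)

chmod 754 "$scratch/script.sh"
mode_754=$(ls -l "$scratch/script.sh" | cut -c1-10)

echo "755 -> $mode_755"
echo "754 -> $mode_754"
rm -r "$scratch"
```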
cat cubane.pdb | grep "ATOM" | wc -l
cat cubane.pdb reads the file contents
grep "ATOM" filters for lines containing "ATOM"
wc -l counts the number of matching lines
nano count_atoms.sh
This opens the nano editor, creating count_atoms.sh if it doesn't already exist, or opening it for editing if it does.
#!/bin/bash
cat cubane.pdb | grep "ATOM" | wc -l
The first line, #!/bin/bash, is known as the shebang. It tells the operating system which interpreter to use for executing the script. In this case, it specifies that the script should be run using bash, the Bourne-Again SHell.
To run the script directly, first use the chmod command to add execute permissions:
chmod +x count_atoms.sh
./count_atoms.sh
The ./ prefix tells the shell to look for the script in the current directory
The +x option in chmod +x adds the execute permission for the owner, group, and others. This is a quick way to make a script runnable without specifying detailed permission codes.
cat cubane.pdb | grep "ATOM" | wc -l
To count the atoms in every molecule, we could repeat this command for each file: cubane.pdb, ethane.pdb, methane.pdb, octane.pdb, pentane.pdb, propane.pdb. But that would be tedious and error-prone. Instead, we can use a loop to automate this process!
for thing in list_of_things
do
# Indentation within the loop is not required, but aids legibility
operation_using ${thing}
done
for - Starts the loop
thing - Variable name (you choose this)
in list_of_things - Items to iterate over
do - Begins the command block
${thing} - Accesses the current item
done - Ends the loop
Here is a script, count_loop.sh, where we apply this idea:
#!/bin/bash
for filename in cubane.pdb ethane.pdb methane.pdb
do
# count the number of lines containing the word "ATOM"
cat ${filename} | grep "ATOM" | wc -l
done
Each time the loop runs, filename takes on the next value in the list, and the command inside the loop is executed with that value.

If we run the script (bash count_loop.sh), we would get the following output:
16
8
5
On each pass through the loop, the next item in the list is assigned to the variable (here, filename). Then, the commands inside the loop are executed, before moving on to the next item in the list. Inside the loop, we call for the variable's value using $filename or ${filename}.
filename = cubane.pdb
filename = ethane.pdb
filename = methane.pdb
Each time the loop ran, $filename stored a different value, cycling through cubane.pdb, ethane.pdb, and finally methane.pdb.
You can also use a wildcard, as in for filename in *.pdb, to process all PDB files in the directory automatically.
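The wildcard form of the loop can be sketched as follows. To keep the example self-contained it creates two tiny made-up files (a.pdb and b.pdb) rather than using the real molecules data, and it swaps the cat | grep | wc -l pipeline for grep -c, which counts matching lines directly.

```shell
# Two small stand-in files; the contents are invented for the demo.
scratch=$(mktemp -d)
printf 'COMPND A\nATOM 1\nATOM 2\nEND\n' > "$scratch/a.pdb"
printf 'COMPND B\nATOM 1\nEND\n' > "$scratch/b.pdb"

report=""
# The shell expands *.pdb into the sorted list of matching files before the loop starts.
for filename in "$scratch"/*.pdb
do
    # Quoting "$filename" keeps names with spaces in one piece.
    count=$(grep -c "ATOM" "$filename")
    report="$report$(basename "$filename"):$count "
done

echo "$report"
rm -r "$scratch"
```

Quoting the variable is a habit worth keeping even when, as here, the file names contain no spaces.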

; - Sequential execution (run regardless of success)
&& - Conditional success (run next only if previous succeeds)
|| - Conditional failure (run next only if previous fails)
Use | to pass the output of one command as input to another. For example, ls -l | grep ".txt" lists files and filters for text files.
Use $1, $@ for script inputs, allowing your scripts to accept arguments and become more flexible
Use * to match multiple characters in file names (e.g., rm *.log to delete all log files)
Save commands in .sh files and run them with bash script.sh for reproducible workflows
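A small sketch of how these operators and argument variables behave. The commands true and false are standard utilities that simply succeed or fail; the variable log and the function describe_args are names invented for this example.

```shell
log=""

# ; runs the next command whether or not the previous one succeeded.
false ; log="$log semicolon"

# && runs the next command only if the previous one succeeded.
true && log="$log and-after-success"
false && log="$log and-after-failure"

# || runs the next command only if the previous one failed.
false || log="$log or-after-failure"
true || log="$log or-after-success"

echo "Steps that ran:$log"

# Inside a script or function, $1 is the first argument, $# the number of
# arguments, and "$@" expands to all of them.
describe_args() {
    echo "first=$1 count=$#"
}
args_result=$(describe_args cubane.pdb ethane.pdb)
echo "$args_result"
```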

The file command inspects files using magic numbers and content analysis to identify their type, regardless of the filename or extension.
$ file mydocument.txt
mydocument.txt: ASCII text
$ file myprogram
myprogram: ELF 64-bit LSB executable, x86-64
$ file archive.tar.gz
archive.tar.gz: gzip compressed data, from Unix
Use gzip / gunzip for .gz files; tar for archiving multiple files together
A FASTA record begins with a single-line description that starts with the > (greater-than) symbol followed by the description or identifier of the sequence. Following the initial line (used for a unique description of the sequence) is the actual sequence itself in standard one-letter code.
>KX580312.1 Homo sapiens truncated breast cancer 1 (BRCA1) gene, exon 15 and partial cds
GTCATCCCCTTCTAAATGCCCATCATTAGATGATAGGTGGTACATGCACAGTTGCTCTGGGAGTCTTCAGAATAGAAACTACCCATCTCAAGAGGAGCTCATTAAGGTTGTTGATGTGGAGGAGTAACAGCTGGAAGAGTCTGGGCCACACGATTTGACGGAAACATCTTACTTGCCAAGGCAAGATCTAG
>KRN06561.1 heat shock [Lactobacillus sucicola DSM 21376 = JCM 15457]
MSLVMANELTNRFNNWMKQDDFFGNLGRSFFDLDNSVNRALKTDVKETDKAYEVRIDVPGIDKKDITVDYHDGVLSVNAKRDSFNDESDSEGNVIASERSYGRFARQYSLPNVDESGIKAKCEDGVLKLTLPKLAEEKINGNHIEIE
A single FASTA file can hold many sequences, each introduced by its own header line starting with >. This makes FASTA files ideal for storing entire datasets, such as all proteins in an organism or all genes in a genome.
>KRN06561.1 heat shock [Lactobacillus sucicola DSM 21376 = JCM 15457]
MSLVMANELTNRFNNWMKQDDFFGNLGRSFFDLDNSVNRALKTDVKETDKAYEVRIDVPGIDKKDITVDYHDGVLSVNAKRDSFNDESDSEGNVIASERSYGRFARQYSLPNVDESGIKAKCEDGVLKLTLPKLAEEKINGNHIEIE
>3HHU_A Chain A, Human Heat-Shock Protein 90 (Hsp90)
MPEETQTQDQPMEEEEVETFAFQAEIAQLMSLIINTFYSNKEIFLRELISNSSDALDKIRYESLTDPSKLDSGKELHINLIPNKQDRTLTIVDTGIGMTKADLINNLGTIAKSGTKAFMEALQAGADISMIGQFGVGFYSAYLVAEKVTVITKHNDDEQYAWESSAGGSFTVRTDTGEPMGRGTKVILHLKEDQTEYLEERRIKEIVKKHSQFIGYPITLFVEK
Each record starts with a header line beginning with >, followed by the sequence data on subsequent lines. The header typically contains an identifier and description. Sequences can span multiple lines for readability.
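Because every record has exactly one header line, the shell tools covered earlier are enough to summarize a FASTA file. The sketch below writes a small made-up file (example.fasta, with invented identifiers seq1 and seq2) and then counts and lists its records.

```shell
scratch=$(mktemp -d)
# A tiny invented FASTA file; the first record spans two sequence lines.
cat > "$scratch/example.fasta" <<'EOF'
>seq1 hypothetical first record
MSLVMANELTNRF
NNWMKQDDFFGNL
>seq2 hypothetical second record
MPEETQTQDQPME
EOF

# One header line per record, so counting headers counts sequences.
# The quotes around '^>' stop the shell from treating > as a redirect.
n_records=$(grep -c '^>' "$scratch/example.fasta")

# List only the identifiers: the first word of each header, without the >.
ids=$(grep '^>' "$scratch/example.fasta" | cut -d ' ' -f 1 | tr -d '>' | tr '\n' ' ')

echo "records: $n_records"
echo "ids: $ids"
rm -r "$scratch"
```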
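A FASTQ record uses four lines per sequence:
Line 1 begins with a @ character and is followed by a sequence identifier and an optional description (like a FASTA title line)
Line 2 is the raw sequence letters
Line 3 begins with a + character and is optionally followed by the same sequence identifier again
Line 4 encodes the quality values for the sequence in line 2, and must contain the same number of symbols as letters in the sequence
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65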
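Since each FASTQ record occupies four lines, the number of reads in an uncompressed file is its line count divided by four. The sketch below assumes a well-formed file with no wrapped sequence lines; example.fq is a made-up file name holding the widely used single-record example.

```shell
scratch=$(mktemp -d)
cat > "$scratch/example.fq" <<'EOF'
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
EOF

# Four lines per record, so reads = total lines / 4.
n_lines=$(wc -l < "$scratch/example.fq")
n_reads=$((n_lines / 4))

# Line 2 holds the sequence and line 4 the qualities; their lengths should agree.
seq_len=$(sed -n '2p' "$scratch/example.fq" | tr -d '\n' | wc -c)
qual_len=$(sed -n '4p' "$scratch/example.fq" | tr -d '\n' | wc -c)

echo "reads: $n_reads"
echo "sequence length: $seq_len, quality length: $qual_len"
rm -r "$scratch"
```

For compressed files such as the .fq.gz files used elsewhere in this material, the same count can be taken from the decompressed stream, for example with gunzip -c file.fq.gz | wc -l.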







Header lines begin with the ## string to include metadata, followed by variant records in tab-delimited format.
##description: evidence-based annotation of the human genome (GRCh38), version 25 (Ensembl 85)
##provider: GENCODE
##contact: [email protected]
##format: gtf
##date: 2016-07-15
chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"
chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"




cd ~/Desktop/data-shell
Make a copy of the sequencing directory named backup.
To copy a directory, we need to use the option -r with the cp command (-r means "recursive").
If we leave out the -r option, this is what happens:
cp sequencing backup
cp: -r not specified; omitting directory 'sequencing'
So we need to run the command again with -r.
cp -r sequencing backup
Running ls we can see a new folder called backup:
ls
README.txt backup books_copy.txt coronavirus molecules sequencing thesis_notes
The -r (recursive) flag is essential when working with directories. It tells cp to copy the directory and all of its contents, including subdirectories.
cd ~/Desktop/data-shell
What does cp do when given several filenames and a directory name?
mkdir -p backup
cp molecules/cubane.pdb molecules/ethane.pdb backup
What does cp do when given three or more file names?
cp molecules/cubane.pdb molecules/ethane.pdb molecules/methane.pdb
When given several filenames followed by a directory name, cp copies the files to the named directory. This is the standard way to copy multiple files at once.
When given three or more file names, cp throws an error such as the one below, because it is expecting a directory name as the last argument:
cp: target 'molecules/methane.pdb' is not a directory
cp interprets the last argument as the destination, and in this case, it's a file, not a directory.
When using cp with multiple source files, the last argument must be a directory where all the files will be copied.
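Both cases, plus the recursive copy of a whole directory, can be reproduced in a scratch directory. This is a sketch with empty placeholder files; molecules_copy is a made-up destination name.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/molecules" "$scratch/backup"
touch "$scratch/molecules/cubane.pdb" "$scratch/molecules/ethane.pdb" "$scratch/molecules/methane.pdb"

# Several sources with a directory as the last argument: all files are copied into it.
cp "$scratch/molecules/cubane.pdb" "$scratch/molecules/ethane.pdb" "$scratch/backup"
copied=$(ls "$scratch/backup" | tr '\n' ' ')

# Several sources with a file as the last argument: cp reports an error.
if cp "$scratch/molecules/cubane.pdb" "$scratch/molecules/ethane.pdb" "$scratch/molecules/methane.pdb" 2>/dev/null; then
    three_files="copied"
else
    three_files="error"
fi

# Copying a whole directory needs -r.
cp -r "$scratch/molecules" "$scratch/molecules_copy"
n_in_copy=$(ls "$scratch/molecules_copy" | wc -l)

echo "in backup: $copied"
echo "three file names: $three_files"
echo "files in recursive copy: $n_in_copy"
rm -r "$scratch"
```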
Which ls command(s) will produce this output?
ethane.pdb methane.pdb
ls *t*ane.pdb
ls *t?ne.*
ls *t??ne.pdb
ls ethane.*
Remember that the * wildcard matches zero or more characters, while ? matches exactly one character.
ls *t*ane.pdb shows all files whose names contain zero or more characters (*) followed by the letter t, then zero or more characters (*) followed by ane.pdb. This gives ethane.pdb methane.pdb octane.pdb pentane.pdb.
ls *t?ne.* shows all files whose names start with zero or more characters (*) followed by the letter t, then a single character (?), then ne. followed by zero or more characters (*). This will give us octane.pdb and pentane.pdb but doesn't match anything which ends in thane.pdb.
ls *t??ne.pdb fixes this by matching two characters (??) between t and ne. This correctly matches ethane.pdb and methane.pdb.
ls ethane.* only matches files starting with ethane., so it shows only ethane.pdb, missing methane.pdb.
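The four patterns can be checked directly by creating empty files with the same names and letting the shell expand each pattern. This sketch uses echo instead of ls, since echo simply prints whatever the pattern expanded to; the scratch directory is temporary.

```shell
scratch=$(mktemp -d)
start_dir=$PWD
cd "$scratch" || exit 1
touch cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb

# The shell expands each pattern to the sorted list of matching names.
match_1=$(echo *t*ane.pdb)
match_2=$(echo *t?ne.*)
match_3=$(echo *t??ne.pdb)
match_4=$(echo ethane.*)

echo "*t*ane.pdb  -> $match_1"
echo "*t?ne.*     -> $match_2"
echo "*t??ne.pdb  -> $match_3"
echo "ethane.*    -> $match_4"

cd "$start_dir" || exit 1
rm -r "$scratch"
```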
Move to the directory sequencing and complete the following tasks:
List the files in the run1/ directory. Save the output in a file called sequencing_files.txt.
What happens if you run the command ls run2 > sequencing_files.txt?
The operator >> can be used to append the output of a command to an existing file. Re-run both of the previous commands, but instead use the >> operator the second time. What happens now?
Pay attention to the difference between > (overwrite) and >> (append)!
To list the files, we use ls, followed by > to save the output in a file:
ls run1 > sequencing_files.txt
cat sequencing_files.txt
sampleA_1.fq.gz
sampleA_2.fq.gz
sampleB_1.fq.gz
sampleB_2.fq.gz
sampleC_1.fq.gz
sampleC_2.fq.gz
sampleD_1.fq.gz
sampleD_2.fq.gz
The > operator creates a new file and writes the command output to it.
If we then run ls run2/ > sequencing_files.txt, we will replace the content of the file:
cat sequencing_files.txt
sampleE_1.fq.gz
sampleE_2.fq.gz
sampleF_1.fq.gz
sampleF_2.fq.gz
The > operator overwrites the existing file content. All previous data is lost. This is why it's crucial to understand the difference between > and >>!
If we use the >> operator the second time we run the command, we will append the output to the file instead of replacing it:
ls run1/ > sequencing_files.txt
ls run2/ >> sequencing_files.txt
cat sequencing_files.txt
sampleA_1.fq.gz
sampleA_2.fq.gz
sampleB_1.fq.gz
sampleB_2.fq.gz
sampleC_1.fq.gz
sampleC_2.fq.gz
sampleD_1.fq.gz
sampleD_2.fq.gz
sampleE_1.fq.gz
sampleE_2.fq.gz
sampleF_1.fq.gz
sampleF_2.fq.gz
The >> operator preserved the original content and added the new content at the end.
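The same contrast can be reproduced without the run1/ and run2/ directories. In this sketch echo stands in for ls, and the output file lives in a temporary directory.

```shell
scratch=$(mktemp -d)
out="$scratch/sequencing_files.txt"

# > creates the file, or replaces its content if it already exists.
echo "sampleA_1.fq.gz" > "$out"
echo "sampleE_1.fq.gz" > "$out"
after_overwrite=$(wc -l < "$out")

# >> keeps the existing content and adds the new output at the end.
echo "sampleA_1.fq.gz" > "$out"
echo "sampleE_1.fq.gz" >> "$out"
after_append=$(wc -l < "$out")

echo "lines after > then >: $after_overwrite"
echo "lines after > then >>: $after_append"
rm -r "$scratch"
```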
In the directory coronavirus/variants/, there are several CSV files with information about SARS-CoV-2 virus samples that were classified according to clades (these are also commonly known as coronavirus variants).
Combine all the files into a single file called all_countries.csv
Create a file called alpha.csv that contains only the Alpha variant samples
We can use cat to combine all the files into a single file:
cat *_variants.csv > all_countries.csv
The pattern *_variants.csv matches all CSV files ending with "_variants.csv", and the > operator saves the combined output to a new file.
We can use grep to find a pattern in our text file and use > to save the output in a new file:
grep "Alpha" all_countries.csv > alpha.csv
We can inspect the result with less alpha.csv.
We can use wc to count the lines of the newly created file:
wc -l alpha.csv
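The whole workflow can be rehearsed on invented data first. In the sketch below, the two input files (uk_variants.csv and india_variants.csv) and their contents are made up for illustration and do not reflect the real dataset.

```shell
scratch=$(mktemp -d)
# Two tiny made-up files standing in for the real *_variants.csv data.
printf 'sample1,Alpha\nsample2,Delta\n' > "$scratch/uk_variants.csv"
printf 'sample3,Alpha\n' > "$scratch/india_variants.csv"

# Combine every file matching the pattern into one file.
cat "$scratch"/*_variants.csv > "$scratch/all_countries.csv"

# Keep only the lines that mention Alpha.
grep "Alpha" "$scratch/all_countries.csv" > "$scratch/alpha.csv"

n_all=$(wc -l < "$scratch/all_countries.csv")
n_alpha=$(wc -l < "$scratch/alpha.csv")

echo "all samples: $n_all, Alpha samples: $n_alpha"
rm -r "$scratch"
```

If each real file starts with a header row, cat will repeat that row once per input file, and grep "Alpha" will drop it unless the header itself contains the word Alpha.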

